Parsing Performance: Going Beyond Cursor Sharing Using Bind Variables

Posted on 23-Jan-2018 by Craig Shallahamer / OraPub / craig@orapub.com

In this fourth parse article, I'll show you why using bind variables is so much better than using cursor sharing. For example, my experiment showed a clear 12X throughput improvement when using bind variables.

You may be saying, "Big deal Craig, everyone knows about bind variables." Perhaps, but does anyone have reproducible experimental data to use when making, for example, a go-bind or no-go-bind decision? And really, how much is the improvement? If you are curious about this, then read on...

The Backstory

Parsing problems can cause significant performance headaches for DBAs. I wanted to have a reproducible experimental foundation where I could study the situation. So, it's important to me to be able to demonstrate what can cause parsing problems, what the problems look like, how to diagnose them and how to investigate various solutions.

This journey has a few more articles brewing, but here are the ones that I have posted so far:

In this article, I am going beyond cursor sharing by bringing bind variables into the mix. And, I have a ton of awesome experimental data just waiting to explore and learn from. It's going to be fun...

Experimental Design

I created this experiment to be completely hands off. That is, I run the master script and walk away. The main advantage of this is that I can rerun it at any time and gather lots of samples!

The experiment was run on a six CPU core Dell server, "Linux sixcore 4.1.12-112.14.2.el6uek.x86_64 #2 SMP Fri Dec 15 18:13:58 PST 2017 x86_64 x86_64 x86_64 GNU/Linux", with "Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit".

I created the workload by running multiple "angry" scripts. And I can easily recycle the instance, reset cursor sharing and flush the shared pool before each sample cycle.
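
Conceptually, the reset before each sample cycle boils down to something like the simplified SQL below. This is just a sketch of the idea, not the actual script; the real scripts also recycle the instance and handle all the timing.

-- simplified per-sample reset: set cursor sharing for this run, then clear the shared pool
alter system set cursor_sharing = force;
alter system flush shared_pool;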

Below is part of the angry script that is used to create a gazillion similar yet distinct SQL statements.

sql_var := 'select c1 from toomuch where c1 ='||dbms_random.value(0,9999999) ;
execute immediate sql_var ;
dbms_lock.sleep(0.003);

The angry SQL below is used when the focus is on using bind variables.

the_value_var := dbms_random.value(0,9999999) ;
execute immediate 'select c1 from toomuch where c1 = :soverycool' using the_value_var ;
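
To give a feel for how these two snippets fit together, here is a stripped-down sketch of the kind of anonymous PL/SQL loop the angry script runs. The table toomuch and the bind name :soverycool come from the snippets above; the mode_in flag and the loop count are purely illustrative stand-ins for the script's mode argument and run length.

declare
  mode_in       number := 1; -- illustrative: 1 = literal SQL, 2 = bind variable
  sql_var       varchar2(200);
  the_value_var number;
begin
  for i in 1 .. 100000 loop
    if mode_in = 1 then
      -- literal mode: every statement is textually unique
      sql_var := 'select c1 from toomuch where c1 = '||dbms_random.value(0,9999999);
      execute immediate sql_var;
    else
      -- bind mode: one shared SQL text, only the bind value changes
      the_value_var := dbms_random.value(0,9999999);
      execute immediate 'select c1 from toomuch where c1 = :soverycool' using the_value_var;
    end if;
    dbms_lock.sleep(0.003);
  end loop;
end;
/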

For each aspect of the experiment (e.g. cursor sharing set to either force or exact), 31 samples were collected with each sample lasting 30 seconds. After making any changes, like flushing the shared pool, there was a 15 second sleep.

No samples were removed from the experiment. This is unusual, but no initial cache warm-up was necessary.

I placed everything related to this experiment HERE in a single zip file. This includes all the scripts (.sh, .sql), exported results table (op_results), R scripts, a lot of charts (most not included in this article) and all the R spool files which contain most of the data in nicely formatted columns. Yeah... it's awesome!

I reference a couple of my OSM scripts, like when I automatically recycle the database instance. Or, when I kill all the Oracle foreground processes using my OP Loader kill program, kp.sh. You can download both of these toolkits on my Oracle performance tools webpage.

If you want to understand the data collection, begin with the angrydriver.sh script. If you want to understand the math, charts and R stuff, start with the parse-main.txt file.

Cursor Sharing [Nearly] Rocks

As we saw in my prior article, setting cursor sharing to FORCE significantly reduced the number of unique yet similar SQL statements. The reason that is so awesome is that it reduces parsing overhead; think CPU consumption, memory management and memory structure serialization control. This reduction in computing resources enables more work to be processed per second. That is, increased throughput.

If the workload is not pushing throughput to increase, then responsiveness would likely improve instead. In other words, users should have a better experience.

The chart below is NOT based on my previous post. The chart below is based on the new experiment, on which all the data in this article is based. But as we would expect, while the data values may be different, the story is exactly the same.

What the chart above is screaming to us is this: When there are a gazillion similar yet unique SQL statements, cursor sharing set to the default value of EXACT does not come close to providing the throughput we can achieve when cursor sharing is set to FORCE. Of course, when there is no CPU capacity available, both EXACT and FORCE hit their throughput wall.

Numerically speaking, with cursor sharing set to the default of EXACT, my test system was able to process an average of 8471 executions per second with an average of 8448 hard parses per second. However, with cursor sharing set to FORCE my system was able to process an average of 52334 executions/sec with an average of only 1 hard parse per second! This means that setting cursor sharing to FORCE resulted in over 6 times more throughput! Amazing.

Below is the data used in the chart and discussed above.

  Angry_Users cpuutil exact_execpersec force_execpersec exact_hparsepersec force_hparsepersec
1           1    0.16          1490.00          8057.80            1464.08               0.43
2           2    0.33          3054.66         16149.67            3033.63               0.42
3           4    0.64          6099.81         32923.90            6079.08               0.39
4           6    0.96          8435.10         48546.63            8412.32               0.41
5           8    0.98          8470.90         50047.78            8448.45               0.39
6          20    1.00          8385.11         52334.42            8362.53               0.55

This is the point where I stopped in my last article. I also did not show you the new performance roadblock. Sure, there was no CPU left, but what was going on inside of Oracle? And, what can we do about it? Read on...

Cursor Sharing Thumbprint

Regardless of setting cursor sharing to EXACT or FORCE, eventually a throughput wall is hit... and will always be hit. For sure, the FORCE throughput wall (52K exec/sec) is much further out than with EXACT (8K exec/sec). But, there is still a wall.

I want to show you what it can look like from an Oracle time model and throughput perspective. In other words, what does the CPU and WAIT time profile look like?

On my test system, with cursor sharing set to FORCE and eight angry processes running in the background executing similar but unique SQL statements (i.e., running ./angrysql.sh 1 & eight times), this is what the Oracle time profile looked like. (The below script is in my OSM toolkit.)

SQL> @ttpctx

DB/Inst: prod50/prod50                                                      16-Jan 01:09pm
Report:   ttpctx.sql                OSM by OraPub, Inc.                     Page         1
                           Total Time Activity (73 sec interval)

                                                          Avg Time        Time     Wait
Time Component                            % TT    % WT Waited (ms)       (sec) Count(k)
-------------------------------------- ------- ------- ----------- ----------- --------
CPU consumption: Oracle SP + BG procs    97.29    0.00       0.000     394.750        0
library cache: mutex X                    2.11   78.14       0.650       8.580       13
control file parallel write               0.14    5.19      23.750       0.570        0
db file parallel write                    0.09    3.28      10.000       0.360        0
log file parallel write                   0.08    3.10      10.968       0.340        0
cursor: pin S                             0.06    2.37       1.116       0.260        0
target log write size                     0.06    2.09       8.846       0.230        0
oracle thread bootstrap                   0.05    1.73      17.273       0.190        0
cursor: mutex S                           0.01    0.46       1.136       0.050        0

Notice above that 97% of all the time (think database time) is associated with CPU consumption. The remaining 3% is dominated by the Oracle wait event, library cache: mutex X.

The below throughput data was created using my OSM sysstatbl.sql script over the same interval the above time profile was generated. The statistics show that around 50K SQL statements were executed per second and there was virtually zero hard parsing.

SQL> @sysstatbl
db block changes                             48          .6   80
session logical reads                      2181        26.9   81
physical read total IO requests             269         3.3   81
user commits                                  4           0   81
parse time cpu                             1176        14.5   81
CPU used by this session                  43108       532.2   81
parse count (hard)                            0           0   81
parse count (total)                     4025387     49696.1   81
execute count                           4025396     49696.2   81
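
My sysstatbl.sql script computes the per-second rates for you, but even a plain query against v$sysstat exposes the same cumulative counters. A minimal example:

select name, value
  from v$sysstat
 where name in ('execute count', 'parse count (total)',
                'parse count (hard)', 'parse time cpu')
 order by name;

Take a snapshot before and after an interval, difference the values and divide by the interval seconds to get per-second rates like those shown above.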

While this is great and all, you may be wondering, "Why can't we get even more throughput?" To answer this question, we need to understand a little more what is happening in the library cache. Read on...

What Is library cache: mutex X?

It helps me to think of cursor related events as being about cursor management: tasks like memory management, plus parent and child cursor creation and maintenance.

However, library cache waits are associated with locating a cursor. So, when I see a library cache event, I think "search" and LOTS of it.

Before a SQL statement can be executed it must be parsed. Part of parsing is checking if the SQL's cursor is in the library cache. If it is, loosely speaking, we are talking about a soft parse. If the cursor does not reside in the library cache, we are looking at a hard parse situation.

In other words, the Oracle process must look in the library cache for the SQL (based on the SQL hash value). For an Oracle process to look for something in the library cache, the process must first acquire the appropriate library cache mutex, in shared mode. Shared mode is necessary to ensure no other process is changing the cursor while another process is examining or referencing it.

The library cache: mutex X wait event is telling us the foreground process is not able to complete the look up process because another process has exclusive control of the mutex.

Why would another process have the mutex we need in exclusive mode? Deep within mutex processing, to acquire a mutex in shared mode (correct, shared), for a split second the mutex must be held exclusively. Once held in exclusive mode, some variables are updated and exclusivity is released but shared mode is still held.

This "split second" is so fast, we usually never notice it. In fact, most people don't know that to acquire a mutex in shared mode, you must hold it briefly in exclusive mode. But in this case, the cursor activity is very concentrated on a single library cache mutex. It is so concentrated that even this split second of exclusivity is adding up to be a performance roadblock!

Is there a way to move beyond this roadblock and increase throughput? Read on...

Bind Variables [Really] Rock

My next experimental run used bind variables instead of cursor sharing. To be clear, cursor sharing was set to the default of EXACT. (Truthfully, I did do an experimental run using bind variables with cursor sharing set to FORCE, but I will write about that in my next article.)

Below are the results comparing throughput when using cursor sharing EXACT, cursor sharing FORCE and when bind variables are being used along with cursor sharing set to the default EXACT.

What the chart above is screaming to us is this: While cursor sharing and in particular cursor sharing set to FORCE is awesome, using bind variables is SUPER awesome! While most of us have been told this and our experiences agree, look at the difference!

We start with cursor sharing EXACT achieving a maximum throughput of 8K execs/sec. Changing to FORCE results in a 6X increase in throughput, to 52K exec/sec. But when using bind variables, throughput jumps up an additional 1.8X to a maximum of 95K exec/sec. This means that, starting at 8K exec/sec and ending at 95K exec/sec, throughput increased nearly 12 times. That is crazy cool!

So, next time someone asks, "How much better are bind variables?" you can say, "OraPub's experiment showed a 12X improvement going from cursor sharing set to EXACT to using bind variables."

While all this is very cool, don't forget that if the application had used bind variables in the first place, we would have immediately received this benefit! Said another way, my experiment showed NOT using bind variables can result in a 12X DECREASE in throughput. To make up that throughput loss, you'll need additional CPU power. (Wow... cloud providers must HATE bind variables!)

Below is sample data used in the above chart. Remember, each data point is a 31 sample average.

  Angry_Users cpuutil bv_exact_execpersec bv_exact_hparsepersec cs_exact_execpersec cs_force_execpersec
1           1    0.14            15023.81                  0.15             1490.00             8057.80
2           2    0.28            30206.76                  0.16             3054.66            16149.67
3           4    0.55            62046.55                  0.28             6099.81            32923.90
4           6    0.83            91094.43                  0.40             8435.10            48546.63
5           8    0.90            94842.43                  0.41             8470.90            50047.78
6          20    1.00            93101.21                  0.41             8385.11            52334.42

Bind Variable Database Timing Thumbprint

I want to show you what the Oracle time model and throughput can look like when bind variables are being used. In other words, what does the CPU and WAIT time profile look like?

This time, on my test system, cursor sharing was set to the default EXACT while eight angry processes running in the background were using bind variables. When the angry script is run in mode 2 (e.g., ./angrysql.sh 2 &), the bind variable SQL is run. Here are a couple of relevant lines from the angrysh.sql script.

the_value_var := dbms_random.value(0,9999999) ;
execute immediate 'select c1 from toomuch where c1 = :soverycool' using the_value_var ;

This is what the time profile looked like:

SQL> @ttpctx

DB/Inst: prod50/prod50                                                      16-Jan 01:32pm
Report:   ttpctx.sql                OSM by OraPub, Inc.                     Page         1
                          Total Time Activity (136 sec interval)

                                                          Avg Time        Time     Wait
Time Component                            % TT    % WT Waited (ms)       (sec) Count(k)
-------------------------------------- ------- ------- ----------- ----------- --------
CPU consumption: Oracle SP + BG procs    99.56    0.00       0.000     736.706        0
cursor: pin S                             0.16   36.53       1.131       1.180        1
control file parallel write               0.15   34.06      21.569       1.100        0
db file parallel write                    0.05   12.38       8.333       0.400        0
db file single write                      0.02    5.57       8.571       0.180        0
log file parallel write                   0.01    2.17      14.000       0.070        0
...

Notice that 100% (rounded) of the database time is CPU consumption! The event cursor: pin S occurs when an Oracle process is trying to execute a cursor; before it can do this, it must pin the cursor in shared mode to ensure another process does not change it while it is being executed.

When cursor: pin S is the top or near-top event, it tells me there is at least one statement that is crazy popular.
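
A quick way to find such a popular statement is to look at execution counts in v$sql. The 100000 threshold below is arbitrary, just something high enough to surface the hot cursor on a busy system:

select sql_id, executions, substr(sql_text,1,60) sql_text
  from v$sql
 where executions > 100000
 order by executions desc
 fetch first 10 rows only;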

This is important to understand: wasn't our goal to reduce the number of unique SQL statements? Yes. That's what cursor sharing does for us, and we took it even further by using bind variables. So, we are making tremendous throughput progress and we are getting what we asked for: fewer unique SQL statements.

Just for general comparison, the execution count shown below is around 96K executions per second. Without bind variables but cursor sharing set to FORCE, my OSM sysstatbl.sql script reported around 50K execs/sec. So, even with this simple one sample comparison, we see a dramatic increase in executions per second.

SQL> @sysstatbl
db block changes                              0           0   55
session logical reads                      5133        93.3   55
physical read total IO requests             257         4.7   55
user commits                                  2           0   55
parse time cpu                                2           0   55
CPU used by this session                  29984       545.2   55
parse count (hard)                           23          .4   55
parse count (total)                         268         4.9   55
execute count                           5274648     95902.7   55

We Got What We Wanted

In my experiment, using bind variables resulted in a 12X executions per second throughput increase compared to setting cursor sharing to the default of EXACT and subjecting the system to a gazillion similar yet unique SQL statements. We got what we wanted... increased throughput. Yet still, in the end, we hit another performance wall!

Is there a way around this new wall to achieve even a higher throughput? The short answer is an amazing YES! I'll present this in another article.

But before we push through the next performance wall, there are some very cool insights we can glean from this experiment that have probably gone unnoticed. Those insights will be the topic of my next article.

All the best in your Oracle tuning work,

Craig.

If you have any questions or comments, feel free to email me directly at craig at orapub.com.
