I Have Lots Of Oracle Database Server Power But Performance Is Slow/Bad
Oracle Database parallelism and serialization is what we as Oracle Database Administrators live and die for. You have a screaming fast Oracle Database system and there is lots of computing power available.
But performance is unacceptable; users are screaming, the phone is ringing, and those fancy dashboards are flashing like it's Christmastime.
What is going on?! What can I do about it?! That's what this post is about.
Learn By Doing
Back in December on the third day of my Oracle Performance Firefighting class, I had each student bring in an AWR report from one of their systems that was giving them problems. (I keep my classes small, giving each student time to do their analysis and time for us to talk about it as a class.)
Get this: Half of the systems had a similar "problem." I think it's important every DBA understands this "problem" because it's more common than most people believe.
There was plenty of computing power and the key SQL statement they cared about was a batch job. What was the core problem? The quick answer is "serialization" that is, a lack of parallelism. Exploring this using a very large production system AWR report and coming up with solutions is what this posting is all about.
Serialization Is Death
In Oracle systems, serialization is death and parallelism is life. Follow this line: business, end user, application designer, DBA, Oracle Database Kernel Architects (or whatever their title is), OS Administrators, OS designers, CPU designers and IO subsystem designers all have something in common. They work hard to parallelize tasks. Just one example: Oracle is designed to have multiple background and foreground processes running parallel.
But all this parallelization effort can be wasted and minimized if a process turns into a serial work stream (at any level; Oracle, OS, business, etc.). The result is "slowness" because the wall time increases.
Available Power And Slowness Equals Opportunity
When I tune Oracle Database systems, I look for opportunities. And each of my solutions will specifically target an opportunity. When I see unused power and complaints of slowness, I look for ways to increase parallelism. Why? Because having available power combined with slowness likely means a serialization limitation exists.
For sure serialization may be necessary. Two examples come to mind; Oracle database memory serialization control (think: latch and mutex) and business rules.
But if I can find a way to increase performance by using up available power by increasing parallelism, I'll likely be able to turn a slow serialization situation into a screaming fast parallelization situation!
How To Recognize A Serially Constrained System
It's easy to recognize a serially constrained system. Ask yourself these two questions. First, is there available CPU or IO power? Second, are there complaints of application "slowness." If the answer to both of these questions is "Yes" then there is likely a serialization issue. Furthermore, the general solution is to use the available resources to our advantage. That is, find areas to increase parallelization, which will use the available resources and improve performance.
If you have the power, use it! What are you saving it for?
(There may be a very good answer to the "saving" but I'll save that for another article.)
Can I Be Out Of CPU And Be Serially Constrained?
Yes. An Oracle Database system can be serially constrained and be out of OS resources. A great example of this is when there is a raging Oracle memory serialization issue. If you see both significant Oracle latching or mutex wait time combined with a raging CPU bottleneck, you likely have a serialization issue... an Oracle Database memory structure access serialization issue.
So, while available power on a "slow" system likely means we have a serially constrained system there are situations in Oracle with a raging CPU bottleneck that also means there is likely a serialization issue.
Find Out: Is There Available CPU Power?
Here Is A Real Life Situation. To simplify, I'm going to focus on only instance number one. Look at instance number one in the below picture.
The above AWR report snippet shows RAC node #1 OS CPU utilization at 15%. This means that over the AWR report snapshot interval, the average CPU utilization was 15%. I never initially trust an AWR report for calculated results. Plus it's good practice to do the math yourself. If you use the super fast busy-idle method I have outlined in THIS POST and detailed in my online seminar, Utilization On Steroids, the utilization calculates to 16% ( 0.5/(0.5+2.7)=0.16 ). So the AWR Report's 15% for CPU "% Busy" looks to be correct.
Clearly with an average CPU utilization of 15%, we have an opportunity to use the unused CPU power to our advantage.
Find Out: Is There Available IO Power?
I am looking for fast IO responsiveness. That is, a low response time. A great way to get a quick view of IO subsystem responsiveness is to look at the average wait time for the event, db file sequential read.
The wait event, db file sequential read is the time it takes to read a single block synchronously. I like to call it a pure IO read call: a) what time is it? b) make the IO call and wait until you get it, c) what time is it? d) calculate the delta and you have the wait time...and the IO read call response time! If you want more details, I wrote about this HERE, which includes a short video.
For our system, let's figure out the single block IO subsystem read response time. Using the same AWR report, here is a screen shot of the Top Time Events.
Again, I'm just going to focus on the first instance. If you look closely (middle right area), you'll see for instance number one, the average db file sequential read time wait time is 2.22ms. That's fast!
There is no way a physical spinning disk is going to return a block in 2.22ms. This means that many of Oracle's single block read calls are be satisfied through some non-Oracle cache. Perhaps an OS cache or an IO subsystem cache. We can't tell, but we do know the block was NOT an Oracle's buffer cache because the db file sequential wait means the block was not found in Oracle's buffer cache.
A single block synchronous IO read call with an average of 2.22ms means there is available IO read capacity and probably available write capacity as well. Again, just like with the OS CPU subsystem, we have unused power that we will try and use to our advantage.
At this point, I will assume there is also plenty of memory and network capacity available. So, the bottom line is we have a "slow" system combined with available CPU and available IO power. Wow! That is a great situation to be in. I call this, "low hanging fruit."
Real Life: Looking For The "Slow" SQL
At the top of this post, I mentioned that in my Firefighting class in each of the "serialization" cases, there was a key SQL statement that was part of a larger batch process. Keep in mind, that at this point in the analysis I did NOT know this. All I knew was that users were complaining and there was plenty of CPU and IO resources.
Usually, in this situation there is a relatively long running process. There could be lots of quick SQL statement involved, but usually this is not the case. And I'm hoping there is a key long running SQL statement that can be parallelized.
Long running can roughly be translated into "high elapsed time." I've have written a number of articles about elapsed time (search my blog for: elapsed time) and even have a free tool with which, you can gather to get more than simply the average elapsed time. And I have online seminars that touch on this subject: Tuning Oracle Using An AWR Report and also, Using Skewed Performance Data To Your Advantage. So there are lots of useful resources on this topic.
In the AWR report, I'm going to look closely at the SQL Statistics, in particular the "SQL ordered by Elapsed Time (Global)." What I really want is the statistics only for instance one, that is, not global. But that's all I have available. Plus the DBAs will/should know if the key SQL statement(s) are run on instance one. Here's the report.
In the report above, look at the elapsed times (second column on the left). Now looking right, find the "Execs", that is, the executions column. The execution column is the number of completed executions within this AWR snapshot range. If the executions is zero, this means the SQL did not complete during the snapshot interval, that is before the ending snapshot.
If you're wondering, these top elapsed time SQL statements are involved in batch processing. When I look at this, I see opportunity, fruit waiting to be harvested!
And I love this: Every DBA in the class in this situation said, "Oh! I know about this SQL. It's always causing problems." Now it's time to do something about it!
Real Life: Putting This All Together
We have identified available CPU and IO capacity. And we have identified THE elapsed time SQL statement. While I'm a pretty laid back kind of guy, at this point I start to apply some pressure. Why? Because the users are complaining, we have identified both an opportunity, the cause of the problem and the general solutions.
There are two general solutions:
1. Do less work. You want to empty a candy dish faster? Then start with less candy in the dish! If you want a SQL statement to run faster, tune the SQL so it touches less blocks.
2. Do the same amount of work, but group the work and run each group at the same time. This is parallelization! This is why the total elapsed time will not decrease (it will probably increase a little) but the wall time will likely decrease... and dramatically! Here is a LINK to posting that contains a short video demonstrating the difference between elapsed time and wall time.
How To Parallelize (in summary)
There are many different ways to parallelize. But the goal is the same: use the available resources to reduce wall time (not necessarily the elapsed time). Perhaps the application can be redesigned to run in parallel streams. But that can take a very long time and be a real hassle. But in many cases, it's the best long term solution.
If you are short on time, are licensed for Oracle Parallel Query and the SQL has been optimized (oh boy... how many times have all heard that before), you likely can use Oracle PQ. And of course, even if the SQL is not optimized, you can still run PQ and performance may be fantastic.
By the way, adding faster IO disks or more IO disks (what is a "disk" is nowadays anyways) will likely NOT work. Remember the IO subsystem is performing wonderfully.
Thanks for reading and enjoy the mystery of your work!
If you have any questions or comments, feel free to email me directly at craig at orapub.com.
|How To Tell If The IO Subsystem Reads Are Struggling||Is Oracle Database 12c Faster Than 11g?||Using ASH Data To Solve A Performance Problem Using A Time Based-Like Approach|