How To Identify Oracle Sorting Issues
Last month I did a webinar for my members about identifying and solving SQL Sorting issues. I received such a positive response, I wanted to share some of it with you.
Here is the central question.
How do you know if sorting is contributing to a performance issue? And if so, what is the cause and finally, what can we do about it? Realistically, that's too much for a single post, so in this post I am focusing on how to identify if there is a real problem. So, here we go...
How Oracle Sorts
Let's begin with how Oracle sorts. Sorting has been around forever in computing. Everyone who has taken a data structures course knows there are many ways to sort. And, they also know that once the data does not all fit into memory, it's as they say, a whole new ballgame. It's no different with Oracle.
As the sorting-focused-super-simplified Oracle architecture figure below shows, the key players are a client process, foreground process, PGA memory, temporary segments and temporary tablespace database files.
Here is how it works. When a user process (think: sqlplus) instructs its foreground process (think: oracle) to run a SQL statement that must sort (think: order by), the foreground process will attempt to do the entire sort in memory. More specifically, within the foreground process memory, known as the PGA memory. That's the good news.
The bad news is, if the sort is too big to be completed in the allocated PGA memory, the foreground process will divide the sort into smaller pieces, commonly called, "sort runs." When a sort run is not immediately needed, the foreground process can write it out to disk in a temporary segment. If needed, the foreground process can read it back into its PGA memory.
While this divide and conquer approach works, as you can image with the additional work including IO, it can take a while. So, while sorting is part of normal data processing, at some point it can become a significant amount of database time. The trick is to correctly diagnose the situation. Read on...
Your Diagnostic Clue
An Oracle process consumes CPU when sorting. And, if sort runs are required, the foreground process may need to move a sort run to (write) or from (read) a temporary segment residing on disk. What hammers performance is when Oracle processes spend a lot of time doing IO related to sorting.
Recognizing the total temporary segment IO wait time is our key indicator!
Keep in mind that an active Oracle foreground process can only consume CPU or be waiting for something, so it can once again continue consuming CPU. Oracle wait events give us clues about why an Oracle process is unable to consume CPU.
Below is a snippet from an AWR report, showing an instance level perspective of the CPU and wait time. In case you are wondering, I did NOT change the numbers! This is a real life production Oracle system... that's in deep poopoo.
Here is your key diagnostic clue. If the foreground process is writing or reading sort data to/from temporary segments and prevents the foreground process from consuming CPU, there will be an associated Oracle wait event.
When the foreground process is prevented from consuming CPU while writing temporary sort data to disk the event is, direct path write temp. And, when reading temporary sort data from disk the event is, direct path read temp. Our clue is the word "temp". It is this simple!
Look For The Big Pile Of Time
In the AWR report snippet above, the sort related time is relatively massive. This is a good thing... from a diagnostic perspective. Seeing a big pile of wait time associated with either of these events tells you there is a lot of sorting occurring and it is truly impacting performance.
Whenever you see a "big pile" of Oracle time, Oracle is presenting you with a performance improving opportunity. While the users experience this as a bad thing, for the performance analyst this is a good thing.
If my "big pile of time" analogy seems strange to you, then you need to learn how to do an Oracle time-based analysis. If you want to see how this is done, look at THIS post where I diagnose a log file switch issue using an Oracle time based approach.
Core Solution Options
As with most things Oracle tuning, you have choices. However, your solutions will likely be centered around decreasing SQL sort activity, increasing PGA size and increasing IO performance related to temporary tablespace database files.
Tuning High Sort SQL
Tuning the high-sort SQL to reduce sort requirements could be as simply as creating an index that your foreground process will use to sort. Or perhaps, encouraging Oracle (think: hint) to use a different type of sort.
In my next post I will show you how to find the sort intensive SQL using ASH and AWR and other performance views.
Increase PGA Memory
If there is operating system memory available, increasing the PGA size can increase the sorting in memory, thereby reducing the sorting to disk. Take a look at the pga_aggregate_limit instance parameter and talk with your OS admin about increasing Oracle's memory requirements.
Increase IO Performance
Finally, check if the IO subsystem is struggling to respond. While this topic is out of scope for this post, here are links to a couple of my IO performance related posts.
Obviously we are not done yet but I wanted to keep this post short and focused. In the Core Solution Options section above increasing the PGA memory and IO performance will affect the entire database system. But the "tuning the high sort SQL" is very specific and requires you to identify the "high sort" SQL. There are many ways to do this... and that will be the focus of my next article.
All the best in your Oracle tuning work!
If you have any questions or comments, feel free to email me directly at craig at orapub.com.
|11 Ways To Get Your Oracle Database Conference Abstract Accepted||Why Tuning Oracle Database Works And Modeling It||Inferring SQL Run Time Using ASH And PQ Program Filtering|