Creating And Understanding A Response Time Chart Based On Oracle Database Time Data
One of the cool things about my last article was that I created a chart displaying Oracle Database time versus the database workload. I went a little further breaking the Database time into a few categories, such as the OS CPU run queue.
In this article, I'm going to take it to the next level. I want to use the sample experimental data (nearly 500 samples) to create a different type of chart. And, this chart is more standard and usually more useful.
It's called the response time chart. Creating and understanding a response time chart built from real, Oracle database time related performance data is what this post is all about.
Response Time Chart Advantages
The response time chart has some advantages over the total time chart that I created in my last article. For example:
- The focus is more granular, such as the time to process a single piece of work, such as a buffer get.
- It can be used to predict SQL statement elapsed times.
- You see standard response time components, such as service time and queue time.
- It lays the foundation for advanced analysis, such as forecasting and capacity planning.
- It's a great communication tool. Everyone just gets it that being in the steep part of the curve is not a good place to be. We can use that to our advantage when recommending solutions.
Get The Experimental Analysis Pack
I'm going to use the exact same raw data that I used in my last article using. You can download the Analysis Pack HERE which contains everything I have related to the experiment including the data collection script, spreadsheets and the raw data. The analysis pack also included the updated spreadsheet that I used only in this article.
Response Time Chart Axis
A response time chart visually shows us the relationship between work and time. So, it make sense that the horizontal axis is the workload intensity (called the arrival rate) and the vertical axis is how long it takes to process a single piece of work (called the response time). The response time is traditionally, and for a good reason, divided into two categories.
The key thing about a response time chart is to always remember, it is showing us the time to process a SINGLE piece of work. In our case, I have defined the piece of work to be an Oracle Database buffer get, that is a logical IO or simply, LIO. I chose an LIO to be the unit of work because this system's workload can be characterized by buffer get activity. For example, when performance slows, LIO intensity increases.
A traditional response time curve has two key categories of time called service time and queue time. If you followed my previous article, you'll recall that service time is the time to process a single piece of work when no queuing is involved.
When teaching DBAs about queuing theory, I like to say that service time like hamburger cook time. Let's say it takes three minutes for me to cook a single hamburger. It makes no difference how many people are queued up to get one of my burgers, it will always take me three minutes to cook a single hamburger. If it takes longer than three minutes for someone to get their hands on one of my tasty burgers, that's because they had to wait in the line that is, the queue. So, the time it takes them to get one of my tasty burgers is the cook time plus their queue time.
Their time to get a burger, that is the cook time plus their queue time, is known as the response time. So, response time is service time (think: cook time) plus queue time. It's actually really simple:
Rt = St + Qt
It's important to remember that the service time (think: cook time) does NOT increase as the workload increases. While the TOTAL cook time for all the burgers increases, the cook time for each burger does NOT increase. If you look at any response time chart, you will always notice the service time is pretty much flat. If the chart was created using math, the service time will be perfectly flat but if the service time is based on actual observations, obviously there will be some deviation.
Oracle Systems And The Response Time Chart
The CPU time to process a single piece of work (think: buffer get) can be used as the service time. As you will quickly see, this line is very flat... just like the service time is supposed to be. The queue time can be database time minus the CPU consumption to process a single piece of work.
By the way, the workload intensity in a system that is purring along can loosely be called the arrival rate. The arrival rate is always shown using the Greek letter Lambda. This is why you will see me using the letter L in this article for the arrival rate.
To demonstrate this, I'm going to use the same sample that I used in the previous article. The sample is load_no 13 and sample_no 18. This data is in row 331 and I've highlighted it into orange on the spreadsheet. Here's is some of the data.
Here are the raw numbers taken from the orange line in the above spreadsheet:
Snapshot Interval (sec) = Column C, Row 331 = 180.025 sec Total CPU Consumption = Total FG CPU consumption (DB CPU) + total BG process CPU consumption (sec) = Column F + G, Row 331 = 910.702 sec + 2.368 sec = 913.070 sec DB Time (sec) = Column H, Row 331 = 6773.7844 sec Total Buffer Gets (lio) = Total Work = Column D, Row 331 = 82140393 lio
Now I'll transform the raw values into a single point that can be plotted on a response time chart. Below are details:
L (lio/ms) = Arrival Rate = Total Buffer Gets / Snapshot Interval = Column M, Row 331 = 82140393 lio / 180.025 sec = 4562721 lio/sec = 456.2721 lio/ms St (ms/lio) = Total CPU Consumption / Total Work = Column N, Row 331 = 913.070 sec / 82140393 lio = 0.011116 sec/lio = 0.0111 ms/lio Rt (ms/lio) = DB Time / Total Work = Column P, Row 331 = 6773.7844 sec / 82140393 lio = 82.466 sec/lio = 0.0825 ms/lio Qt (ms/lio) = Rt - St = Column O, Row 331 = 0.0825 ms/lio - 0.0111 ms/lio = 0.0714 ms/lio
Above is the response time math for a single point that is ready to be plotted. Below is the results of doing the math and plotting the points of our nearly 500 experimental data points. Can you spot our single point?
Response Time Chart Observations
Clearly, this Oracle system is operating in a response time like way. Here is some evidence of this:
- We can see the service time line is nearly flat.
- We can see the queue time increases as the arrival rate increases.
- As characteristic when modeling a CPU subsystem, there is no queue time for a while and then the queue time dramatically begins to increase.
We can also clearly see that this chart is based on a real system and not simply equations. Here is some evident of this:
- Notice the dip in the response time at maximum workload. The math does not support this dip.
- When the overall system is seriously impacted by an intense arrival rate more than the users are impacted. Data collection scripts and monitoring tools... everything is affected.
- Look at the response time when the arrival rate is around 350 lio/ms. A bunch bunch of samples look way out of place. I did not explore why this occurred. But on a real Oracle system it is likely a different mix of transactions could be the cause.
Response Time Chart Warnings & Forecasting
Some of you reading this will want to jump right to using this chart for forecasting purposes. Please be careful. While this chart was created with plenty of sample data (nearly 500 samples), no statistical analysis occurred. This means we do not know the precision, that is, the plus and minus of any resulting forecast. So while it may look useful, it may not be useful and even result in very misleading forecasts.
If you are interested in creating a robust forecast model and using it for solid predictive analysis, here are three resources.
- My class, Oracle Forecasting & Predictive Analysis. I will be teaching this class in a rare public offering this June in Florida. For details, click on the previous link. HERE
- My book, Forecasting Oracle Performance. It's become a classic and you can get it HERE.
- In my two day class, Advanced Oracle Performance Analysis I teach everyone how to create a simple response time model based on an AWR or Statpack report. While the result is not appropriate for classic predictive analysis, it can be used to answer questions like, "If I increase the number of cache buffer chain latches, how will that impact SQL statement X?"
IMPORTANT: Please remember even the best forecast model, is still just a model. It's not real. Even more importantly, while a non-statistically validated model can show general relationships such work and time, it will can result in very misleading forecasts. Please be careful.
What Can I Do With This Information?
That's a really good question. I have just touched the surface on what can be done. In this article, I introduced service time, queue time, response time and used them to create a response time chart. And this was done using script, available in the Analysis Pack, that you can run yourself.
But more important than this, is that you understand there is a relationship between work and time. And, that queuing theory is a way to mathematically relate the two together. And finally, you can take this much further to develop robust forecast models to do some predictive analysis.
What type of predictive analysis? We can answer questions such as:
- When will my server "hit the wall"?
- How will changing X affect the elapsed time of SQL statement Y?
- What will upgrading our CPU subsystem do?
- How will increasing parallelism impact the system?
- How will increasing the buffer cache impact performance?
Questions like these and many others can be responsibly answered using the concepts I've introduced and then taking them to the next level. If you want to do that, then check out the above resources. You can also email with any questions.
Thanks for reading and enjoy your work!
If you have any questions or comments, feel free to email me directly at craig at orapub.com.
|How To Create Histograms Using The Free Statistics Package R||Does Table Column Order Affect SELECT Performance?||How To Solve A Similar SQL Statement Parsing Issue|