Creating And Understanding A Response Time Chart Based On Oracle Database Time Data

Posted on 11-March-2015 by Craig Shallahamer, craig@orapub.com

One of the cool things about my last article was that I created a chart displaying Oracle Database time versus the database workload. I went a little further breaking the Database time into a few categories, such as the OS CPU run queue.

In this article, I'm going to take it to the next level. I want to use the sample experimental data (nearly 500 samples) to create a different type of chart. And, this chart is more standard and usually more useful.

It's called the response time chart. Creating and understanding a response time chart built from real, Oracle database time related performance data is what this post is all about.

Response Time Chart Advantages

The response time chart has some advantages over the total time chart that I created in my last article. For example:

Get The Experimental Analysis Pack

I'm going to use the exact same raw data that I used in my last article using. You can download the Analysis Pack HERE which contains everything I have related to the experiment including the data collection script, spreadsheets and the raw data. The analysis pack also included the updated spreadsheet that I used only in this article.

Response Time Chart Axis

A response time chart visually shows us the relationship between work and time. So, it make sense that the horizontal axis is the workload intensity (called the arrival rate) and the vertical axis is how long it takes to process a single piece of work (called the response time). The response time is traditionally, and for a good reason, divided into two categories.

The key thing about a response time chart is to always remember, it is showing us the time to process a SINGLE piece of work. In our case, I have defined the piece of work to be an Oracle Database buffer get, that is a logical IO or simply, LIO. I chose an LIO to be the unit of work because this system's workload can be characterized by buffer get activity. For example, when performance slows, LIO intensity increases.

A traditional response time curve has two key categories of time called service time and queue time. If you followed my previous article, you'll recall that service time is the time to process a single piece of work when no queuing is involved.

When teaching DBAs about queuing theory, I like to say that service time like hamburger cook time. Let's say it takes three minutes for me to cook a single hamburger. It makes no difference how many people are queued up to get one of my burgers, it will always take me three minutes to cook a single hamburger. If it takes longer than three minutes for someone to get their hands on one of my tasty burgers, that's because they had to wait in the line that is, the queue. So, the time it takes them to get one of my tasty burgers is the cook time plus their queue time.

Their time to get a burger, that is the cook time plus their queue time, is known as the response time. So, response time is service time (think: cook time) plus queue time. It's actually really simple:

Rt = St + Qt

It's important to remember that the service time (think: cook time) does NOT increase as the workload increases. While the TOTAL cook time for all the burgers increases, the cook time for each burger does NOT increase. If you look at any response time chart, you will always notice the service time is pretty much flat. If the chart was created using math, the service time will be perfectly flat but if the service time is based on actual observations, obviously there will be some deviation.

Oracle Systems And The Response Time Chart

The CPU time to process a single piece of work (think: buffer get) can be used as the service time. As you will quickly see, this line is very flat... just like the service time is supposed to be. The queue time can be database time minus the CPU consumption to process a single piece of work.

By the way, the workload intensity in a system that is purring along can loosely be called the arrival rate. The arrival rate is always shown using the Greek letter Lambda. This is why you will see me using the letter L in this article for the arrival rate.

To demonstrate this, I'm going to use the same sample that I used in the previous article. The sample is load_no 13 and sample_no 18. This data is in row 331 and I've highlighted it into orange on the spreadsheet. Here's is some of the data.

Here are the raw numbers taken from the orange line in the above spreadsheet:

Snapshot Interval (sec) = Column C, Row 331 = 180.025 sec
Total CPU Consumption   = Total FG CPU consumption (DB CPU) + total BG process CPU consumption (sec) 
                        = Column F + G, Row 331
                        = 910.702 sec + 2.368 sec
                        = 913.070 sec 
DB Time (sec)           = Column H, Row 331 = 6773.7844 sec
Total Buffer Gets (lio) = Total Work = Column D, Row 331 = 82140393 lio

Now I'll transform the raw values into a single point that can be plotted on a response time chart. Below are details:

L (lio/ms)    = Arrival Rate = Total Buffer Gets / Snapshot Interval = Column M, Row 331
              = 82140393 lio / 180.025 sec
              = 4562721 lio/sec
              = 456.2721 lio/ms

St (ms/lio)   = Total CPU Consumption / Total Work = Column N, Row 331
              = 913.070 sec / 82140393 lio
              = 0.011116 sec/lio
              = 0.0111 ms/lio

Rt (ms/lio)   = DB Time / Total Work = Column P, Row 331
              = 6773.7844 sec / 82140393 lio
              = 82.466 sec/lio
              = 0.0825 ms/lio

Qt (ms/lio)   = Rt - St = Column O, Row 331
              = 0.0825 ms/lio - 0.0111 ms/lio
              = 0.0714 ms/lio

Above is the response time math for a single point that is ready to be plotted. Below is the results of doing the math and plotting the points of our nearly 500 experimental data points. Can you spot our single point?

Response Time Chart Observations

Clearly, this Oracle system is operating in a response time like way. Here is some evidence of this:

We can also clearly see that this chart is based on a real system and not simply equations. Here is some evident of this:

Response Time Chart Warnings & Forecasting

Some of you reading this will want to jump right to using this chart for forecasting purposes. Please be careful. While this chart was created with plenty of sample data (nearly 500 samples), no statistical analysis occurred. This means we do not know the precision, that is, the plus and minus of any resulting forecast. So while it may look useful, it may not be useful and even result in very misleading forecasts.

If you are interested in creating a robust forecast model and using it for solid predictive analysis, here are three resources.

IMPORTANT: Please remember even the best forecast model, is still just a model. It's not real. Even more importantly, while a non-statistically validated model can show general relationships such work and time, it will can result in very misleading forecasts. Please be careful.

What Can I Do With This Information?

That's a really good question. I have just touched the surface on what can be done. In this article, I introduced service time, queue time, response time and used them to create a response time chart. And this was done using script, available in the Analysis Pack, that you can run yourself.

But more important than this, is that you understand there is a relationship between work and time. And, that queuing theory is a way to mathematically relate the two together. And finally, you can take this much further to develop robust forecast models to do some predictive analysis.

What type of predictive analysis? We can answer questions such as:

Questions like these and many others can be responsibly answered using the concepts I've introduced and then taking them to the next level. If you want to do that, then check out the above resources. You can also email with any questions.

Thanks for reading and enjoy your work!

Craig.

If you have any questions or comments, feel free to email me directly at craig at orapub.com.

CPU Core Vs Thread...CPU Utilization - Part 1 Parsing Performance: Going Beyond Cursor Sharing Using Bind Variables Top 7 Reasons Why Oracle Database Conferences Rock!