Operating System Focus: Log File Switch Checkpoint Incomplete

Posted on 31-May-2016 by Craig Shallahamer, craig@orapub.com

This post is about how to find the operating system bottleneck, if there is one.

The situation: outraged users, with the top Oracle wait event being log file switch checkpoint incomplete.

In my previous two posts, here and here, we baselined the system, completed an Oracle time-based analysis, implemented one of the solutions and measured the improvement... which was significant!

But our analysis is far from complete.

In this article I will focus on the operating system. I will demonstrate how to find the OS bottleneck in the CPU, IO, or memory subsystems and, if necessary, the network subsystem. The result could be an OS-focused solution.

How I analyzed the operating system and how you can do the same on your system is the focus of this article.

Why Look At the Operating System?

Have you ever watched a child ride a tricycle with one of the wheels not working properly? Remember seeing them frustrated, angry and working incredibly hard just to go a short distance? Sounds like many Oracle systems, eh?

So far in our analysis, we have focused on one wheel! It's been the Oracle "wheel." We did this by doing an Oracle time-based analysis. The other two "wheels" are the operating system and the application (sql, processes, users, etc.).

If we want an optimally working system, or an optimally working tricycle, we need to ensure all three "wheels" are working well. This is why I am going to look at the operating system.

When I look at the OS, I'm trying to find a bottleneck or the absence of one. Think of a bottleneck as performance friction or back pressure. We need to identify if there is performance friction and reduce it. The result will be an increase in the flow, that is the throughput, of Oracle work.

For Oracle systems, I look at the CPU, IO, memory and if necessary the network subsystems. So... here we go!

CPU Subsystem Busyness

Discovering if the CPU subsystem is struggling is very simple. We only have to look at two statistics: average utilization and run queue. And really, the average utilization is probably good enough if you are just getting started.

CPUs service transactions. This is why capacity planning people refer to the CPUs as "servers." When a CPU is NOT servicing a transaction, it is NOT busy. When a CPU core is servicing a transaction, it is busy. It's a binary kind of thing.

So, we need to understand the "busyness" of a CPU over an interval of time. Fundamentally, we divide the busy time (say, 30s) by the interval time (60s). The result is the average busyness over the interval (50%). We call this the average CPU utilization.

Another approach is to take frequent CPU samples, categorizing each as active or not active. Then we take all the active samples (250) and divide them by the sum of the active (250) and non-active samples (50). The result is the average busyness (83% = 0.833 = 250/(250+50)), that is, the average utilization.
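
The two calculations above can be sketched in a few lines of shell arithmetic. The numbers are the example values from the text; this is just a minimal illustration, using integer percentages.

```shell
# Average CPU utilization, two ways (example numbers from the text).

# 1) Busy time divided by interval time.
busy=30; interval=60
util_time=$(( 100 * busy / interval ))               # percent busy over the interval

# 2) Active samples divided by total (active + non-active) samples.
active=250; inactive=50
util_samples=$(( 100 * active / (active + inactive) ))

echo "time-based: ${util_time}%  sample-based: ${util_samples}%"
```

Both approaches converge on the same idea: busyness is busy time (or busy samples) over total time (or total samples).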

CPU Subsystem Bottleneck Analysis

The simplest and most widely available non-Windows CPU utilization tool is vmstat. Let's take a look at the CPU subsystem before I added the additional bigger online redo logs.

I'm going to have vmstat take 9999 (effectively forever) 10-second samples. Below is the output.

$ vmstat 10 9999

procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
...
 7  0  34584 364236 444876 22362612    0    0     0  5282 5865 6050 42 13 44  1  0	
 1  0  34584 364048 444876 22362624    0    0     0  4564 5832 5972 42 12 44  1  0	
 1  0  34584 364236 444876 22362640    0    0     0  5476 5821 6123 41 12 46  1  0	
 2  0  34584 351596 444876 22362656    0    0     0  4842 5925 6044 43 13 43  1  0	
 3  0  34584 351288 444876 22362668    0    0     0  5445 6113 6089 45 14 40  1  0	
 3  0  34584 343468 444876 22362692    0    0     0  5018 5538 6012 38 11 49  2  0	
 2  0  34584 349252 444876 22362716    0    0     1  4806 5712 6108 40 12 47  1  0	
 2  0  34584 349136 444876 22362736    0    0     1  4933 5987 6099 43 13 43  1  0	
 2  0  34584 363520 444876 22362752    0    0     0  4529 5694 6002 40 12 46  1  0	
 2  0  34584 356048 444876 22362776    0    0     1  4462 5656 6039 39 12 48  1  0	
 2  0  34584 356268 444876 22362788    0    0     0  4145 5889 6028 43 13 44  1  0	
 1  0  34584 356144 444876 22362804    0    0     0  4815 5801 6047 41 12 46  1  0	
 7  0  34584 363476 444876 22362816    0    0     0  4727 5779 6020 41 12 46  1  0	
 8  1  34584 366764 444876 22362832    0    0     0  4449 5562 5958 39 11 49  1  0	
 5  0  34584 367216 444876 22362848    0    0     0  5276 5901 6082 42 13 43  2  0	
 7  0  34584 360440 444876 22362860    0    0     0  4762 5826 6034 42 12 45  1  0	
 2  0  34584 360680 444876 22362876    0    0     0  4810 5835 6110 41 12 45  1  0	
 3  0  34584 360284 444876 22362896    0    0     0  4703 5841 6042 42 13 45  1  0	
 3  1  34584 360756 444876 22362908    0    0     0  5310 5690 6064 40 12 47  1  0
...

A CPU is either busy or idle. Add the busy and idle percentages and the result will be 100, or 100%. Looking at the above vmstat output, notice the CPU "us" and "sy" columns. Those are the user time and the system time, respectively. User time is time spent in programs like sqlplus, while system time is spent on operations like virtual memory management, process scheduling, etc. Sum the user and system time together and the result is the busy time. So the utilization, or busyness, is user time + system time.

For example, look at the first line. The user time is 42 and the system time is 13. This means the average CPU utilization over the 10 second interval is 55 or 55%. Now look at the last line. The user time is 40 and the system time is 12. This means the average CPU utilization over the 10 second interval is 52 or 52%.
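
If you want to script this, a small awk one-liner does the user + system sum. The column positions (us is field 13, sy is field 14) assume the vmstat layout shown above; verify them against your own vmstat version before relying on this.

```shell
# Compute busy % (us + sy) from a vmstat data line.
# Field positions assume the vmstat layout shown above: us=$13, sy=$14.
line=" 7  0  34584 364236 444876 22362612    0    0     0  5282 5865 6050 42 13 44  1  0"
busy=$(echo "$line" | awk '{ print $13 + $14 }')
echo "${busy}% busy"
```

On a live system you would pipe vmstat itself, e.g. `vmstat 10 9999 | awk '{ print $13 + $14 }'`, skipping the header lines.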

The only strange column is the "st" column. That is the time stolen from the CPU subsystem and given to another virtual machine. Since my server is a standalone box, the "st" column should always be zero.

Generally speaking, if your average CPU utilization is less than 80% you don't have a CPU bottleneck. However, if you run a one hour AWR report and calculate the average CPU utilization, remember that while the average utilization was 80%, there will be times when the CPU subsystem is greater than 80% busy... perhaps much greater!

In our situation, the average CPU utilization is around 55%, so we are good. That is, there is not a CPU bottleneck. That probably does not come as a surprise, but it is always a good idea to check.

Next, we'll take a look at the memory subsystem.

Memory Subsystem Bottleneck Analysis

Determining if there is a memory bottleneck is super simple... on Linux... not so much on AIX. On Linux, look for memory pages that have been swapped out to disk. If you repeatedly see some, the OS virtual memory manager is struggling to keep popular pages in memory. This is very similar to Oracle trying to keep the popular Oracle blocks in the buffer cache.

Find the vmstat "so" column in the output from the previous section. Notice every single "so" value is zero. This means there is not a memory issue. It's that simple: zero is good and non-zero is bad.
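
The zero-versus-non-zero check is easy to automate. Here is a minimal sketch against one sample line; the "so" column position (field 8) assumes the vmstat layout shown earlier.

```shell
# Flag a vmstat sample with a non-zero "so" (pages swapped out, field 8).
# On a live system you would feed vmstat output in; this checks one line.
line=" 2  0  34584 356048 444876 22362776    0    0     1  4462 5656 6039 39 12 48  1  0"
so=$(echo "$line" | awk '{ print $8 }')
if [ "$so" -eq 0 ]; then
  verdict="memory OK"
else
  verdict="possible memory pressure"
fi
echo "$verdict"
```

Repeatedly non-zero "so" samples, not a single blip, are what indicate the virtual memory manager is struggling.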

Next, we'll take a look at the IO subsystem. Any ideas what we'll find?

IO Subsystem Bottleneck Analysis

Analyzing the IO subsystem on a modern, non-trivial production Oracle system is not easy. In fact, it can be impossible for the Oracle DBA. The problem is lots of layers of abstraction and software and, the real gotcha, the sharing of physical disks by multiple systems.

To keep this brief, I'll demonstrate two ways to tackle this beast. The first is using the classic OS command "iostat" and the second will be using Oracle wait time. Each has significant benefits and drawbacks. I'm going to start with iostat.

IO Subsystem Bottleneck Analysis: iostat

If you have OS prompt access on a non-Windows server, you probably have access to iostat. Iostat is a very flexible and robust tool that every Oracle DBA needs to become familiar with. The drawback with using iostat on Oracle systems is understanding where the Oracle database files reside from an iostat perspective. Sometimes as a DBA you will never know the answer to this. Not surprisingly, and extremely frustrating (not to mention dangerous), many IO administrators don't know either!

There are many command line iostat options. And the options vary based on your flavor of Unix/Linux. Let's take a look at the IO subsystem before I added the additional bigger online redo logs.

$ iostat -xt 10 999
...
05/02/2016 10:22:56 AM
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          40.58    0.00   12.63    1.49    0.00   45.30

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sdb               0.00     0.20    0.00    1.60     0.00    14.40     9.00     0.02   10.06   7.00   1.12
sda               0.00     5.70    0.20  145.20     3.20  9693.60    66.69     1.03    7.04   5.60  81.49
dm-0              0.00     0.00    0.00    0.30     0.00     2.40     8.00     0.01   15.33  25.67   0.77
dm-1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdc               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
dm-2              0.00     0.00    0.30  150.40     4.80  9691.20    64.34     1.06    7.04   5.41  81.46

05/02/2016 10:23:06 AM
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          42.72    0.00   13.14    1.25    0.00   42.89

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sdb               0.00     0.10    0.00    1.40     0.00    12.00     8.57     0.01    5.57   5.57   0.78
sda               0.00     5.00    1.60  131.50   139.20  9487.20    72.32     0.96    7.00   5.66  75.30
dm-0              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.25
dm-1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdc               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
dm-2              0.00     0.00    1.50  136.40   137.60  9493.60    69.84     0.98    6.90   5.46  75.31

05/02/2016 10:23:16 AM
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          41.71    0.00   12.84    0.87    0.00   44.58

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sdb               0.00     0.20    0.00    1.60     0.00    14.40     9.00     0.02    9.50   9.50   1.52
sda               0.00     5.90    0.00  142.40     0.00 10852.80    76.21     0.92    6.74   5.31  75.61
dm-0              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
dm-1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdc               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
dm-2              0.00     0.00    0.00  147.90     0.00 10846.40    73.34     0.97    6.81   5.11  75.62

05/02/2016 10:23:26 AM
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          45.42    0.00   13.75    0.92    0.00   39.91

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sdb               0.00     0.10    0.00    1.70     0.00    14.40     8.47     0.01    7.18   7.18   1.22
sda               0.00     5.40    0.10  148.20     3.20  9484.00    63.97     0.88    5.94   5.03  74.53
dm-0              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
dm-1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdc               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
dm-2              0.00     0.00    0.10  153.40     3.20  9484.00    61.81     0.91    5.92   4.85  74.49

05/02/2016 10:23:36 AM
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          39.03    0.00   11.57    1.20    0.00   48.19

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sdb               0.00     0.20    0.00    1.40     0.00    12.80     9.14     0.01    5.93   5.93   0.83
sda               0.00     4.80    0.00  126.10     0.00  8895.20    70.54     1.09    8.69   6.17  77.79
dm-0              0.00     0.00    0.00    0.10     0.00     0.80     8.00     0.01  133.00 133.00   1.33
dm-1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdc               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
dm-2              0.00     0.00    0.00  130.60     0.00  8894.40    68.10     1.10    8.46   5.96  77.78
...

There is a lot to look at. For our purposes today, just look at the far left and right columns plus the r/s and w/s columns.

The far left column is the device column. Usually you can change the perspective of the device column via iostat command line options. The challenge for us is defining just exactly what a "device" is. On my system it's simple and I know the IO subsystem: three SCSI devices (sda, sdb, sdc). Plus, all my Oracle database files for the prod30 instance reside on sda. But your system is not going to be so simple.

If you need help understanding this, go talk with your IO administrator. They should be able to help. Unfortunately, they may not. And in my view, that is a big problem in IT today... don't get me started.

The far right column is the average device utilization. Modern-day IO subsystems can run above 90% utilization without a problem. So, that column is not so useful unless it's pegged above 90%. On my system, the sda device is running around 75% utilized. That's fine and not a cause for alarm.

Now look at the "await" and "svctm" columns. These are IO request wait and service times in milliseconds. However, what they actually mean requires a manual page check. According to the iostat manual page, "await" includes both the wait time and the service time. From a pure queuing theory perspective, this is actually called response time: the time to process a single piece of work, which in this case is an IO request. So, the column naming is screwed up. But that's the way it is reported.

So, looking at the above iostat output, the sda device's average response time (i.e., the "await" column) is anywhere from around 7ms to 15ms. We want to see times less than 10ms, and really we want to see them less than 6ms. But this is highly dependent on your IO subsystem. On my SCSI IO subsystem, samples over 10ms are cause for concern.
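
A quick scripted check of "await" against a threshold looks like this. The await column position (field 10) matches the iostat -x layout shown above, but the columns vary between iostat versions, so verify against your own output first.

```shell
# Pull "await" (avg IO response time, ms; field 10 in the layout above)
# from an iostat device line and compare it against a 10ms threshold.
line="sda               0.00     5.70    0.20  145.20     3.20  9693.60    66.69     1.03    7.04   5.60  81.49"
await=$(echo "$line" | awk '{ print $10 }')
verdict=$(echo "$await" | awk '{ if ($1 > 10) print "concern"; else print "acceptable" }')
echo "await=${await}ms -> $verdict"
```

Remember this is an average over the sample interval; individual IO requests will sometimes be much slower than the reported figure.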

Looking at the output above, I would be concerned that the IO subsystem is struggling to handle the workload it is being given. Why? Because it's a very simple SCSI IO device running at nearly 80% utilization and, with just a few 10-second samples, I recorded that sometimes it takes over 15ms to service a request. Have I seen worse IO subsystems? For sure, but this one is teetering on the edge.

So, do I go running to the IO administrator about their less-than-quick-responding IO subsystem? Please NO!

Do Not Run To The IO Administrator

In the above section I stated the IO subsystem is teetering on the edge of extremely poor response time. Yet I also wrote that this is not the time to approach the IO administrator. While this may seem like a conflict, here is why it is not:

Oracle is hammering the IO subsystem with writes. If you look at the read requests per second (r/s) versus the write requests per second (w/s), we can see the load is nearly all IO write activity. And we know where that's coming from: the database writer and the log writer.
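
You can put a number on "nearly all write activity" with the r/s and w/s columns (fields 4 and 5 in the iostat layout shown earlier). A minimal sketch against one device line:

```shell
# Estimate what fraction of IO requests are writes from an iostat line
# (r/s is field 4, w/s is field 5 in the layout shown above).
line="sda               0.00     5.70    0.20  145.20     3.20  9693.60    66.69     1.03    7.04   5.60  81.49"
pct=$(echo "$line" | awk '{ printf "%.1f", 100 * $5 / ($4 + $5) }')
echo "${pct}% of IO requests are writes"
```

With 0.20 reads/s against 145.20 writes/s, the workload is essentially all writes, which points straight at the database writer and log writer.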

Do you really want to go to the IO admin when it's your database that is beating the crud out of "their" IO subsystem? Perhaps we will get to that point, but we need a fuller understanding of the situation before we do.

The SCSI device is probably operating to specification. So, to say there is something wrong with the IO subsystem is technically incorrect. "Your" Oracle system is just hammering the device. So, whose fault is it?

We still have not looked at the application SQL, and there was clearly a redo log configuration issue. So, perhaps once we address both Oracle and the application, there will be no need to approach the IO team. In other words, let's finish our analysis so we have fuller knowledge before we approach the IO team.

IO Subsystem Bottleneck Analysis: Wait Interface

As crazy as it sounds, you can use Oracle's wait interface to measure IO response time. I have blogged about this many times, so I don't want to do a rewrite here. Just search my blog for "event histogram" and also for "io read time".

For reference, below is the Oracle time report taken directly from my previous post.

SQL> @ttpctx
Remember: This report must be run twice so both the initial and
final values are available. If no output, press ENTER about 11 times.

DB/Inst: prod30/prod30                                                   12-May 11:05am
Report:   ttpctx.sql                OSM by OraPub, Inc.                  Page         1
                          Total Time Activity (359 sec interval)

                                                          Avg Time        Time     Wait
Time Component                            % TT    % WT Waited (ms)       (sec) Count(k)
-------------------------------------- ------- ------- ----------- ----------- --------
CPU consumption: Oracle SP + BG procs    45.82    0.00       0.000    1164.066        0
log file switch (checkpoint incomplete   30.75   56.75     634.083     781.190        1
db file async I/O submit                  7.09   13.08     323.831     180.050        1
db file scattered read                    4.88    9.00       0.452     123.920      274
control file parallel write               2.73    5.04      20.151      69.400        3
log file switch completion                2.13    3.93     163.505      54.120        0
log file parallel write                   1.95    3.59      29.032      49.470        2
enq: CF - contention                      1.39    2.56     108.889      35.280        0
enq: CI - contention                      1.07    1.97     460.000      27.140        0
db file single write                      0.87    1.61      12.449      22.110        2
asynch descriptor resize                  0.35    0.65       0.001       8.980    #####
log file single write                     0.33    0.61      18.986       8.430        0
PX qref latch                             0.10    0.18       0.001       2.500     2468
db file sequential read                   0.06    0.11       0.007       1.480      221
latch: cache buffers chains               0.06    0.10       2.132       1.420        1
PX Deq: Slave Session Stats               0.05    0.09       0.177       1.270        7
buffer busy waits                         0.05    0.09       3.944       1.270        0
log file sync                             0.04    0.07     125.000       1.000        0
...

If you want to explore this on your own, here are some clues: the average db file sequential read wait time is 0.007ms (very good), the average log file sync wait time is 125ms (wow... really bad) and the average db file single write wait time is 12.4ms (not good).

Given the Oracle redo configuration issue and that we have not analyzed the application yet, I would hold off on approaching the IO team. What about the network subsystem?

Network Subsystem Bottleneck Analysis

Honestly, I don't want to extend this post any further by delving into the network subsystem. Plus, I don't blog much about networking because it is very difficult to analyze without an understanding of your network architecture. While commands like tnsping, ping and netstat are very useful, where to apply them, how to properly use them and how to interpret the results are outside the scope of this post.

But I will tell you this. In this particular situation, the network is not a problem. There is plenty of excess network capacity.

Bottleneck Analysis: The Result

It's time to bring our OS analysis to a close, so we can move on to the third and final circle: the application.

Based on the above OS analysis, we know that the CPU subsystem has plenty of capacity, the memory and networking situation is fine and the IO subsystem is teetering on disaster.

So, at this point, don't do anything...except use what we have learned from our Oracle and our OS analysis to lead us to what to look for within the application. How to do this and how to find application focused solutions will be the focus of my next article.

Until then, enjoy your work!

Craig.

If you have any questions or comments, feel free to email me directly at craig at orapub.com.
