Oracle Focus: Log File Switch Checkpoint Incomplete

Posted on 10-May-2016 by Craig Shallahamer, craig@orapub.com

Users are livid and the top Oracle wait event is "log file switch checkpoint incomplete." That's how we found our users in my previous post. So, we started the performance analysis by gathering some baseline performance statistics.

In this article we will do an Oracle time-based analysis to highlight the performance improving opportunities within Oracle itself. As you will see, even without looking at the operating system or the application, we will make a big impact. How did I do it, and just how massive was the impact? That's the focus of this post!

Oracle: Time-Based Analysis

An Oracle time-based analysis identifies performance improving opportunities by highlighting categories containing lots of Oracle time. Because both our analysis and the user experience are based on time, we can create a quantitative link between our analysis and the user experience. It's not going to be perfect, but there will be a relationship.

This will be extremely brief. For more details, search both my website and blog for "time based" or "otba". If you are an OraPub member, be sure to watch my video seminar, Tuning Oracle Using An AWR Report.

Essentially, I'm logically categorizing all the Oracle time (CPU and non-idle wait time) and then focusing on those categories containing lots of time. If my objective is to reduce time, and especially the time to process a piece of work like a buffer get, then those "big time" categories will likely present me with significant performance improving opportunities. It is that simple.
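
To make the idea concrete, here is a rough sketch of where this kind of raw data lives. These queries show cumulative values since instance startup, whereas an interval-based report like ttpctx.sql differences two snapshots, so treat them as illustration only.

-- CPU time (foreground and background), in seconds, since instance startup.
select stat_name, round(value/1000000) seconds
from   v$sys_time_model
where  stat_name in ('DB CPU', 'background cpu time');

-- Non-idle wait time by event, in seconds, since instance startup.
select event, round(time_waited_micro/1000000) seconds, total_waits
from   v$system_event
where  wait_class <> 'Idle'
order  by time_waited_micro desc;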

Using the same Oracle system from my previous post and again using my OSM tool ttpctx.sql, I took roughly a six-minute (359 second) sample of the Oracle system. Below is the result.

SQL> @ttpctx
Remember: This report must be run twice so both the initial and
final values are available. If no output, press ENTER about 11 times.

DB/Inst: prod30/prod30                                                   12-May 11:05am
Report:   ttpctx.sql                OSM by OraPub, Inc.                  Page         1
                          Total Time Activity (359 sec interval)

                                                          Avg Time        Time     Wait
Time Component                            % TT    % WT Waited (ms)       (sec) Count(k)
-------------------------------------- ------- ------- ----------- ----------- --------
CPU consumption: Oracle SP + BG procs    45.82    0.00       0.000    1164.066        0
log file switch (checkpoint incomplete   30.75   56.75     634.083     781.190        1
db file async I/O submit                  7.09   13.08     323.831     180.050        1
db file scattered read                    4.88    9.00       0.452     123.920      274
control file parallel write               2.73    5.04      20.151      69.400        3
log file switch completion                2.13    3.93     163.505      54.120        0
log file parallel write                   1.95    3.59      29.032      49.470        2
enq: CF - contention                      1.39    2.56     108.889      35.280        0
enq: CI - contention                      1.07    1.97     460.000      27.140        0
db file single write                      0.87    1.61      12.449      22.110        2
asynch descriptor resize                  0.35    0.65       0.001       8.980    #####
log file single write                     0.33    0.61      18.986       8.430        0
PX qref latch                             0.10    0.18       0.001       2.500     2468
db file sequential read                   0.06    0.11       0.007       1.480      221
latch: cache buffers chains               0.06    0.10       2.132       1.420        1
PX Deq: Slave Session Stats               0.05    0.09       0.177       1.270        7
buffer busy waits                         0.05    0.09       3.944       1.270        0
log file sync                             0.04    0.07     125.000       1.000        0
...

Including both the CPU and around 80% of the non-idle wait time, I will categorize the time as follows.

CPU   1164 sec
Wait  1154 sec
  IO       1154 sec
    Read        124 sec
      db file scattered read           124 sec
    Write      1030 sec
      log file switch checkpoint incl  781 sec
      db file async I/O submit         180 sec
      control file parallel write       69 sec
  Other       0 sec

Clearly, we have an opportunity to reduce time by focusing on both CPU consumption and non-idle wait time. Regarding the non-idle wait time, there is a clear opportunity to focus on IO write time and specifically time related to the log file switch checkpoint incomplete wait time.

Next, I will develop some very "Oracle focused" solutions to directly go after these big chunks of time!

Oracle Focused Solutions

If you have followed my work, you will know I always analyze performance and develop solutions from three different perspectives: Oracle, the application (SQL, processes, modules, etc.) and the operating system. In this article, I will focus only on the Oracle focused solutions. In my next post, I will cover the operating system analysis and solutions. And finally, I'll cover the application analysis and solutions and bring the entire analysis to a close.

Clearly, our two big opportunities are to reduce CPU consumption and the log file switch time. Remember, I am purposely excluding both operating system and application focused solutions.

Solution: Reduce CPU Consumption

Off the top of my head, I can't think of any Oracle adjustments I can make to reduce CPU consumption. I'm also not very concerned because I have a hunch the real problem is something else...

Solution: Increase Time Between Log Switches

In my previous post our baseline recorded the online redo logs switching an average of 36.33 times a minute, which works out to one switch every 1.65 seconds! I don't think I'll get much pushback when I say, THAT IS CRAZY!
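
If you want to measure the switch rate on your own system without my OSM scripts, a simple sketch against v$log_history gets you close. The one-day window below is arbitrary.

-- Log switches per hour over the last 24 hours.
select to_char(first_time, 'YYYY-MM-DD HH24') hour,
       count(*) log_switches
from   v$log_history
where  first_time > sysdate - 1
group  by to_char(first_time, 'YYYY-MM-DD HH24')
order  by 1;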

Every log switch forces a checkpoint and every checkpoint forces the DBWRs to write every dirty buffer to disk. Plus there is a fixed amount of overhead for each checkpoint. For efficiency reasons (think: batching), Oracle wants to write when it wants to, so a lot of forcing can cause unnecessary overhead and wait time.

Personally, I like redo logs to switch no more often than once every 15 minutes, and 30 minutes is even better. However, because employment is important to me, recovery is more important than performance... so be sure you can recover quickly enough if you increase the time between log switches.

The simplest Oracle focused way to increase the time between log switches is to increase the size of each redo log. The question now becomes: how big should the new redo logs be?

How Big Should The Redo Logs Be?

By increasing the redo log size, the log switch frequency will decrease.

If we know the current switch rate and the current redo log size, one low-precision calculation method requires only a simple algebraic equation. For this example, let's say our goal is one switch every 3 minutes, not once every 30 minutes as I would do with a real production system.

First, I need to know the current switch rate. Our baseline recorded an average switch rate of 36.33 switches per minute. Second, I need to know the size of the current online redo logs. Here are their sizes:

$ ls -l /*/oradata/prod30/*log
-rw-r----- 1 oracle oinstall  4194816 May  2 11:40 /home/oradata/prod30/redoA1.log
-rw-r----- 1 oracle oinstall  4194816 May  2 11:40 /home/oradata/prod30/redoA2.log

Wow! Someone made what I call a "career decision." This Oracle system has two 4MB redo logs. No wonder they are switching so fast! Let's calculate the new redo log size.
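
Before doing the arithmetic, note that you can also read the configured size straight out of Oracle rather than from the OS file sizes, which are slightly larger than the configured size because of the file header. A quick sketch:

-- Configured redo log size and member count for each group.
select group#, bytes/1024/1024 mb, members
from   v$log
order  by group#;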

Here's how I would set up the equation.

180 secs between switches / ? MB redo log = 1.65 sec between switches / 4 MB redo log 

180/x = 1.65/4
x     = 180*(4/1.65)
x     = 436.36 MB

So, for each redo log switch to occur around every 180 seconds, each redo log needs to be around 437 MB. I'm not quite done yet, because having only two redo logs requires 50% of all my redo log space to be checkpointed before the LGWR can switch into the next redo log.
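
If you would rather have the database do this arithmetic, here is a hedged sketch. The 1.65 and 180 second figures are the measured and target switch intervals from this example; Oracle knows nothing about them.

-- Scale the current redo log size by (target interval / observed interval).
select round(max(bytes)/1024/1024 * (180/1.65)) suggested_redo_mb
from   v$log;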

How Many Redo Logs Should There Be?

Now I'm going to focus on the number of online redo logs. Adding more online redo logs decreases the likelihood of receiving a checkpoint incomplete wait.

Basically, I am trying to reduce the amount of checkpoint activity Oracle must complete that is associated with the log the LGWR will attempt to switch into. If the checkpoint associated with the next redo log is not finished when the LGWR attempts to switch into it, the LGWR will have to wait... for the checkpoint to complete.

To help make this clear, here's an example. Suppose the total redo log space is 200MB. With two 100MB redo logs, 50% of all my redo log space (100MB) must be checkpointed before the LGWR can switch into the next redo log. However, with four 50MB redo logs, only 25% of all my redo log space (50MB) must be checkpointed before the LGWR can switch into the next redo log.

So, by having four redo logs instead of two, I am requiring Oracle to complete a checkpoint associated with only 50MB instead of 100MB. 50MB will surely be checkpointed faster than 100MB, thereby reducing the chances of the LGWR having to wait for the checkpoint to complete.
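
One more quick check worth doing: look at the redo log group statuses. A group marked ACTIVE is still needed for instance recovery, typically because its checkpoint has not yet completed, so if you routinely see ACTIVE groups sitting right behind the CURRENT group, you are living on the edge of this wait event. A minimal sketch:

-- Redo log groups and their status (CURRENT, ACTIVE, INACTIVE, UNUSED).
select group#, status
from   v$log
order  by group#;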

By combining both the size and the count strategies above when configuring my redo logs, I should be able to solve just about any log file switch checkpoint incomplete wait situation. This is especially true if I am also able to alter the operating system and application.

Making The Redo Log Changes

To keep this simple, I'm going to create three 440MB redo logs, for a total of 1320MB. The 440MB size will reduce the log switch rate and the third redo log will directly address my goal of reducing the likelihood of a checkpoint incomplete. Here's how I did this:

SQL> ALTER DATABASE ADD LOGFILE '/home/oradata/prod30/redo41.log' size 440m reuse;

Database altered.

SQL> ALTER DATABASE ADD LOGFILE '/home/oradata/prod30/redo42.log' size 440m reuse;

Database altered.

SQL> ALTER DATABASE ADD LOGFILE '/home/oradata/prod30/redo43.log' size 440m reuse;

Database altered.

SQL> !ls -l /*/oradata/prod30/*log
-rw-r----- 1 oracle oinstall 461373952 May 12 11:22 /home/oradata/prod30/redo41.log
-rw-r----- 1 oracle oinstall 461373952 May 12 11:21 /home/oradata/prod30/redo42.log
-rw-r----- 1 oracle oinstall 461373952 May 12 11:22 /home/oradata/prod30/redo43.log
-rw-r----- 1 oracle oinstall   4194816 May 12 11:21 /home/oradata/prod30/redoA1.log
-rw-r----- 1 oracle oinstall   4194816 May 12 11:21 /home/oradata/prod30/redoA2.log

SQL> alter database drop logfile '/home/oradata/prod30/redoA2.log';

Database altered.

SQL> alter database drop logfile '/home/oradata/prod30/redoA1.log';

Database altered.

SQL> select group#, member from v$logfile;

    GROUP# MEMBER
---------- --------------------------------
         2 /home/oradata/prod30/redo41.log
         3 /home/oradata/prod30/redo42.log
         5 /home/oradata/prod30/redo43.log

3 rows selected.
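
One operational note, and a hedged one since it is not shown in the listing above: Oracle will not drop a redo log group that is CURRENT or ACTIVE, so on a busy system you may need to force a switch and a checkpoint before the drop will succeed. Also, dropping a log group does not remove its files from disk; the old redoA1.log and redoA2.log files must be deleted at the OS level.

-- If the group you want to drop is still CURRENT or ACTIVE:
ALTER SYSTEM SWITCH LOGFILE;
ALTER SYSTEM CHECKPOINT;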

All done? Not really. Aren't you just a little curious if this made a difference? I know I am!

Solution Results

Next, I once again gathered my baseline statistics and compared them with the original baseline.

Wow, what happened? Essentially, I opened the redo flood gates (a little) and the redo just started streaming through Oracle.

Notice the log switches per minute. That decreased from over 36 per minute to 0.35 per minute, which is about 2.9 minutes per log switch. If you recall from above, my objective was to size the redo logs for one switch around every three minutes. Looks like my simple sizing formula worked pretty well!

Notice the blocks changed per second jumped from around 13k/sec to 21k/sec. That's around a 63% increase and represents a fantastic throughput improvement. I'm very pleased with this result.

But what about the log switch wait time percentages? Well... the redo was so intense and my IO subsystem so feeble that my Oracle time-based script, ttpctx.sql, did not complete within one minute. That's because the script inserts some of its interim results into a table. I think I need to work on my script. Let's try another approach.

Another way to check whether there is a lot of checkpoint incomplete time is to look at the alert log file. Here is an excerpt.

...
Thu May 12 12:56:09 2016
Thread 1 cannot allocate new log, sequence 345677
Checkpoint not complete
  Current log# 1 seq# 345676 mem# 0: /home/oradata/prod30/redo41.log
Thu May 12 13:01:21 2016
Thread 1 advanced to log sequence 345677 (LGWR switch)
  Current log# 4 seq# 345677 mem# 0: /home/oradata/prod30/redo42.log
Thu May 12 13:02:13 2016
Thread 1 cannot allocate new log, sequence 345678
Checkpoint not complete
  Current log# 4 seq# 345677 mem# 0: /home/oradata/prod30/redo42.log
Thu May 12 13:05:30 2016
Thread 1 advanced to log sequence 345678 (LGWR switch)
  Current log# 5 seq# 345678 mem# 0: /home/oradata/prod30/redo43.log
...

We can see two things. First, there are still "Checkpoint not complete" messages occurring. Second, the log switches are occurring much less frequently: minutes apart rather than seconds apart. That's a good thing! I suspect that by adding additional redo logs of the same size, I could completely eliminate the checkpoint not complete wait time.
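
To put a number on the new switch interval, a rough sketch like this against v$log_history works. The one-hour window is arbitrary.

-- Minutes between each log switch over the last hour.
select sequence#, first_time,
       round((first_time - lag(first_time) over (order by sequence#)) * 24 * 60, 1) minutes_since_prior_switch
from   v$log_history
where  first_time > sysdate - 1/24
order  by sequence#;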

Wow, what a difference our changes made. Notice everything changed: the redo log switch rate, the workload rate and the timing profile (my timing script not returning within a minute counts as a change). So, is performance good enough? Should we stop?

Should We Stop?

A valid question is: with the massive increase in performance, should we just stop? To be honest, I would NOT have implemented the solution to add more and larger redo logs until after I completed my analysis. I like to know all my options before I begin implementing a solution.

Remember, all I have done is focused on Oracle. I have not looked at the operating system or the application to see what is generating all the redo. So, should we stop the analysis? The short answer is NO.

In my next post I will continue this analysis by focusing on the operating system. We will look closely at the CPU, memory, IO and if necessary the network subsystem.

Until then, enjoy your work!

Craig.



Craig Shallahamer is a long time Oracle DBA who specializes in predictive analytics, machine learning and Oracle performance tuning. Craig is a performance researcher and blogger, consultant, author of two books, an enthusiastic conference speaker, a passionate teacher and an Oracle ACE Director. More about Craig Shallahamer...


If you have any questions or comments, feel free to email me directly at craig at orapub.com.
