Watching An Oracle Buffer Over Time
Have you ever wanted to watch an Oracle buffer? I mean, watch it being brought into the cache, watch its touch count change, watch it being replaced with another buffer only later to be brought back into the cache? I have.
So, I created a little PL/SQL script to record the touch count of a semi-popular Oracle buffer caught in a live-or-die pursuit to remain in the buffer cache. (Which is where every Oracle block wants to be!)
What I discovered is quite entertaining, I think, and confirms many of the inter-workings of Oracle's Least Recently Used (LRU) touch-count algorithm.
If this is the kind of thing that interests you, then you're in for a real treat!
My Quest To Explore
The annotated version of the above image is near the bottom of this post.
Above is a time-line view of a single Oracle buffer as its touch count is changed, replaced and brought back into the buffer cache. Each point represents its touch count at the time it was sampled. The sampling rate was every 0.50 seconds.
I want to explore the movement because it helps us understand key elements of Oracle's LRU algorithm. Plus, with just a little more information you will see how the wait events free buffer waits, read by other session and write complete waits can manifest.
That's a lot of content to blog about... so I'm not going to do that. My focus today is so simply write about the observed touch count changes related to the observed Oracle block in the buffer cache. So, let's get started!
The Back Story Details
This section includes the relevant Oracle internal details summarized. If the next paragraph doesn't make sense, then please read the below sub-sections. Otherwise, skip this section.
The key thing to remember is each Oracle cached block is referred to as a buffer and each Oracle buffer has an associated buffer header and the buffer's touch count is physically stored in its buffer header. And the buffer headers are stored in the shared pool.
Key Buffer Cache Lists
There are three key Oracle buffer cache lists that as performance focused DBAs, we need to know about. They are the cache buffer chains (CBC), the least recently used (LRU) lists and the write/dirty lists. The focus of this post is the LRUs.
Here's a link to very cool buffer cache visualization tool I created.
The LRUs are primarily used by Oracle server/foreground processes in the hunt for a buffer to replace. Why the need for replacement? Probably either a block is about to read from disk (physical IO read) or a buffer needs to be cloned for read consistent purposes. Both of these operations requires a buffer to be replaced. But which buffer should be replaced? Read on!
Buffer Headers Not Buffers
The buffer cache lists are made up of buffer headers, not the actual buffers. They is where the buffer meta-data view v$bh gets it's name; Buffer Header.
For each cached buffer there exists one and only one buffer header. The buffer headers are physically stored in the shared pool, while the physical buffers (cached blocks) are stored in the buffer cache.
The buffer headers contain information about the buffers, such as where they are located on disk, where they are located in the buffer cache, the last time the block was read from disk, the previous and next LRU pointers, the previous and next cache buffer chain pointers and more.
While v$bh provides some meta-data, if you really want to see the previous and pointer details as well as the buffer's touch count (more later about that), you need to look at x$bh.
So, whenever you picture in your mind or draw a picture of one of the buffer cache lists, remember the list nodes are constructed from the buffer headers not the actual buffers (cached blocks).
Free, Dirty and Pinned
A buffer can be either free or dirty. And, it can also be pinned or not pinned. It's actually really simple.
Free Buffer Details
If a buffer is free, what is in memory is what is on disk. It's like the buffer and the block mirror each other. A free buffer can be replaced without loosing application data. Why? Because there is an exact copy on disk. A free buffer can be popular or not popular.
So, if I am looking for a buffer to replace, a non-popular free buffer is a perfect candidate!
Dirty Buffer Details
If a buffer is dirty, then what is in memory does NOT match what is on disk. All it takes is a changed buffer byte and they are different, and therefore the buffer is dirty. Dirty buffers can not be replaced because they have changed from their original image that resides on disk. (Technically they could be changed, but then recovery would be necessary.)
So, if I am looking for a buffer to replace, a dirty buffer is not an option, regardless of its popularity.
Pinned Buffer Details
A buffer header is pinned when an Oracle process wants to ensure its buffer will not be replaced, regardless of it's popularity. Suppose you are running a SQL statement that references a non-popular free buffer? You certainly don't want the buffer to be replaced! The pin prevents the buffer from being replaced.
So, if I am looking for a buffer to replaced, pinned buffers are not an option.
Only One Replacement Option!
Considering the above, our only buffer replacement candidate is a non-popular free buffer. So, when I need a buffer to replace, I will aggressively search for a non-popular free buffer.
How a process goes about looking for a non-popular free buffer? Honestly, that is not the focus of this post.
But I will say this, a server process begins looking for a non-popular by at the LRU end of its associated LRU, working toward the most recently used (MRU) end of the LRU. Generally speaking, the server process will keep asking each buffer header if they are associated with a non-popular free buffer. If the answer is "YES" then the buffer header's buffer can be replaced.
Data Collection Script
Since a buffer's popularity is based on its touch count (x$bh.tch), I wrote some PL/SQL to sample a specific buffer 600 times, once every half a second. This means I'm sampling the chosen buffer for five minutes. With that duration, I should get a interesting picture of the buffer's life!
drop table op_tch_results; create table op_tch_results ( loop_no number, currrent_ts timestamp, file_no number, block_no number, tch number ); truncate table op_tch_results; alter session set commit_wait=nowait; declare looper_var number; dbarfil_var number := 6 ; dbablk_var number := 2566541; tch_var number ; begin for looper_var in 1..600 loop begin select tch into tch_var from x$bh where dbarfil = dbarfil_var and dbablk = dbablk_var; EXCEPTION WHEN OTHERS THEN tch_var := -1; end; insert into op_tch_results values ( looper_var, current_timestamp, dbarfil_var, dbablk_var, tch_var ); dbms_lock.sleep(0.50); end loop; end; commit; /
The only really cool aspect of this script is the EXCEPTION clause. You see, if my chosen buffer is not currently in the buffer cache, without the exception clause, the PL/SQL script will fail! Why? Because no rows where returned from the SELECT statement! So, when the exception fires, it is telling me the buffer is not currently in the buffer cache. To make this numerically and graphically obvious, I decided to set the recorded touch count to -1.
Finding A Suitable Buffer
For this experiment, I need to find an popular buffer... but not too popular. It took me awhile to find one, but I eventually did.
The performance situation I placed on the system was a free buffer wait. Without getting into the details, I know when there are lots of free buffer waits, there is lots of buffer replacement occurring. This is exactly what I wanted. I used my OP Loader toolkit to help place the free buffer wait load on my system.
This is the SQL I used to help find just the right buffer.
select dbarfil, dbablk, tch from x$bh where dbarfil > 1 order by 3;
The "> 1" was added because I did not want a dictionary table buffer and I didn't want a super popular buffer. Keep in mind that one of the ways a buffer's touch count can be reset is when it is promoted. So the most popular buffer can have a touch count of zero!
After a while, I found just the buffer I was looking for. It was file 6 and block 2566541.
Viewing The Touch Count Data
Once the collection script finished, I wanted to quickly look at the touch count data. I wanted to see the touch count change and the buffer being replaced. As you'll recall, "being replaced" is represented by a touch count of -1.
You can view all the data HERE but below is a snippet.
... 5, 5, 5, 5, 0, 1, 1, 1, 1, ... 1, 1, 1, -1, -1, -1, -1, -1, -1, 1, 1, 1, 1, ... 3, 3, 0, 0, 0, ... 0, 0, 1, 1, 1, ... 1, 1, -1, -1, -1, -1, ... -1, -1, 1, 1, 1,
Numeric Focused Observations
While I am going to briefly describe what happened, for details refer to Buffer Cache Internals chapter in my book, Oracle Performance Firefighting . It's all in there.
Very cool observations when the touch count changed from X to Y:
- 5 -> 0. The buffer header got promoted. When the server process was looking for a non-popular free buffer to replace, it came upon our buffer header. But it noticed it was popular (tch>1) and so it promoted it...but then reset its touch count to zero! (How rude!)
- 0 -> 1. Working its way from the MRU end of the LRU towards the LRU end of the LRU (sorry), the buffer header crossed the mid-pointer of the LRU, which resets the touch count to one.
- 1 -> -1. When the server process was looking for a non-popular free buffer to replace, it came upon our buffer header. Our poor buffer was not popular enough (tch !> 1), so it was replaced by another buffer. (That's gotta hurt!) So at this point, our buffer is no longer in the buffer cache...only on disk in an Oracle database file.
- -1 -> 1. A SQL statement needed to access our block. So it found a non-popular free buffer to replace, made the replacement, moved our buffer's new buffer header to the LRU mid-point, and set its touch count to 1.
- 3 -> 0. The buffer header got promoted. When the server process was looking for a non-popular free buffer to replace, it came upon our buffer header. But it noticed it was popular (tch>1) and so it promoted it...but then reset its touch count to zero! (It's a tough life being an Oracle buffer header!)
Pretty cool, eh? That's a numeric perspective, now let's take a look at the picture!
Graphical Focused Observations
Numbers are great and all, but a picture can really bring a situation to life. Below is a plot of our buffer [header] as it is progresses through its life in the buffer cache.
We first see the buffer when its already in the buffer cache and its buffer header has a touch count of five. From then on, if you study the chart and read my annotations carefully, you can see the fascinating progression of this buffer and its buffer header(s).
Observed Touch-Count Algorithm Characteristics
If you look at the numbers and chart of our buffer over its observed five minutes, we can infer a few characteristics of Oracle's implementation of the touch-count algorithm.
For details about Oracle's touch-count algorithm and its associated instance parameters, please see Chapter 6, Oracle Buffer Cache Internals in my book, Oracle Performance Firefighting.
- When a buffer is brought into the cache, its buffer header's touch-count is set to one.
- When a buffer header is promoted, its touch-count is set to zero.
- When a buffer header crosses the mid-point (moving from the hot region to the cold region), its touch-count is set to one.
- A buffer header's buffer can be replaced if its touch-count is less than two. It also has to be free, but my test does not demonstrate this.
I hope you enjoyed this post. And, it will help you understand the what heck is going on inside Oracle's buffer cache!
All the best in your Oracle performance tuning work!
If you have any questions or comments, feel free to email me directly at craig at orapub.com.
|How To Use An AWR Report To Create Correct Wait Event Based Histograms||A Real Story In A Real Life: Everything Started With A Simple Post||Oracle Database Cache Buffer Chain (CBC) Latches, CPU Consumption, and Wait Time|