Oracle Database Singular Cache Buffer Chain Latch Acquisition Pattern Diagnosis
In my previous posting on this topic I focused on determining Oracle Database cache buffer chain (CBC) wait time acquisition patterns; disperse or singular. A disperse pattern occurs when many CBC latches are active. In contrast, there are situations when only one or a few CBC child latches are extremely active, hence the singular pattern. In my previous posting, I also listed some disperse pattern solutions.
In this posting I'm focusing on the additional diagnostic steps needed before a specific solution can be determined with a singular wait acquisition pattern.
Important: To understand what I write below you need a basic understanding of Oracle buffer cache internals. In particular the buffer, latch, child latch, cache buffer chain, and buffer header. These are described (including nice diagrams) in my posting here. In that posting a visualization tool is also used which you can download for free here.
Situations with singular CBC acquisition patterns
Two of the most common popular buffer concurrency situations that can stress a single CBC latch are:
First, when a single block contains some application reference information and is not being changed often. The popular buffer causes its hash/cache buffer chain to be popular and then also the CBC child latch covering the hash buffer chain. If concurrency is high enough, this can become an issue. However, if the buffer is being changed often, we'd likely see a buffer busy wait.
Second, relates to a popular index root block. Any time an index is accessed its single root block must first also be accessed, which repeatedly forces the question, "Is this index root block in the buffer cache?". If the concurrency becomes high enough, eventually you'll see CBC latch contention associated with a single CBC child latch.
There are other singular CBC access situations. Next I'll present how to determine the acquisition pattern.
Diagnosing the acquisition pattern
We need to determine if the CBC child latch access pattern is disperse or singular. A way to do this is shown the SQL below, which is essentially my latchchild.sql OSM script. The script collects access details for latch number 150 (the CBC latch) over a 300 second period. Run the script a few times (i.e., collect a few sample sets) to ensure what you think is occurring actually is occurring.
def latch=150 def sleep=300 drop table op_interim / create table op_interim as select addr,gets,sleeps from v$latch_children where latch# = &latch / exec dbms_lock.sleep(&sleep); select t1.addr, t1.gets-t0.gets delta_gets, t1.sleeps-t0.sleeps delta_sleeps from op_interim t0, ( select addr,gets,sleeps from v$latch_children where latch# = &latch ) t1 where t0.addr = t1.addr order by 3,2 / ADDR DELTA_GETS DELTA_SLEEPS -------- ------------------ ---------------- ... 76C33818 19,8038 3 76C644A8 8,1535 4 76C7759C 9,6993 4 76C2D1CC 14,5096 4 76E03FEC 14,9355 4 76CDD394 16,0718 4 76C68904 18,2300 4 76D69374 6,5250 5 76D7800C 13,3134 5 76DA2650 15,6578 5 76C02B88 15,8293 5 76CE59E0 14,5169 7 76C7B9F8 15,8243 7 76CFF120 6,5228 9 76CFCF30 14,8187 11 76D38668 96,6345 62 76CDF508 98,1384 96 76DAF26C 1,96,2752 187 1024 rows selected.
Figure 1 shows three CBC child latches received a high proportional amount of activity, especially the 76DAF26C child latch. Therefore, we'll focus on the three child latches with addresses 76DAF26C, 76DF508, and 76D38668.
Determine the hot buffer(s) and buffer header(s)
More precisely, we need to determine the hot buffer headers that are on the CBC chains protected by the unusually active CBC child latch(es). The x$bh performance table contains information about each buffer header that has an associated buffer in the buffer cache. (More about buffer headers) There are four columns of particular interest to us:
HLADDR is the "hash latch address" hence the column name hladdr. This is a foreign key to the v$latch_children view's addr column.
FILE# is the buffer header's (also Oracle block and buffer) file number that links to many performance views and tables, such as dba_data_files.file_id and dba_segments.header_file.
DBABLK is the buffer header's (also Oracle block and buffer) block number which may be referenced in dba_segments.header_block.
TCH is the buffer header's touch count number, which is a popularity indicator...with a few twists I'll detail below. In every Oracle release I have checked (including 11.2), the tch column is not in v$bh, only x$bh.
Keep in mind that each child latch will normally "cover" multiple cache buffer chains (perhaps >100) and each chain can have zero or more associated buffer headers (average chain length is usually 0 to 1). And of course each buffer header is related to a cached Oracle block residing in the buffer cache.
In the report shown in Figure 1, we saw which CBC child latches are relatively busy; 76DAF26C, 76DF508, and 76D38668. Now we need to know which CBC chains and buffers are related to our relatively active CBC child latches. By querying x$bh we can easily determine the buffer headers associated with a given CBC child latch. We can also get a clue as to the buffer header's popularity. The SQL below is once such query.
SQL> l 1 select hladdr, file#, dbablk, tch 2 from x$bh 3 where hladdr in ('76DAF26C','76CDF508','76D38668') 4* order by 4 SQL> / HLADDR FILE# DBABLK TCH -------- ---------- ---------- ---------- ... 76D38668 1 70197 5 76D38668 1 39365 5 76DAF26C 1 117328 185 76CDF508 1 117329 186 47 rows selected.
As I wrote many years ago in my touch count algorithm paper and detailed in my Oracle Performance Firefighting book, Oracle uses the "touch count" algorithm to essentially tag popular buffers. There is a little twist though... a buffer's touch count can get reset to zero. So to truly determine the popular buffers, we must repeatedly sample x$bh.
The SQL below can be used to repeatedly sample x$bh finding the most popular buffers given their CBC latch address. If you look closely at the SQL, you'll notice it collects and stores 300, 1 second interval x$bh samples. If you use a larger sleep time, you'll want to increase the number of samples collected. The final select statement reports the key tch based popularity statistics.
drop table op_interim; create table op_interim (hladdr raw(4), file# number, block# number, tch number); declare i number; begin for i in 1..300 loop insert into op_interim select hladdr,file#, dbablk, tch from x$bh where hladdr in ('76DAF26C','76CDF508','76BE93CC'); dbms_lock.sleep(1); end loop; end; / select hladdr, file#, block#, count(*) count, min(tch) min, median(tch) med, round(avg(tch)) avg, max(tch) max from op_interim group by hladdr, file#, block# order by 7 / HLADDR FILE# BLOCK# COUNT MIN MED AVG MAX -------- ------ ---------- ---------- ---------- ---------- ---------- ---------- ... 76BE93CC 1 39364 300 6 6 6 6 76BE93CC 1 69730 300 13 14 14 14 76CDF508 1 117329 600 1 36 64 181 76DAF26C 1 117328 300 70 125 125 180 76BE93CC 4 1552339 300 76 131 131 186 42 rows selected.
The statistics that I'm most interested in are the median (MED) and the maximum (MAX). And I'm hoping there is a clear buffer header or very few buffer headers that are relatively really, really active (i.e., hot).
As a side note, you may have noticed that one of the COUNTS is 600 while the others are 300. This could have occurred because there are two buffer headers related to buffer 1,117329. Perhaps one could be the current mode (CU) buffer and the other a consistent read (CR) buffer. The above SQL could hbe improved by adding the state column to more uniquely identify a buffer header by its file#, block#, and state...but I digress.
Based on the above SQL output, we know the hot buffers (file#, block#) are: (1,117328) and (4,1552339). You could also argue to include 1,117329 but this buffer header is not consistently hot because its median is much lower than the other two. But in all honestly, if this was a real production system and because the above SQL could be improved, I would investigate. Now we need to understand why these two buffer headers are so popular.
Determine the hot buffer details
I will investigate the two most popular buffers. But honestly, my focus in this posting is on the second buffer; 4,1552339.
Investigating Buffer 1, 117328
To determine the segment's name and type, I'm going to use my dba_extents based OSM script, objfb.sql. The SQL is a little tricky, so you may want to check it out. Let's first look at block 1,117328.
SQL> @objfb 1 117328 Database: prod18 18-NOV-11 09:45am Report: objfb.sql OSM by OraPub, Inc. Page 1 Object Details For A Given File #(1) and block #(117328) File number :1 Block number :117328 Owner :SYSTEM Segment name :OP_LOAD_PARAMS Segment type :TABLE Tablespace :SYSTEM File name :/u01/oradata/prod18/system01.dbf
OK, so we're dealing with a table. But this is strange because the op_load_params table is used to interactively change the load intensity of my workload generator tool. (You can download an older version here. I haven't posted the latest version... just lazy I guess.)
When I'm investigating hot table buffers, I also check if the buffer is the segment header block (header blocks contain special stuff...sorry to be so non-specific... but I digress). To determine if the buffer is a segment header block, I ran the below code snippet:
SQL> l 1 select owner, segment_name, segment_type 2 from dba_segments 3 where header_file=&hdr_file 4* and header_block=&hdr_block SQL> / Enter value for hdr_file: 1 Enter value for hdr_block: 117328 OWNER SEGMENT_NAME SEGMENT_TY ---------- ------------------------- ---------- SYSTEM OP_LOAD_PARAMS TABLE
Since a row was returned, this popular buffer header is indeed the op_load_params table header block! This is not what I expected and not the focus of this blog posting... so I'll move on. But if this was a production system, you better believe I would figure it out!
Investigating Buffer 4, 1552339
Now let's turn our attention to the other hot buffer header, namely buffer header 4,1552339. First I'll determine the object type by running my objfb.sql script.
SQL> @objfb 4 1552339 Database: prod18 18-NOV-11 09:51am Report: objfb.sql OSM by OraPub, Inc. Page 1 Object Details For A Given File #(4) and block #(1552339) File number :4 Block number :1552339 Owner :MG Segment name :SPECIAL_CASES_BOGUS Segment type :INDEX Tablespace :USERS File name :/u01/oradata/prod18/users01.dbf 1 row selected.
Very interesting... an index. So we know this hot buffer header is an index block. I'm willing to bet it's the index's root block! Why? Because every time an index is accessed, it's root block buffer header (and buffer) is also accessed. And a very active index root block buffer header can cause its hash chain to be very active which can cause problems when a process attempts to acquire the hash chain's child latch. But how can we tell if an index block is the root block?
As I wrote in my November 11, 2011 posting, we can expect an index's root block to have a block number that is one greater than its segment header block. If you look closely at the SQL below, the block number I entered is one less then the popular block (1552338 = 1552339-1). If this 1552339 block is indeed an index root block, then it's segment header will have a block number of 1552338... let's check it out!
SQL> / Enter value for hdr_file: 4 Enter value for hdr_block: 1552338 OWNER SEGMENT_NAME SEGMENT_TY ---------- ------------------------- ---------- MG SPECIAL_CASES_BOGUS INDEX 1 row selected. SQL> l 1 select owner, segment_name, segment_type 2 from dba_segments 3 where header_file=&hdr_file 4* and header_block=&hdr_block SQL>
Fantastic! A row was returned, which means the popular buffer header is indeed the index's root block!
So the initial diagnosis is complete and now we need a solution. A poor solution would be to increase the number of CBC latches. Adding CBC latches does indeed significantly help during a disperse CBC latch acquisition pattern because each latch covers fewer chains. But when there exists a singular CBC acquisition pattern, the additional latches won't help much. What we need to do is somehow make the popular index root block buffer header less popular. One solution is to hash partition the index, which effectively creates multiple root blocks because each partition has a root block. At this point we have our diagnosis and a solution.
This posting has a number of objectives:
- How to identify CBC child latch acquisition patterns; disperse or singular
- How to determine the hot buffer headers related to specific CBC child latches
- What information is needed to further diagnose the hot buffer header
- Focus on the situation (diagnosis and solution) when the hot buffer header is an index root block
This is one of my more complicated (and perhaps confusing) posts because to really understand what I'm getting at, you must have a good understanding of Oracle buffer cache internals.
Personally, the hot index root block situation is particularly satisfying. It's a very real application of when the top wait event is "latch: cache buffer chains" simply increasing the number of CBC child latches will not significantly improver performance. But with just a couple of extra diagnostic steps, we can nail the core Oracle contention area.
It took a while to get there but I hope you enjoyed the ride!
Thanks for reading!
If you have any questions or comments, feel free to email me directly at craig at orapub.com.
|Speed Sometimes Means Changing The Rules||IOUG 2016 Is Different: Here's How To Get Your Abstract Accepted||How Oracle Database Mutexes Sleep|