Tuesday, August 19, 2014

Dynamic Remastering versus Dynamic Reconfiguration in Oracle RAC

Often DBAs get themselves confused with two most distinct and significant concepts of RAC -Dynamic Remastering and Dynamic Reconfiguration. In this article I would try to explain these two concepts and will also share useful scripts that can be used in day today administration for finding master of a block or to forcefully enable the desired master for a particular block.

Dynamic Remastering: Before starting discussion on dynamic remastering, you need to be familiar with Global Resource Directory.  GRD basically contains useful information such as data block address, mode (null, shared or exclusive), role(local or global), SCN, past image and current image. Each instance maintains the portion of GRD in its own SGA. In other words, any node which is master of particular object or a block maintains the GRD portion for those blocks in its SGA. GCS in conjunction with GES is actually responsible for updating GRD for a resource in instance's SGA. The objective of dynamic remastering is to master buffer cache resource on the instance where it's mostly accessed.

Mastering of block is decided based on the demand for a block, so basically when demand for particular object or block is high on one node, that instance will become master for that object or block. And this is where dynamic mastering comes into play. For Example, instance A is currently master of object (Table EMP) because there were huge number of read/write requests made by instance A on table EMP. After sometime, other instance B started making such requests very frequently for EMP such that demand for EMP table increased on instance B. In such situations, instance A has to either ship the blocks to instance B or forward the request to other instance which is holding most current copy of the requested blocks. To avoid unnecessary interactions and network traffic, GCS remasters the blocks to instance B.

Dynamic Reconfiguration: Dynamic Reconfiguration is the process of reconfiguring resources of failed nodes among surviving nodes. Dynamic Reconfiguration comes into play when an instance crashes unexpectedly. Below are some of the high level steps due to instance crash when node failure is detected by CSSD daemon.

  1. The enqueue part of GRD is reconfigured among surviving nodes by global enqueue service. This operation occurs relatively faster.
  2. The cache part of GRD is reconfigured and SMON reads the redo log of the failed instance to identify the data blocks that it needs to recover.  Since blocks are shipped over the interconnect, SMON may have to merge the redo logs of the failed instance to overcome holes in the redo.
  3. SMON issues GRD requests to obtain all the data blocks that need recovery. Redo threads of failed instances are merged and applied to the most current versions or past image of the blocks from other instance cache. If past images are not available, then redo logs will be applied after reading the blocks from disk. All blocks become accessible now.
  4. Undo blocks are applied to the database in back ground. You can tune this rollback recovery by setting parameter fast_start_rollback_recovery parameter appropriately. 

Dynamic remastering comes into play here as well as cache resources have to be mastered among surviving nodes the only difference here that remastering was forced due to instance crash but not due to demand for resources.

Bases on the above discussion remastering of an object or block can be triggered mainly due to below three reasons:

  • Manual remastering
  • Resource affinity
  • Instance crash


Lets discuss these criteria for dynamic remastering one by one in detail. 

The scripts and demo used in this article are referred from oracle metalink support and very nice article from Anju Garg in blog (oracleinaction.com).

I have used similar demo in one of my in-house environment with some additional tweaks with the help of oracle metalink support and sharing same in this article. The environment is RAC cluster of 3 nodes (RAC1, RAC2, RAC3) and the experiments were performed on oracle provided SCOTT schema.


First of all, retrieve data_object_id of table EMP using below script:
          
    select owner, data_object_id, object_name 
           from dba_objects 
           where owner = 'SCOTT' 
             and object_name = 'EMP';

OWNER   DATA_OBJECT_ID OBJECT_NAME
———- ————– —————————————
SCOTT                32614                    EMP

 Get File_id and block_id of emp table:

SQL>select empno, dbms_rowid.rowid_relative_fno(rowid), 
                  dbms_rowid.rowid_block_number(rowid) 
          from scott.emp 
           where empno in (1, 2);

     EMPNO DBMS_ROWID.ROWID_RELATIVE_FNO(ROWID) DBMS_ROWID.ROWID_BLOCK_NUMBER(ROWID)
———- ———————————— ————————————
      1                                   6                                  120
      2                                   6                                  120

 MANUAL REMASTERING

You can manually remaster an object with oradebug command :
oradebug lkdebug -m pkey <data_object_id>

Step I. shutdown the database and restart it from RAC1

[oracle@rac1 ~]$ srvctl stop database -d rac 
                  srvctl start database -d rac
                  srvctl status database -d rac
Step II. Issue a select on the object from NODE2

SCOTT@NODE2> select * from  emp;

Step III. Find the GCS resource name to be used in  the query
   x$kjbl.kjblname = resource name in hexadecimal format([id1],[id2],[type]
   x$kjbl.kjblname2 = resource name in decimal format
   Hexname will be used to query resource in v$gc_element and v$dlm_rss views.

SYS@rac2>col hexname for a25 
             col resource_name for a15 
             select b.kjblname hexname, b.kjblname2 resource_name, 
                     b.kjblgrant, b.kjblrole, b.kjblrequest  
           from x$le a, x$kjbl b 
             where a.le_kjbl=b.kjbllockp 
              and a.le_addr = (select le_addr 
                                from x$bh 
                               where dbablk = 120 
                                and obj    = 32614 
                               and class  = 1 
                                and state   <> 3);

HEXNAME           RESOURCE_NAME   KJBLGRANT   KJBLROLE KJBLREQUE
————————- ————— ——— ———- —————— ———- ———
[0x97][0x4],[BL]          120,6,BL                 KJUSERPR            0            KJUSERNL

Step IV. Check the current master of the block –

SYS>select o.object_name, m.CURRENT_MASTER, 
                   m.PREVIOUS_MASTER, m.REMASTER_CNT 
          from   dba_objects o, v$gcspfmaster_info m
           where o.data_object_id=32614
           and m.data_object_id = 32614 ;

OBJECT CURRENT_MASTER PREVIOUS_MASTER REMASTER_CNT
—— ————– ——————— ————— ————— —
EMP                 0                                      32767                                   1

Note that current master of scott.emp is node RAC1 (numbering starts from 0). Value of previous master (32767) indicates that the previous master was not known and current the master is 0 which is instance RAC1. REMASTER_CNT = 1 indicates the object has been remastered only once.


Step V. Use following SQL to show master and owner of the block.
 This SQL joins   x$kjbl with x$le to retrieve resource name.
– Note that current master is node1(KJBLMASTER=0) and current owner of the block is
node2(KJBLOWNER = 1)

SYS@NODE2> select kj.kjblname, kj.kjblname2, kj.kjblowner, 
                       kj.kjblmaster
            from (select kjblname, kjblname2, kjblowner, 
                         kjblmaster, kjbllockp         
                  from x$kjbl
                   where kjblname = '[0x97][0x4],[BL]'
                  ) kj, x$le le
            where le.le_kjbl = kj.kjbllockp
            order by le.le_addr;

KJBLNAME              KJBLNAME2    KJBLOWNER  KJBLMASTER
—————————— —————————————-
[0x97][0x4],[BL]               120,6,BL                       1             0

Step VI. Manually master the EMP table to node RAC2 using oradebug.

SYS@rac1>oradebug lkdebug -m pkey 32614

SYS>select o.object_name, m.CURRENT_MASTER, 
                   m.PREVIOUS_MASTER, m.REMASTER_CNT 
          from   dba_objects o, v$gcspfmaster_info m 
           where o.data_object_id=32614
            and m.data_object_id = 32614 ;

OBJECT CURRENT_MASTER PREVIOUS_MASTER REMASTER_CNT
—— ————– ————— —————— ————– ————— ————
EMP                     1                                             0                        2

As you can clearly see that current_master has changed from node 1 to node 2 and remaster count has also increased to 2. It also show that previously master for the object was node 1 (PREVIOUS_MASTER = 0).

You can also determine the master at block level using below query.
SYS> select kj.kjblname, kj.kjblname2, kj.kjblowner, 
             kj.kjblmaster 
           from (select kjblname, kjblname2, kjblowner, 
                         kjblmaster, kjbllockp 
                 from x$kjbl
                  where kjblname = '[0x97][0x4],[BL]') kj, x$le le 
           where le.le_kjbl = kj.kjbllockp   
           order by le.le_addr;
KJBLNAME                       KJBLNAME2        KJBLOWNER KJBLMASTER
—————————— —————————— ———-  ———-
[0x97][0x4],[BL]               120,6,BL                                1                  1

Note that current owner of the block is Node2 (KJBLOWNER=1)from where query was issued)
Current master of the block has been changed to node2 (KJBLMASTER=1).

REMASTERING DUE TO RESOURCE AFFINITY


Resource affinity allows frequently used resource to be mastered on its local node; it used dynamic resource mastering to move location of the resource masters. There are number of parameters that can be used to dynamically remaster an object.

There may be situations when you would like to tweak or disable dynamic resource mastering for an object or for the whole RAC database. Frequent remastering of any block or object can cause significant traffic in RAC interconnect and cause delays in serving resource requests as GCS has to quiesce the resource for performing DRM. Frequent DRM can result in database spending time mostly on waits such as GCS buffer busy (acquire/release) and other DRM freeze/releases as well as remaster events. These waits can be easily visible in AWR/ASH/AMM or cluster coherency section under OEM . There are ways you can tune this by setting resource affinity attributes. DRM attributes are intentionally undocumented and normally these attributes should not be changed without discussing with Oracle Support.

Following parameters affect dynamic remastering due to resource affinity :
Pls note that these parameters are renamed from 11G onwards as earlier they used to be _gc_affinity_ but 11g onwards they are changed to _gc_policy_

_gc_policy_limit : If an instance opens 50 more open BL requests on an object than the other instance (controlled by _gc_policy_limit parameter), then that object is a candidate for remastering. That object is queued and LMD0 reads the queue and initiates GRD freeze. LMON in conjunction with LMS is responsible for remastering of resources. All these are visible in LMD0/LMON trace files. These statistics are maintained in X$object_affinity_statistics.

_gc_policy_time : Intervals in minutes in which object statistics are evaluated and queue is checked to determine if dynamic remastering must be triggered.

_gc_policy_minimum: The minimum number of global cache operations (OPEN/CONVERT/CLOSE) per minute to qualify an object for affinity (object remastering).

_gc_read_mostly_locking: Setting this parameter to false disable read mostly related DRM. Default is TRUE.

_gc_undo_affinity: This parameter enables dynamic undo remastering. Mastering of undo segments differ from that of non-undo segments. Non-undo segments remastering uses hash techniques and remastering occurs on the basis of number of open BL lock requests while undo segment remastering enabled for a instance that actually activates undo segment which makes sense.

_lm_drm_max_banned_objs: The parameter plays an important role if you want to skip dynamic remastering at object level. Set this parameter to number of objects you want to disable remastering. Maximume value is 235

_lm_drm_banned_objs: Set this parameter to list of object ids separated by commas to skip dynamic remastering for the objects.

Lets consider a scenario where you would like to minimize dynamic remastering for an object. Lets say for an object name EMP, there have been frequent requests coming from nodes RAC1 and RAC2. If number of opens gets beyond default value of 50 for an instance, you would not just want that object to be remastered among nodes just because of that.

So if its normal for multiple instances to issue these many requests, you would prefer to avoid dynamic remastering happening frequently. You can increase the value of _gc_policy_limit to higher than average requests of object per instance as well as you can consider setting higher value of _gc_policy_minimum to control dynamic remastering.

Pls also note that setting higher values or setting above parameters to zero completely disable dynamic remastering. Setting the above parameters also require database restart.

You can use queries described earlier under section "MANUAL REMASTERING" as well as check for number of opens in X$object_policy_statistics to verify if you should consider tuning the parameters discussed above. Sum of sopens, xopens and xfers decides whether the object will be considered for DRM .

SQL> select object, node, sopens, xopens, xfers  from x$object_policy_statistics  where object=
    ( select object_id pe from dba_objects where object_name='EMP' and object_type='TABLE' and owner='SCOTT');
    OBJECT   NODE  SOPENS  XOPENS  XFERS
---------- ---------- ---------- ---------- ---------- ---------- ----------

     32614        2               0            3817         1195

There are other useful views that DBA should always keep checking for tuning the dynamic remastering latency. gv$policy_history which displays DRM history at object level (included from 11g onwards) as well as gv$dynamic_remaster_stats which displays statistical information about dynamic remastering process.

select * from gv$policy_history;
   INST_ID POLICY_EVENT  DATA_OBJECT_ID TARGET_INSTANCE_NUMBER EVENT_DATE
---------- -------------------- -------------- ---------------------- -------------------- --------------------
     1        initiate_affinity         32614                      1                            09/14/2014 13:11:14

     1         push_affinity           32614                      2                            09/14/2014 13:15:54

When DRM triggers, it can easily be visible from LMON and LMD trace files. Alternatively, you can also explicitly turn on the tracing using below:

alter system set events '10430 trace name context forever, level 8';

Diable tracing:
Alter system set events '10430 trace name context off';

 REMASTERING DUE TO INSTANCE CRASH


Presently node RAC2 is the master of SCOTT.EMP. Lets crash this node by shutting it down abnormally and see what's happening:

$srvct stop instance -d rac -i rac2 -o abort

You can clearly see in the output of below query that current_master has changed to 1 i.e. node 2 from previous_master = 2 (node 3) and remaster count has also increased to 5.

SYS@RAC2>select o.object_name, m.CURRENT_MASTER, 
                          m.PREVIOUS_MASTER, m.REMASTER_CNT
                   from   dba_objects o, v$gcspfmaster_info m 
                   where o.data_object_id=32614
                   and m.data_object_id = 32614;

OBJECT_NAME     CURRENT_MASTER PREVIOUS_MASTER REMASTER_CNT
————— ————– ————— ————————— ————– ————
EMP                                  1                              2                                5

I hope you find this article useful for you. Pls let me know for any questions or suggestions to make it more understandable for you.

References:
http://oracleinaction.com/dynamic-remastering/
https://support.oracle.com


No comments:

Post a Comment