Thursday, July 31, 2014

Avoid costly 'Table Access By Index Row Ids' with efficient selective Indexes

Hello Guys, This is Azhar Khan currently working as database architect in TCS. I have been thinking of writing certain blogs but didn't find time from my busy schedule to post such useful information. I hope you would find this post useful for you and would be able to utilize this knowledge for query tuning in your environment.

I have been working as DBA for more than seven years and most of the times I heard developer complaining regarding why database queries taking considerable amount of time even if it's picking index. So in this article I would answer why index is not always better option and how to make your index more selective by avoiding table access and reading the data in index itself.

So lets start with an example first where one of my customer had issue with the query performance and how I was able to create more selective index and avoided table access completely.

Here was the initial plan of the query where there was no index and table was undergoing full scan -
SQL> explain plan for SELECT "RECORD_KEY","FIELD_TAG","FIELD_VALUE" FROM "DBUSER"."VQN_PROCEDURE_DATA" "VQN_PROCEDURE_DATA" WHERE "FIELD_TAG"='BATCH_NAME';

Explained.

SQL> select * from table(dbms_xplan.display());

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 2357431929

------------------------------------------------------------------------
| Id  | Operation         | Name               | Rows  | Bytes | Cost  |
------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |                    | 58245 |  1706K| 17482 |
|*  1 |  TABLE ACCESS FULL| VQN_PROCEDURE_DATA | 58245 |  1706K| 17482 |
------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

   1 - filter("FIELD_TAG"='BATCH_NAME')

What would you do to improve the performance of the query? You would simply consider creating index on column which is being used in the where clause which is FIELD_TAG in our case.

After creation of index, execution plan looks below:
SQL> select * from table(dbms_xplan.display());

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 2157065990

---------------------------------------------------------------------------
| Id  | Operation        | Name                   | Rows  | Bytes | Cost  |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT |                        | 58245 |  1706K| 9347
|    1 |  TABLE ACCESS BY INDEX ROW ID| VQN_PROCEDURE_DATA |  
|*  2 |  INDEX RANGE SCAN| TEMP_IDX_VQN_PROC_DATA |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------------------------------------------------------------------------------------

   1 - access("FIELD_TAG"='Test Replicate ID')

In above execution plan even though optimizer determined that using the index would result in reducing the cost over full scan but in actual there was no improvement at all. Even performance of the above query was worse sometimes with the index.

 The reason being poor clustering factor which was resulting in almost reading the same number or more number of blocks by accessing the table using index row id. So in our case this index is not selective. Some times optimizer thinks of not picking the index at all because it estimates that it would take more block visits to access the table using index row id in cases where percentage of data to be fetched on the basis of where clause criteria is quite high and clustering factor of index is poor. So optimizer simply ignores the index and prefers scattered reads.

Now I was left with below three options:
1.) Improve clustering factor by re-organizing the table i.e. recreate the table and organize it by FIELD_TAG. This would result in table data ordered by indexed column FIELD_TAG and same block would not be traversed more than once on accessing the table using index row id but this might affect other indexes which already exist on that table and this could even worsen the clustering factor of other indexes. Moreover we realized that few columns in the tables were of Long Raw data type and there was limitation of re-organizing the table with these data types.
2.) Use parallelism and multi block read count to improve the query performance and not use the index at all. But the query was being executed by multiple users concurrently and we didn't have enough CPU cores for parallelism.
3.) What if I simply avoid table access by index row id? Index leaf blocks contain rowids and data of the indexed column. So, if I include the columns being used in the select clause in the index itself, I could simply find the data in index leaf blocks itself and optimizer doesn't have to access table blocks using index row ids.

I preferred third option in my case and you can see yourself what it did in below execution plan:
create index dbuser.NN_FLD_TG_RC_KY_FLD_VL on VQN_PROCEDURE_DATA(FIELD_TAG,RECORD_KEY,FIELD_VALUE) tablespace UDA nologging parallel 2;
alter index dbuser.NN_FLD_TG_RC_KY_FLD_VL noparallel logging;

SQL> explain plan for SELECT RECORD_KEY,FIELD_TAG,FIELD_VALUE FROM DBUSER.VQN_PROCEDURE_DATA VQN_PROCEDURE_DATA WHERE FIELD_TAG='Test Replicate ID';

Explained.

Elapsed: 00:00:00.01
SQL> select * from table(dbms_xplan.display());

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 2157065990

---------------------------------------------------------------------------
| Id  | Operation        | Name                   | Rows  | Bytes | Cost  |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT |                        |   224K|  6577K|     2 |
|*  1 |  INDEX RANGE SCAN| TEMP_IDX_VQN_PROC_DATA |   224K|  6577K|     2 |
---------------------------------------------------------------------------

I was able to reduce cost from 9K to 2 and query was executing in fractions of seconds while earlier it was taking upto 2 minutes for execution. So, idea here is that, you should always select only those columns which are really required in database query rather than using 'select *'. Check if you can include the columns being used in select query in index itself. 

It would also result in significant performance improvement in cases where developer is not at all using the where clause and selecting few columns of large table having large number of columns. This would allow you to retrieve the data from index itself avoiding reading larger number of table blocks as index size for the few columns would be comparatively smaller and even if you are doing fast full index scan, you are visiting less number of blocks. That's the tuning goal - "minimize number of blocks visit".

In my next article, I would give an overview of various types of access paths and join methods in an optimizer and how can you determine if oracle optimizer is in fact using correct access path and joins methods. Thanks for reading my blog and please feel free to post any questions if you have.