Wednesday, June 4, 2008

Oracle Histograms

About Oracle Histograms

Histograms may help the Oracle optimizer decide whether to use an index or a full-table scan (where index values are skewed), or help it determine the fastest table join order. To determine the best table join order, inspect the WHERE clause of the query along with the execution plan for the original query. If the cardinality estimate for a table is too high, then histograms on the most selective column in the WHERE clause will tip off the optimizer and change the table join order.

Most Oracle experts recommend scheduled re-analysis only for highly dynamic databases, and most shops save one very deep sample (with histograms), storing the statistics with the dbms_stats.export_schema_stats procedure. The only exceptions are highly volatile systems (e.g. lab research systems) where a table is huge one day and small the next.
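As a minimal sketch of saving that deep sample, the block below exports the current schema statistics into a holding table. The SCOTT schema matches the example later in this post; the STATS_BACKUP table name and the DEEP_SAMPLE label are hypothetical names chosen for illustration.

```sql
begin
   -- One-time setup: create a holding table for exported statistics.
   -- STATS_BACKUP is a hypothetical table name.
   dbms_stats.create_stat_table(
      ownname => 'SCOTT',
      stattab => 'STATS_BACKUP'
   );

   -- Copy the current schema statistics (including histograms)
   -- into the holding table under a hypothetical label.
   dbms_stats.export_schema_stats(
      ownname => 'SCOTT',
      stattab => 'STATS_BACKUP',
      statid  => 'DEEP_SAMPLE'
   );
end;
/
```

The saved set can later be restored with dbms_stats.import_schema_stats, passing the same ownname, stattab and statid values.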

For periodic re-analysis, many shops use the table "monitoring" option and also method_opt "auto" after they are confident that all histograms are in place.
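A periodic job along these lines might look like the sketch below. It assumes the SCOTT schema and assumes that table monitoring is already active (on Oracle10g this is normally controlled by the statistics_level parameter rather than a per-table setting), so that dbms_stats can tell which tables are stale.

```sql
begin
   dbms_stats.gather_schema_stats(
      ownname          => 'SCOTT',
      -- Only re-analyze tables that monitoring has flagged as stale.
      options          => 'gather stale',
      estimate_percent => dbms_stats.auto_sample_size,
      -- Let dbms_stats decide which columns get histograms.
      method_opt       => 'for all columns size auto',
      cascade          => true
   );
end;
/
```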

Oracle histogram statistics can be created when you have a highly skewed column, where some values have a disproportionate number of rows. In the real world, this is quite rare, and one of the most common mistakes with the CBO is the unnecessary introduction of histograms in the CBO statistics. As a general rule, histograms are used when a column's values warrant a change to the execution plan.
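One simple way to see whether a column is skewed is to count the rows per value. The query below is only an illustration; SCOTT.EMP and its DEPTNO column are assumed from the standard demo schema and stand in for your own table and column.

```sql
-- Count rows per distinct value. If a handful of values hold
-- most of the rows, the column is skewed.
select deptno,
       count(*) as row_count
from   scott.emp
group  by deptno
order  by row_count desc;
```

If the counts are roughly even, a histogram on that column is unlikely to change any execution plan.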

If you need to reanalyze your statistics, the reanalyze task will be less resource intensive with the repeat option. Using the repeat option will only reanalyze columns with existing histograms, and will not search for other histogram opportunities. This is the way that you will reanalyze your statistics on a regular basis.

--**************************************************************
-- REPEAT OPTION - Only re-analyze histograms for columns
-- that already have histograms
--
-- Following the initial analysis, the weekly analysis
-- job will use the "repeat" option. The repeat option
-- tells dbms_stats not to look for new histogram
-- candidates; it will only re-analyze histograms for
-- columns that already have histograms.
--**************************************************************
begin
   dbms_stats.gather_schema_stats(
      ownname          => 'SCOTT',
      estimate_percent => dbms_stats.auto_sample_size,
      method_opt       => 'for all columns size repeat',
      degree           => 7
   );
end;
/
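To see which columns the repeat option will actually refresh, you can list the histograms that already exist. This sketch assumes Oracle10g or later, where dba_tab_col_statistics exposes a histogram column, and assumes the SCOTT schema used above.

```sql
-- List every column in the schema that currently has a histogram.
select table_name,
       column_name,
       histogram,
       num_buckets
from   dba_tab_col_statistics
where  owner = 'SCOTT'
and    histogram <> 'NONE'
order  by table_name, column_name;
```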

Find histograms for foreign key columns - Many DBAs forget that the CBO must have foreign-key histograms in order to determine the optimal table join order (the order you would otherwise have to force with the ORDERED hint).
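A rough way to find foreign key columns that have no histogram is to join the constraint views to the column statistics. This is a sketch only: it assumes the SCOTT schema and the Oracle10g histogram column in dba_tab_col_statistics.

```sql
-- Foreign key columns ('R' = referential constraint) with no histogram.
select c.table_name,
       cc.column_name,
       c.constraint_name
from   dba_constraints c
join   dba_cons_columns cc
       on  cc.owner           = c.owner
       and cc.constraint_name = c.constraint_name
left join dba_tab_col_statistics s
       on  s.owner       = cc.owner
       and s.table_name  = cc.table_name
       and s.column_name = cc.column_name
where  c.owner = 'SCOTT'
and    c.constraint_type = 'R'
-- Columns with no statistics at all come back as NULL from the outer join.
and    nvl(s.histogram, 'NONE') = 'NONE'
order  by c.table_name, cc.column_name;
```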

Fix the cause, not the symptom - For example, whenever I see a sub-optimal order for table joins, I resist the temptation to add the ORDERED hint, and instead create histograms on the foreign keys of the join so that the CBO can make the best decision.
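Creating a histogram on a single foreign key column might look like the sketch below. The SCOTT.EMP table and its DEPTNO foreign key are assumed from the standard demo schema, and the bucket count of 254 is just an illustrative choice.

```sql
begin
   dbms_stats.gather_table_stats(
      ownname          => 'SCOTT',
      tabname          => 'EMP',
      estimate_percent => dbms_stats.auto_sample_size,
      -- Build a histogram only on the foreign key column.
      method_opt       => 'for columns deptno size 254',
      cascade          => true
   );
end;
/
```

After gathering, re-run the execution plan for the problem query to confirm whether the join order actually changed.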

For new features, explore the Oracle10g automatic histogram collection mechanism that interrogates v$sql_plan to see where the foreign keys are used. It claims to generate histograms when appropriate, all automatically.

Sub-optimal join orders are one reason that the ORDERED hint is so popular, but it has been shown that having liberal column histograms on the table columns can often aid the optimizer in making better execution plans.

In sum, histograms are not just for non-unique column values that are unevenly distributed (skewed), and several noted DBAs have suggested that more liberal use of histograms will aid the CBO in making better decisions. The dbms_stats "auto" feature detects and builds column histograms, but it has the shortcoming of being too conservative in some cases.

Savvy DBAs are now experimenting with broad-brush histograms, for all indexed columns. I first heard of this technique from Jeff Maresh (noted data warehouse consultant), who told me that he has taken to creating 10-bucket histograms for all data warehouse table columns. I heard this advice again at the IOUG conference from Arup Nanda (noted author and DBA of the year) and from Mike Ault.

They are abandoning the use of the "auto" option and manually creating 20-bucket histograms across the board, and they claim that it can make a huge difference for databases with lots of multi-table joins in the SQL.
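A broad-brush approach like this could be expressed with a fixed bucket count in method_opt. The sketch below assumes the SCOTT schema and uses the 20-bucket figure mentioned above; substitute 'for all indexed columns size 20' to limit it to indexed columns.

```sql
begin
   dbms_stats.gather_schema_stats(
      ownname          => 'SCOTT',
      estimate_percent => dbms_stats.auto_sample_size,
      -- Request up to 20 buckets on every column, skewed or not.
      method_opt       => 'for all columns size 20',
      degree           => 7
   );
end;
/
```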

I’ve not tried this technique yet, but when three experts make the assertion, I believe that there may be something to the new technique. The only downside, of course, is the time required to gather the column histograms and a small amount of additional storage in the data dictionary.

One exciting feature of dbms_stats is the ability to automatically look for columns that should have histograms, and create the histograms. Keep in mind, however, that multi-bucket histograms add a huge parsing overhead to SQL statements, and histograms should ONLY be used when the SQL will choose a different execution plan based upon the column value.
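One way to ask dbms_stats to look for histogram candidates is the skewonly setting of method_opt, which examines the data distribution of each column. The sketch below assumes the SCOTT schema; because it inspects every column, expect it to take noticeably longer than a repeat run.

```sql
begin
   dbms_stats.gather_schema_stats(
      ownname          => 'SCOTT',
      estimate_percent => dbms_stats.auto_sample_size,
      -- Create histograms only where the column data is skewed.
      method_opt       => 'for all columns size skewonly'
   );
end;
/
```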
