Monday, June 2, 2008

Oracle Database Architecture: -Part2

Trace Files

Trace files are a source of debugging information. When the server encounters a problem, it generates a trace file full of diagnostic information. When a developer sets SQL_TRACE=TRUE, the server generates a trace file full of performance-related information. Trace files are available to us because Oracle is a heavily instrumented piece of software. By “instrumented,” I mean that the programmers who wrote the database kernel put in debugging code — lots and lots of it. And they left it in, on purpose.

I’ve met many developers who consider debugging code to be overhead — something that must be ripped out before it goes into production in a vain attempt to squeeze every ounce of performance out of the code. Later, of course, they discover that their code has “a bug” or it “isn’t running as fast as it should” (which end users tend to call “a bug” as well. To an end user, poor performance is a bug!). At that point, they are really wishing that the debug code was still in there (or had been in there if it never was), especially since they cannot drop debug code into the production system — that is an environment where new code must be tested first, not something you do at the drop of a hat.

The Oracle database (and Application Server and Oracle applications) is heavily instrumented. Signs of this instrumentation in the database are

      • V$ views: Most V$ views contain “debug” information. V$WAITSTAT, V$SESSION_EVENT, and many others are there solely to let us know what is going on in the bowels of the kernel.
      • The auditing command: This command allows you to specify what events the database should record for later analysis.
      • Resource Manager (DBMS_RESOURCE_MANAGER): This feature allows you to micromanage resources (CPU, I/O, and the like) within the database. What makes a Resource Manager in the database a possibility is the fact that it has access to all of the runtime statistics describing how the resources are being used.
      • Oracle “events”: These provide the ability for you to ask Oracle to produce trace or diagnostic information as needed.
      • DBMS_TRACE: This facility within the PL/SQL engine exhaustively records the call tree of stored procedures, exceptions raised, and errors encountered.
      • Database event triggers: These triggers, such as ON SERVERERROR, allow you to monitor and log any condition you feel is “exceptional” or out of the ordinary. For example, you can log the SQL that was running when an “out of temp space” error was raised.
      • SQL_TRACE: The SQL Trace facility is also available in an extended fashion via the 10046 Oracle event.

. . . among others. Instrumentation is vital in application design and development, and the Oracle database becomes better instrumented with each release. In fact, the amount of additional instrumentation in the database between Oracle9i Release 2 and Oracle 10g Release 1 itself is phenomenal. Oracle 10g took code instrumentation in the kernel to a whole new level.

In this section we’re going to focus on the information that can be found in various types of trace files. We’ll cover what they are, where they are stored, and what we can do with them.

There are generally two types of trace file, and what we do with each kind is very different:

      • Trace files you expected and want: For example, these are the result of enabling SQL_TRACE=TRUE. They contain diagnostic information about your session and will help you tune your application to optimize its performance and diagnose what bottlenecks it is experiencing.
      • Trace files you were not expecting to receive but the server generated as the result of an ORA-00600 “Internal Error”, ORA-03113 “End of file on communication channel”, or ORA-07445 “Exception Encountered” error: These traces contain diagnostic information that is most useful to an Oracle Support analyst and, beyond showing us where in our application the internal error was raised, are of limited use to us.

Requested Trace Files

The trace files you expect to be most commonly generated as the result of setting SQL_TRACE=TRUE, or using the extended trace facility via the 10046 event, are as follows:

ops$tkyte@ORA10G> alter session set events
2 '10046 trace name context forever, level 12';
Session altered.

File Locations

Whether you use SQL_TRACE or the extended trace facility, Oracle will start generating a trace file on the database server machine in one of two locations:

      • If you are using a dedicated server connection, the trace file will be generated in the directory specified by the USER_DUMP_DEST parameter.
      • If you are using a shared server connection, the trace file will be generated in the directory specified by the BACKGROUND_DUMP_DEST parameter.

To see where the trace files will go, you may either issue SHOW PARAMETER DUMP_DEST from SQL*Plus or query the V$PARAMETER view directly:

ops$tkyte@ORA10G> select name, value
2 from v$parameter
3 where name like '%dump_dest%'
4 /

NAME VALUE
------------------------------ -------------------------------
background_dump_dest /home/ora10g/admin/ora10g/bdump
user_dump_dest /home/ora10g/admin/ora10g/udump
core_dump_dest /home/ora10g/admin/ora10g/cdump

This shows the three dump (trace) destinations. Background dump destination is used by any “server” process (see Chapter 5 for a comprehensive list of Oracle background processes and their functions).

If you are using a shared server connection to Oracle, you are using a background process; hence the location of your trace files is defined by BACKGROUND_DUMP_DEST. If you are using a dedicated server connection, you are using a user or foreground process to interact with Oracle; hence your trace files will go in the directory specified by the USER_DUMP_DEST parameter. The CORE_DUMP_DEST parameter defines where a “core” file would be generated in the event of a serious Oracle internal error (such as a segmentation fault on UNIX) or if Oracle Support were to have to you generate one for additional debug information. In general, the two destinations of interest are the background and user dump destinations. As a note, unless otherwise stated, we will be using dedicated server connections in the course of this book.

In the event you do not have access to the V$PARAMETER view, you may use DBMS_UTILITY to access the values of most (but not all) parameters. The following example demonstrates that all you need is the CREATE SESSION privilege in order to, at the very least, see this information:

ops$tkyte@ORA10G> create user least_privs identified by least_privs;
User created.

ops$tkyte@ORA10G> grant create session to least_privs;
Grant succeeded.

ops$tkyte@ORA10G> connect least_privs/least_privs
Connected.
least_privs@ORA10G> declare
2 l_string varchar2(255);
3 l_dummy number;
4 begin
5 l_dummy := dbms_utility.get_parameter_value
6 ( 'background_dump_dest', l_dummy, l_string );
7 dbms_output.put_line( 'background: ' || l_string );
8 l_dummy := dbms_utility.get_parameter_value
9 ( 'user_dump_dest', l_dummy, l_string );
10 dbms_output.put_line( 'user: ' || l_string );
11 end;
12 /
background: /home/ora10g/admin/ora10g/bdump
user: /home/ora10g/admin/ora10g/udump

PL/SQL procedure successfully completed.

Naming Convention

The trace file naming convention changes from time to time in Oracle, but if you have an example of a trace file name from your system, it is easy to see the template in use. For example, on my various servers, a trace file name looks as shown in Table 3-1.

Trace File Name Platform Database Version
ora10g_ora_24574.trc Linux 10g Release 1
ora9ir2_ora_24628.trc Linux 9i Release 2
ora_10583.trc Linux 9i Release 1
ora9ir2w_ora_688.trc Windows 9i Release 2
ora10g_ora_1256.trc Windows 10g Release 1

Table 3-1. Sample Trace File Names

On my servers, the trace file name can be broken down as follows:

      • The first part of the file name is the ORACLE_SID (with the exception of Oracle9i Release 1, where Oracle decided to leave that off).
      • The next bit of the file name is just ora.
      • The number in the trace file name is the process ID of your dedicated server, available to you from the V$PROCESS view.

Therefore, in practice (assuming dedicated server mode), you need access to four views:

      • V$PARAMETER: To locate the trace file for USER_DUMP_DEST
      • V$PROCESS: To find the process ID
      • V$SESSION: To correctly identify your session’s information in the other views
      • V$INSTANCE: To get the ORACLE_SID

As noted earlier, you can use DBMS_UTILITY to find the location, and often you simply “know” the ORACLE_SID, so technically you need access to V$SESSION and V$PROCESS only, but for ease of use you would want access to all four.

A query, then, to generate your trace file name would be

ops$tkyte@ORA10G> alter session set sql_trace=true;
Session altered.

ops$tkyte@ORA10G> select c.value || '/' || d.instance_name ||
2 '_ora_' || a.spid || '.trc' trace
3 from v$process a, v$session b, v$parameter c, v$instance d
4 where a.addr = b.paddr
5 and b.audsid = userenv('sessionid')
6 and c.name = 'user_dump_dest'
7 /

TRACE
-------------------------------------------------------------------------------
/home/ora10g/admin/ora10g/udump/ora10g_ora_24667.trc

ops$tkyte@ORA10G>

It should be obvious that on Windows you would replace the / with \. If you are using 9i Release 1, you would simply issue the following, instead of adding the instance name into the trace file name:

select c.value || 'ora_' || a.spid || '.trc'

Tagging Trace Files

There is a way to “tag” your trace file so that you can find it even if you are not permitted access to V$PROCESS and V$SESSION. Assuming you had access to read the USER_DUMP_DEST directory, you can use the session parameter TRACEFILE_IDENTIFIER. Using this, you may add a uniquely identifiable string to the trace file name, for example:

ops$tkyte@ORA10G> alter session set tracefile_identifier = 'Look_For_Me';
Session altered.

ops$tkyte@ORA10G> alter session set sql_trace=true;
Session altered.

ops$tkyte@ORA10G> !ls /home/ora10g/admin/ora10g/udump/*Look_For_Me*
/home/ora10g/admin/ora10g/udump/ora10g_ora_24676_Look_For_Me.trc

ops$tkyte@ORA10G>

As you can see, the trace file is now named in the standard _ora_ format, but it also has the unique string we specified associated with it, making it rather easy to find “our” trace file name.

Trace Files Generated in Response to Internal Errors

I’d like to close this section with a discussion about those other kinds of trace files — the ones we did not expect that were generated as a result of an ORA-00600 or some other internal error. Is there anything we can do with them?

The short answer is that in general, they are not for you and me. They are useful to Oracle Support. However, they can be useful when we are filing an iTAR with Oracle Support. That point is crucial: if you are getting internal errors, then the only way they will ever be corrected is if you file an iTAR. If you just ignore them, they will not get fixed by themselves, except by accident.

For example, in Oracle 10g Release 1, if you create the following table and run the query, you may well get an internal error (or not — it was filed as a bug and is corrected in later patch releases):

ops$tkyte@ORA10G> create table t ( x int primary key );
Table created.

ops$tkyte@ORA10G> insert into t values ( 1 );
1 row created.

ops$tkyte@ORA10G> exec dbms_stats.gather_table_stats( user, 'T' );
PL/SQL procedure successfully completed.

ops$tkyte@ORA10G> select count(x) over ()
2 from t;
from t
*
ERROR at line 2:
ORA-00600: internal error code, arguments: [12410], [], [], [], [], [], [], []

Now, you are the DBA and all of a sudden this trace file pops up in the user dump destination. Or you are the developer and your application raises an ORA-00600 error and you want to find out what happened. There is a lot of information in that trace file (some 35,000 lines more in fact), but in general it is not useful to you and me. We would generally just compress the trace file and upload it as part of our iTAR processing.

However, there is some information in there that can help you track down the “who,” “what,” and “where” of the error, and also help you find out if the problem is something others have experienced — many times, the “why” — on http://metalink.oracle.com. A quick inspection of the very top of the trace file will provide you with some useful information, such as

Dump file c:\oracle\admin\ora10g\udump\ora10g_ora_1256.trc
Sun Jan 02 14:21:29 2005
ORACLE V10.1.0.3.0 - Production vsnsta=0
vsnsql=13 vsnxtr=3
Oracle Database 10g Enterprise Edition Release 10.1.0.3.0 - Production
With the Partitioning, OLAP and Data Mining options
Windows XP Version V5.1 Service Pack 2
CPU : 1 - type 586
Process Affinity: 0x00000000
Memory (A/P) : PH:11M/255M, PG:295M/1002M, VA:1605M/2047M
Instance name: ora10g
Redo thread mounted by this instance: 1
Oracle process number: 21
Windows thread id: 1256, image: ORACLE.EXE (SHAD)

The database information is important to have when you go to http://metalink.oracle. com to file the iTAR, of course, but it is also useful when you go to search http://metalink. oracle.com to see if this is a known problem. In addition, you can see the Oracle instance on which this error occurred. It is quite common to have many instances running concurrently, so isolating the problem to a single instance is useful.

*** 2005-01-02 14:21:29.062
*** ACTION NAME:() 2005-01-02 14:21:28.999
*** MODULE NAME:(SQL*Plus) 2005-01-02 14:21:28.999
*** SERVICE NAME:(SYS$USERS) 2005-01-02 14:21:28.999

This part of the trace file is new with Oracle 10g and won’t be there in Oracle9i. It shows the session information available in the columns ACTION and MODULE from V$SESSION. Here we can see that it was a SQL*Plus session that caused the error to be raised (you and your developers can and should set the ACTION and MODULE information; some environments such as Oracle Forms and HTML DB do this already for you).

Additionally, we have the SERVICE NAME. This is the actual service name used to connect to the database — SYS$USERS, in this case — indicating we didn’t connect via a TNS service. If we logged in using user/pass@ora10g.localdomain, we might see

*** SERVICE NAME:(ora10g) 2005-01-02 15:15:59.041

where ora10g is the service name (not the TNS connect string; rather, it’s the ultimate service registered in a TNS listener to which it connected). This is also useful in tracking down which process/module is affected by this error.

Lastly, before we get to the actual error, we can see the session ID and related date/time information (all releases) as further identifying information:

*** SESSION ID:(146.2) 2005-01-02 14:21:28.999

Now we are ready to get into the error itself:

ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [12410], [], [], [], [], [], [], []
Current SQL statement for this session:
select count(x) over ()
from t
----- Call Stack Trace -----
_ksedmp+524
_ksfdmp.160+14
_kgeriv+139
_kgesiv+78
_ksesic0+59
_qerixAllocate+4155
_qknRwsAllocateTree+281
_qknRwsAllocateTree+252
_qknRwsAllocateTree+252
_qknRwsAllocateTree+252
_qknDoRwsAllocate+9
...

Here we see a couple of important pieces of information. First, we find the SQL statement that was executing when the internal error was raised, which is very useful for tracking down what application(s) was affected. Also, since we see the SQL here, we can possibly start investigating possible work-arounds — trying different ways to code the SQL to see if we can quickly work around the issue while working the bug. Furthermore, we can cut and paste the offending SQL into SQL*Plus and see if we have a nicely reproducible test case for Oracle Support (these are the best kinds of test cases, of course).

The other important pieces of information are the error code (typically 600, 3113, or 7445) and other arguments associated with the error code. Using these, along with some of the stack trace information that shows the set of Oracle internal subroutines that were called in order, we might be able to find an existing bug (and work-arounds, patches, and so on). For example, we might use the search string

ora-00600 12410 ksesic0 qerixAllocate qknRwsAllocateTree

Using MetaLink’s advanced search (using all of the words, search the bug database), we immediately find the bug 3800614, “ORA-600 [12410] ON SIMPLE QUERY WITH ANALYTIC FUNCTION”. If we go to http://metalink.oracle.com and search using that text, we will discover this bug, see that it is fixed in the next release, and note that patches are available — all of that information is available to us. I find many times, the error I receive is an error that has happened before and there are in fact fixes or work-arounds for it.

Trace File Wrap-Up

You now know the two types of general trace files, where they are located, and how to find them. Hopefully, you’ll use trace files mostly for tuning and increasing the performance of your application, rather than for filing iTARs. As a last note, Oracle Support does have access to many undocumented “events” that are very useful for dumping out tons of diagnostic information whenever the database hits any error. For example, if you believe you are getting an ORA-01555 when you absolutely feel you should not be, Oracle Support can guide you through the process of setting such diagnostic events in order to help you track down precisely why that error is getting raised, by creating a trace file every time that error is encountered.

No comments: