Troubleshooting Oracle VDB Performance - Analyze Oracle Event Information (KBA1669)
Oracle Event Information
- AWRs
- MOATS
- SQL Tracing
What is the Oracle wait-event interface?
The Oracle wait-event interface is the instrumentation of the Oracle database kernel to capture elapsed time expended in during calls to the underlying operating, during the use of synchronization mechanisms such as enqueues and latches, and other measurable events. The summarization of event metrics is a powerful tool for understanding Oracle performance.
What is AWR?
AWR stands for Automatic Workload Repository, which is an always on feature of Oracle Database Enterprise Edition; it is not part of Oracle Database Standard Edition. Oracle documentation (11.2 docs for example) are a good introduction to how AWR collects historical workload information and how AWR reports can display this information in different ways.
AWR contains historical workload information, so it is an excellent method for diagnosing performance problems which have occurred in the past. It can certainly be used for diagnosing performance problems which are happening right this second, but that situation allows many more methods to be used. Please also read the section below entitled Use Oracle Enterprise Manager, MOATS, or SQL tracing to view what is happening right now for such scenarios.
Gathering information using AWR "snapshots"
In Oracle Database Enterprise Edition, AWR is always on, so snapshots are gathering information at regular interval which is configured by your database administrator. The default interval is one hour, but can be changed. Seldom is the interval changed to be less frequent than one hour, but it is not recommended to change the interval to be more frequent than 15 minutes, since it might take 30-90 seconds to complete a snapshot on busy systems.
Information within AWR is retained for a duration which is configured by your database administrator. The default retention for AWR is 7 days, but can be changed. Most experienced Oracle DBAs retain AWR information for 45 days (to span at least two month-end periods) or longer.
Generating an AWR report
Most Oracle DBAs will be quite familiar with generating AWR reports, but AWR reports are usually generated either from Oracle Enterprise Manager or from the UNIX/Linux command-line.
Generating an AWR report using Oracle Enterprise Manager
Though the navigation details may change slightly from Oracle Enterprise Manager 10g, 11g, 12c, and 13c, the basic navigation path is to go to Targets > Databases, then select the database in question. Then, choose Performance > AWR > AWR Report, and specify the beginning point (snapshot) for the report and then the end point (snapshot), then finally click OK.
Generating an AWR report using the UNIX/Linux command-line
An AWR report can by connecting as a SYSDBA account to SQL*Plus on the target database host/server, then running the following SQL*Plus script...
|
If the database is RAC, then it might be advisable to run the global RAC version of the report...
|
For either version of the report, you will be prompted to have either a text or HTML report generated, the beginning and ending snapshot ID, and the name of the file into which the report will be generated.
Analyzing an AWR report
An AWR report has an enormous amount of information. The bottom of the AWR shows the init.ora (initialization) parameters. When comparing workloads between VDBs and baseline databases, ensure that initialization parameter values are similar. Specifically, values related to memory (sga_max_size, sga_target, pga_aggegate_target, db_cache_size etc) should be very close or can impact performance significantly.
Next, we can take a look at the section titled "Top 5 Timed Foreground Events" to get an idea of the latencies in the foreground database sessions, which are those sessions servicing application users. Since Delphix manages all I/O for the database files (i.e. datafiles, controlfiles, online redo logfiles, archived redo logfiles), we are concerned with events in thewait class of either "User I/O" or "System I/O". Events in the wait classes of "Concurrency", "Cluster", "Application", "Network", and others usually involve either synchronization/locking within the database instance or time consumed in the database client application, and so are unlikely to be caused by Delphix, but anything related to I/O, specifically high I/O latency, might involve Delphix.
The items in the "Top 5 Timed Foreground Events" section of the AWR report are sorted by the column "% DB Time", which is the total portion of the all elapsed time measured within the database for an event.
High I/O latency is usually indicated by higher values in the column identified as "Avg wait (ms)". For random-access single-block reads (a.k.a. event "db file sequential read") which are characteristic of indexed data access, values under 10 ms are usually considered acceptable, but the definition of "acceptable" certainly varies. For sequential-access multi-block reads (a.k.a. events "db file scattered read" for buffered access and "direct path read" for unbuffered access) characteristics of FULL table scans, values can vary dependent on the number of blocks read with each access, but usually values less than 30 ms are considered acceptable.
High I/O latency is NOT indicated by low values in the "Avg wait (ms)" column being made significant by high values in the "Waits" column. High numbers of waits are caused by high number of I/O requests generated by the application code, and are thus likely a problem in the application, not in the database or the I/O subsystem. The "Top 5 Timed Foreground Events" section of the AWR report shown below displays this situation...
Here, we are seeing acceptable values for latency (i.e. 2 ms for "direct path read", 4 ms for "db file sequential read, 15 ms for "db file scattered read") but the high numbers of waits (a.k.a. cache misses resulting in I/O requests for "db file sequential read" and "db file scattered read", I/O requests for "direct path read") indicate that the SQL statement(s) in the application might need tuning, not the I/O subsystem, which includes Delphix.
Of course, it might be true that the observed latency values (i.e. 2 ms for "direct path read", 4 ms for "db file sequential read, 15 ms for "db file scattered read") are not acceptable at all, and so it might be necessary to troubleshoot further. The situation may differ depending on provisioned resources, the sharing of those resources, and expectations for performance.
Use Oracle Enterprise Manager, MOATS, or SQL tracing to view what is happening right now
For performance problems within VDBs that are happening right now, there are numerous excellent alternatives for analyzing whether poor performance is caused by high I/O latencies potentially involving Delphix. Three of the best alternatives are described briefly here...
Oracle Enterprise Manager's "Average Active Sessions" display
Though the navigation details may change slightly from Oracle Enterprise Manager 10g, 11g, 12c, and 13c, the basic navigation path is to go to Targets > Databases, then select the database in question. Then, choose Performance > Top Activity. The display is quite self-explanatory, and events belonging to the wait class of "User I/O" are displayed in blue. If you see a lot of blue, then you will want to drill down and determine whether it is due to high I/O latencies or due to high numbers of I/O requests.
MOATS
MOATS (a.k.a. Mother Of All Tuning Scripts) are publicly-available scripts created by Adrian Billington and Tanel Poder which are replicate much of the information provided in the OEM "Average Active Sessions" display, but in character-mode ASCII art on a dumb terminal.
Links for MOATS include:
- Source in zip file: https://github.com/oracle-developer/moats
- Explanatory blog post: http://blog.tanelpoder.com/2011/03/29/moats-the-mother-of-all-tuning-scripts/