Saturday, June 27, 2026

How to Read an Oracle AWR Report Step by Step: A Senior DBA's Field Guide

How to Read an Oracle AWR Report Step by Step: A Senior DBA's Field Guide

Oracle AWR report step by step analysis showing load profile and top SQL sections

If you've been an Oracle DBA for more than six months, someone has probably dropped an AWR report in your lap and said, "the database is slow — fix it." Reading an Oracle AWR report step by step is one of the most important skills you can develop, and yet most online guides barely scratch the surface. After two decades of doing this on production systems ranging from small OLTP shops to multi-terabyte data warehouses, I want to walk you through the sections that actually matter — in the order you should read them — and show you what good versus bad looks like in each one.

What Is an AWR Report and How Do You Generate One?

The Automatic Workload Repository (AWR) captures database statistics in snapshots, by default every 60 minutes. A report compares two snapshots and summarizes what happened in between. It is the single most powerful diagnostic artifact Oracle provides, and you should know how to generate one without thinking.

To generate an AWR report from SQL*Plus:

-- Run as SYSDBA or a user with DBA privilege
@$ORACLE_HOME/rdbms/admin/awrrpt.sql

The script prompts for report format (HTML or text — always choose HTML for readability), the number of days to look back, and then asks you to select a begin and end snapshot. Pick a snapshot window that covers the problem period. A one-hour window is ideal for an acute performance event; a two-hour window works for sustained degradation. Avoid windows wider than four hours — the averages wash out the signal.

You can also generate the report programmatically using DBMS_WORKLOAD_REPOSITORY.AWR_REPORT_HTML, which is useful for automation:

SELECT output FROM TABLE(
  DBMS_WORKLOAD_REPOSITORY.AWR_REPORT_HTML(
    l_dbid    => (SELECT dbid FROM v$database),
    l_inst_num => 1,
    l_bid      => 1500,   -- begin snapshot ID
    l_eid      => 1501    -- end snapshot ID
  )
);

Find your snapshot IDs in DBA_HIST_SNAPSHOT:

SELECT snap_id, begin_interval_time, end_interval_time
FROM   dba_hist_snapshot
WHERE  begin_interval_time >= SYSDATE - 1
ORDER BY snap_id;

Step 1: Read the Report Header — Establish Baseline Context

Before you look at a single number, read the header block. It tells you the database version, instance name, snapshot window duration, and CPUs available. The most important field here is Elapsed Time and DB Time.

  • Elapsed Time: Wall-clock duration of the report window (e.g., 60.5 minutes).
  • DB Time: Total time all sessions spent doing work inside the database (CPU + wait time). Expressed in minutes.

The ratio of DB Time to Elapsed Time tells you the concurrency story. On an 8-CPU system, if Elapsed Time is 60 minutes, the theoretical maximum DB Time is 480 minutes (8 × 60). If DB Time is 1,200 minutes, the database is severely overloaded. If DB Time is 12 minutes, the system is nearly idle. Most healthy production OLTP databases run 1x–4x their CPU count in DB Time during peak periods.

Step 2: Load Profile — The Vital Signs

The Load Profile section is your database's vital signs. It shows per-second and per-transaction rates for the most important activity metrics. This is where Oracle AWR report analysis begins for any serious diagnosis.

Key metrics to check:

  • DB Time (s)/s: DB time per second. Compare against CPU count. If this exceeds your CPU count, you have CPU pressure or a dominant wait event.
  • Logical Reads/s: Buffer gets per second. Sudden spikes here often indicate a new SQL plan or missing index.
  • Physical Reads/s: I/O reads from disk. Healthy systems have a high buffer cache hit ratio (>95%) and low physical reads relative to logical reads.
  • Parses/s and Hard Parses/s: Hard parses above ~10–20/s in an OLTP system are a red flag. Hard parsing is expensive and often indicates cursor sharing problems or missing bind variables.
  • Redo Size/s: High redo generation often correlates with batch jobs or heavy DML. Unexpectedly high redo can indicate row chaining or inefficient UPDATE patterns.
  • Logons/s: Healthy applications use connection pools. A logons/s rate above 5–10 suggests connection pooling is not working properly, which creates hard parse overhead.

When I review a Load Profile, I first check whether the numbers are consistent with what I know about the workload. A batch reporting database should show high logical reads and low logons/s. An OLTP system should show high logons through a pool and low hard parses. If the numbers don't match the workload profile, that mismatch is itself diagnostic.

Step 3: Top 10 Foreground Wait Events — Find the Dominant Bottleneck

This section is where most AWR performance tuning investigations start bearing fruit. It lists the events where sessions spent the most time waiting, expressed as a percentage of DB Time.

The cardinal rule: ignore any event that is a CPU event ("CPU time" in the list is not a wait — it means sessions were actively running on CPU). For everything else, the top one or two events almost always tell you what the database is constrained by.

Common wait events and their diagnoses:

Wait Event Likely Cause
db file sequential read Single-block I/O (index reads). High counts → missing index or bad plan causing index range scan on a large result set.
db file scattered read Multi-block I/O (full table scans). May be appropriate for analytics; bad for OLTP.
log file sync LGWR writing redo to disk on COMMIT. High waits → I/O subsystem cannot keep up with commit rate. Check redo log placement and consider asynchronous I/O.
buffer busy waits Multiple sessions contending for the same buffer. Common with right-growing indexes (sequence-based primary keys) and high-insert workloads. Consider reverse-key indexes or hash partitioning.
enq: TX - row lock contention Application-level row locking. Sessions waiting for another session to release a lock. Check for uncommitted transactions or missing commits in application code.
latch: shared pool Hard parse contention in the shared pool. Often caused by literal SQL (no bind variables) generating massive cursor proliferation.

When a single event consumes more than 30–40% of DB Time, fix that one thing before looking at anything else. The rest of the report is secondary context.

Step 4: SQL Statistics — Identify the Offending Queries

The SQL sections are where you translate the wait event story into specific SQL statements you can tune. There are multiple SQL subsections in an AWR report. For Oracle AWR report step by step analysis, read them in this priority order:

  1. SQL ordered by Elapsed Time: The total time spent executing each SQL statement over the window. Start here. The top statement by elapsed time is usually your primary tuning target.
  2. SQL ordered by CPU Time: Identifies CPU-hungry statements. Compare with elapsed time — if CPU time ≈ elapsed time, the statement isn't waiting, it's just doing a lot of work (usually a full scan).
  3. SQL ordered by Gets: Buffer gets. High-gets statements are reading a lot of data from the buffer cache — often a full table scan that fits in memory, or a poorly indexed join.
  4. SQL ordered by Reads: Physical I/O. High here means the statement is not finding its data in the buffer cache. Buffer cache tuning or query tuning needed.
  5. SQL ordered by Parse Calls: Statements being re-parsed frequently. If a statement appears here but not in the elapsed time list, the parsing overhead itself is the problem (soft or hard parses).
  6. SQL ordered by Executions: High-frequency statements. Even a fast SQL run 10 million times can dominate resource consumption.

For any SQL statement you want to investigate further, grab its SQL_ID from the AWR report and pull the execution plan history:

-- View historical execution plans for a SQL_ID
SELECT * FROM TABLE(
  DBMS_XPLAN.DISPLAY_AWR(
    sql_id      => 'abc123xyz',
    format      => 'ALL +PEEKED_BINDS'
  )
);

This shows you the plan as it was executed during the AWR window, including the bind variable values used at parse time. This is critical when investigating plan instability — the plan in the cursor cache today may not be the plan that caused yesterday's performance event.

For more depth on SQL plan history and how to lock down a stable plan, read our guide on Oracle SQL Plan Baselines and performance stability.

Step 5: Instance Activity Statistics — Validate Your Diagnosis

After forming a hypothesis from the wait events and SQL sections, use the Instance Activity Statistics to validate it. This section contains raw counters for hundreds of Oracle internal metrics. The ones I check most often:

  • consistent gets and db block gets: Make up the logical read count. Cross-reference with the SQL ordered by Gets section.
  • physical reads direct: Reads that bypass the buffer cache (direct path reads, common with parallel query and Exadata). If this is high and unexpected, a parallel query or large sort is likely involved.
  • redo size: Total redo generated. Divide by the number of user commits to get redo per transaction — useful for detecting oversized transactions.
  • parse count (hard): Should be very low in a properly coded OLTP application. More than a few hundred hard parses per hour in a healthy system is unusual.
  • table scans (long tables): Full scans of "large" tables (by Oracle's internal definition). Each one is worth investigating in an OLTP context.

Step 6: Segment Statistics — Pinpoint Hot Objects

The Segments by Physical Reads and Segments by Logical Reads sections tell you which specific tables and indexes are being hammered. This is enormously useful when the SQL section shows many different statements causing I/O — the segment statistics consolidate them into the underlying objects.

-- Independently verify segment activity since last AWR snap
SELECT object_name, object_type, value
FROM   v$segment_statistics
WHERE  statistic_name = 'physical reads'
  AND  value > 0
ORDER BY value DESC
FETCH FIRST 20 ROWS ONLY;

If a specific index appears at the top of physical reads and the corresponding table appears in logical reads, the pattern tells you the optimizer is using an index to read the table row-by-row when a different access path (partition elimination, a covering index, or — counterintuitively — a full scan with parallel) might perform better.

Step 7: Memory Statistics and SGA Breakdown

Near the end of the report, the memory sections confirm whether Oracle's memory allocation is appropriate for the workload. Check:

  • Buffer Cache Hit %: Below 95% in an OLTP system warrants investigation. Below 90% is a clear problem. But note: this metric is misleading if parallel query or direct path reads are common, since those bypass the cache and artificially deflate the ratio.
  • Shared Pool Free Memory: Consistently under 10% of shared pool size indicates the shared pool is undersized or being polluted by literal SQL. Check the V$SQLAREA for statements with PARSE_CALLS = EXECUTIONS and no bind variables.
  • PGA Aggregate Target vs. actual PGA used: If "% of workarea executions - optimal" is below 90%, sort and hash join operations are spilling to temp tablespace. Increase PGA_AGGREGATE_TARGET or tune the sort-heavy SQL.
-- Check PGA usage and spill rates
SELECT name, value
FROM   v$pgastat
WHERE  name IN (
  'aggregate PGA target parameter',
  'aggregate PGA auto target',
  'total PGA inuse',
  'total PGA allocated',
  'over allocation count'
);

Common Pitfalls When Reading AWR Reports

After reviewing thousands of AWR reports, these are the mistakes I see most often:

  • Analyzing the wrong snapshot window: If the incident lasted 15 minutes, a 2-hour snapshot window dilutes the signal to near zero. Always generate the AWR over the tightest window that covers the problem period. If snapshots are every 60 minutes, create a manual snapshot at the start and end of the problem: EXEC DBMS_WORKLOAD_REPOSITORY.CREATE_SNAPSHOT();
  • Chasing second-place wait events: If event #1 consumes 70% of DB Time, fixing event #2 (which is 8% of DB Time) will produce no measurable improvement. Fix the dominant event first.
  • Ignoring the DB Time math: A wait event that totals 30 minutes in a 60-minute AWR window with 8 CPUs might look alarming — but if DB Time is only 45 minutes total, that event is the entire workload, not a bottleneck.
  • Misreading "CPU time" as a wait: CPU time in the foreground wait events list is a resource, not a wait. High CPU time means sessions are actively working — which is good, unless CPU is saturated.
  • Comparing AWR reports from different workload periods: An AWR from 2 AM is not comparable to one from 11 AM. Always compare like-for-like (same day of week, same hour) when baselining.

Oracle 19c-Specific AWR Enhancements

Oracle 19c introduced several improvements to the AWR that are worth knowing:

  • Automatic Index AWR integration: When Automatic Indexing is enabled, the AWR tracks index candidates evaluated and created. Look for the "Automatic Indexing" section in the report to see what the optimizer considered during your window.
  • Real-Time Statistics: Oracle 19c can update optimizer statistics in real time during DML. AWR in 19c includes statistics gathering activity in the workload profile, so you can distinguish statistics-gathering overhead from application DML.
  • High-Frequency AWR Snapshots: You can now configure snapshot intervals as low as 10 minutes using DBMS_WORKLOAD_REPOSITORY.MODIFY_SNAPSHOT_SETTINGS(interval => 10) without a licensing concern (this was previously a grey area in 12c). Useful for catching short-lived incidents.
  • AWR Warehouse: In Oracle 19c Enterprise, AWR data can be shipped to a central AWR Warehouse database for long-term retention and cross-database comparison — essential for multi-instance environments.

Conclusion

Reading an Oracle AWR report step by step is a skill that compounds with practice. The first few times it feels like staring at a phone book — hundreds of sections, thousands of numbers. But the framework is consistent: establish DB Time context, find the dominant wait event, map it to specific SQL, validate with segment and instance activity statistics. After a dozen investigations you start seeing the patterns immediately.

The AWR report is not a magic answer machine — it is evidence. Your job is to form a hypothesis from the wait events, find the SQL that supports it, reproduce the behaviour if possible, and then test a fix in a non-production environment before touching production. That discipline is what separates a DBA who truly understands Oracle performance from one who is guessing.

If you found this walkthrough useful, drop a comment below — especially if you've hit a wait event pattern I didn't cover here. I read every comment and use them to decide what to write next. You can also follow the blog at oratab19c.blogspot.com for hands-on Oracle 19c guides published twice a week.

No comments:

Post a Comment