Sunday, August 12, 2007

Automatic Diagnostic Repository (ADR)

Automatic Diagnostic Repository (ADR)

Virag Sharma virag123@gmail.com

In Oracle 11g database New dimension fault diagnosability infrastructure added to Oracle self managing concept to reduce time for resolving problem/bug/SR and round trip between end-user and Oracle Support. Trace file, dump file , core file ADR, ADRCI etc are components of fault diagnosability infrastructure.

In early version of oracle, When ever critical error such as code bug , data corruption occur end user need to collect diagnostic data such as process dump , data structure dump etc. In oracle 11g database end user need to worry about what kind of dump need to collect , need not to search trace file which need to send to Oracle Support Oracle 11g Database proactively collect ,tagged and store diagnostic data in file based repository known as ADR. 11g used file based repository because even your database is down, you have diagnostic data to send to oracle support.

Key Components for Fault Diagnosability Infrastructure

  • ADR
  • ADRCI ( ADR Command line utility )
  • Alert log
  • Trace ,Dumps etc.

ADR (Automatic Diagnostic Repository)

ADR is file based repository for diagn

ostic data like trace file,process dump,data structure dump etc.
In oracle 11g trace. alert

not saved in *_DUMP_DEST directory even you set those parameters in init.ora.11g ignore *_DUMP_DEST and store data in new format , directory structure is given below

Diag/product_type/database_name/instance_name

ADR_base/diag/rdbms/orcl2/orcl2


Click on figure for clear view

Note : ADR_HOME is user define variable , I have define this variable make life easier

ADR root where ADR directory structure start.11g New initialize parameter DIAGNOSTIC_DEST decide location of ADR root,

  • By default location of DIAGNOSTIC_DEST is $ORACLE_HOME/log,
  • if ORACLE_BASE is set in environment then DIAGNOSTIC_DEST is set to $ORACLE_BASE

In 11g alert file is saved in 2 location, one is in alert directory ( in XML format) and old style alert file in trace directory. Within ADR base, there can be many ADR homes, where each ADR home is the root directory for all diagnostic data for a particular instance. The location of an ADR home for a database is shown on the above graphic.

Note :- I have created on environment variable ADR_HOME= . I am using same in all my this document

SQL> show parameter diag
NAME TYPE VALUE
------------------------ ---------------------- ------------------------------
diagnostic_dest string /u01/app/oracle

Data

Old location

ADR location

Core Dump

CORE_DUMP_DEST

$ADR_HOME/cdump

Alert log data

BACKGROUND_DUMP_DEST

$ADR_HOME/trace

$ADR_HOME/alert (XML)

Background process trace

BACKGROUND_DUMP_DEST

$ADR_HOME/trace

User process trace

USER_DUMP_DEST

$ADR_HOME/trace



SQL> desc v$diag_info
Name Null? Type
----------------------------------------- -------- ----------------------------
INST_ID NUMBER
NAME VARCHAR2(64)
VALUE VARCHAR2(512)


SQL> select * from v$diag_info;

INST_ID NAME VALUE
---------- ------------------------- ---------------------------------------------
1 Diag Enabled TRUE
1 ADR Base /u01/app/oracle
1 ADR Home /u01/app/oracle/diag/rdbms/orcl2/orcl2
1 Diag Trace /u01/app/oracle/diag/rdbms/orcl2/orcl2/trace
1 Diag Alert /u01/app/oracle/diag/rdbms/orcl2/orcl2/alert
1 Diag Incident /u01/app/oracle/diag/rdbms/orcl2/orcl2/incident
1 Diag Cdump /u01/app/oracle/diag/rdbms/orcl2/orcl2/cdump
1 Health Monitor /u01/app/oracle/diag/rdbms/orcl2/orcl2/hm
1 Default TraceFile /u01/app/oracle/diag/rdbms/orcl2/orcl2/trace/orcl2_ora_14385.trc
1 Active Problem Count 0
1 Active Incident Count 0


Retention policy

There is retention policy for ADR that allow to specify how long to keep the data
ADR incidents are controlled by two different policies:

  • The incident metadata retention policy ( default is 1 year )
  • The incident files and dumps retention policy ( Default is one month)

We can change retention policy using “adrci” MMON purge data automatically on expired ADR data.

adrci> show control

ADR Home = /u01/app/oracle *************************************************************************

ADRID SHORTP_POLICY LONGP_POLICY LAST_MOD_TIME LAST_AUTOPRG_TIME LAST_MANUPRG_TIME ADRDIR_VERSION ADRSCHM_VERSION ADRSCHMV_SUMMARY ADRALERT_VERSION CREATE_TIME

-------------------- -------------------- -------------------- ---------------------------------------- ---------------------------------------- ---------------------------------------- -------------------- -------------------- -------------------- -------------------- ----------------------------------------

3667832353 720 8760 2008-07-02 13:24:01.088681 -07:00 2008-07-22 00:20:04.113441 -07:00 1 2 0 1 2008-07-02 13:24:01.088681 -07:00

1 rows fetched

adrci>

Change Retention

adrci> set control (SHORTP_POLICY = 360 )
adrci>
set control (LONGP_POLICY = 4380 )

Automatic Diagnostic Repository (ADRCI)

Oracle 11g introduce new tool/utility called ADRCI known as ADR command line tool. This tool
allow user to interact with ADR ,check alert log, check health monitor(HM) status , create report
on HM, Package incident and problem information into a zip file for send to Oracle Support. etc.
No username/password need to log in to ADRCI, ADRCI interact with file system and ADR data
is secured only by operating system permissions on the ADR directories.
[Click Here for Oracle Documenation on adrci]

[oracle@apps001 ~] $ adrci
adrci> help

HELP [topic]
Available Topics:
CREATE REPORT
ECHO
EXIT
HELP
HOST
IPS
PURGE
RUN
SET BASE
SET BROWSER
SET CONTROL
SET ECHO
SET EDITOR
SET HOMES | HOME | HOMEPATH
SET TERMOUT
SHOW ALERT
SHOW BASE
SHOW CONTROL
SHOW HM_RUN
SHOW HOMES | HOME | HOMEPATH
SHOW INCDIR
SHOW INCIDENT
SHOW PROBLEM
SHOW REPORT
SHOW TRACEFILE
SPOOL

There are other commands intended to be used directly by Oracle, type
"HELP EXTENDED" to see the list

adrci>

adrci> SHOW INCIDENT
ADR Home = /u01/app/oracle/diag/rdbms/orcl2/orcl2:
********************************************************************
INCIDENT_ID PROBLEM_KEY CREATE_TIME
--------------------
-----------------------------------------------------------
----------------------------------------
9817 ORA 600 [kcidr_reeval_3] 2008-05-14 18:41:03.609077
+05:30
1 incident info records fetched

Checking alert log with ADRCI

One can see alert log content with the help of ADRCI

adrci>>show alert

If just want to tail content of alert log

$adrci
adrci> set editor vi
adrci> show alert ( it will open alert in vi editor )
adrci> show alert -tail ( Similar to Unix tail command )
adrci> show alert -tail 200 ( Similar to Unix Command tail -200 )
adrci> show alert -tail -f ( Similar to Unix command tail -f )

Content of xml alert log ( log.xml )


type='UNKNOWN' level='16' host_id='apps001'
host_addr='192.20.182.113' module='' pid='19541'>
Incremental checkpoint up to RBA [0x10e.98cbe.0], current log tail at RBA [0x10e.98cea.0]

Since alert log saved as XML format ( log.xml ) , you can query xml file as well

Below is example to check all “ORA-“ in alert log

adrci>SHOW ALERT -P "MESSAGE_TEXT LIKE '%ORA-%'"

You can spool output for ADRCI using spool command same as we use in sqlplus

Problem and Incident

Problem

AD introduce new concept of problem and incident, problem is critical error in database and in ADR problem is identified by problem key. Problem key is consist of oracle error number, error parameter value etc
for example ORA600kci

Incident

Incident is single occurrence of problem , each incident is identified by unique number called incident id ,
which is unique in ADR home, all incident data stored in ADR. Each incident has a problem key and is
mapped to a single problem. When error occurred backup ground process make entry in alert.log and
collect data about incident (like process dump, data structure dump etc)

If similar incident happen more frequently , oracle will not collect data for all incident
By default only five dumps per hour for a given problem are allowed for single given problem and
this call flood control in 11g , some time you see "flood control" messages in alert.log / log.xml.
Incident can be created as manual as well, if needed.

adrci>>SHOW INCIDENT

ADR Home = /u01/app/oracle/diag/rdbms/orcl2/orcl2:
*************************************************************************
INCIDENT_ID PROBLEM_KEY CREATE_TIME
--------------------
-----------------------------------------------------------
----------------------------------------
9817 ORA 600 [kcidr_reeval_3] 2008-05-14 18:41:03.609077 +05:30
1 incident info records fetched

IPS ( Incident package service )

DBA need not search trace , dump etc related particular error, to sent it to oracle support. In ADR
diagnostic data are tagged with incident id and IPS identified trace and dump for particular incident
and allow end user to create package from ADR to send to Oracle Support. Using IPS end user can
add some more file to package if needed.

How to create package

  • Create logical package
  • Add files to package
  • Create zip file to send to oracle support

Create logical package

Check incidents for which you want to create package

adrci>SHOW INCIDENT

ADR Home = /u01/app/oracle/diag/rdbms/orcl2/orcl2:
*************************************************************************
INCIDENT_ID PROBLEM_KEY CREATE_TIME
--------------------
-----------------------------------------------------------
----------------------------------------
9817 ORA 600 [kcidr_reeval_3] 2008-08-14 18:41:03.609077
+05:30
1 incident info records fetched



We can use IPS CREATE PACKAGE command to create a logical package for above incident

adrci>ips create package incident 9817
Created package 4 based on incident id 9817, correlation level typical

Add files to package

Adding diagnostic information for a particular (9817) incident

adrci>ips create package incident 9817
Created package 4 based on incident id 9817, correlation level typical
adrci>ips add incident 9817 package 4
Added incident 9817 to package 4

You can add additional files if needed, But file should be in ADR, below in example we adding alert log to package.

adrci> ips add file
/u01/app/oracle/diag/rdbms/orcl2/orcl2/trace/alert_orcl2.log package 4
Added file /u01/app/oracle/diag/rdbms/orcl2/orcl2/trace/alert_orcl2.log to
package 4 4

Create zip file to send to oracle support

adrci> ips generate package 4 in /tmp
Generated package 4 in file /tmp/ORA600kci_20080814184516_COM_1.zip, mode
complete
adrci>

[root@apps001 tmp]#ls –l /tmp/*.zip

-rw-r--r-- 1 oracle oinstall 584410 May 14 18:50 ORA600kci_20080814184516_COM_1.zip

[root@apps001 tmp]# unzip -l ORA600kci_20080814184516_COM_1.zip
Archive: ORA600kci_20080814184516_COM_1.zip
Length Date Time Name
-------- ---- ---- ----
763210 05-14-08 18:41
diag/rdbms/orcl2/orcl2/incident/incdir_9817/orcl2_ora_5967_i9817.trm
2302506 05-14-08 18:41
diag/rdbms/orcl2/orcl2/incident/incdir_9817/orcl2_ora_5967_i9817.trc
186887 05-14-08 18:41 diag/rdbms/orcl2/orcl2/trace/alert_orcl2.log
491982 05-14-08 18:41 diag/rdbms/orcl2/orcl2/alert/log.xml
1122 05-14-08 18:41 diag/rdbms/orcl2/orcl2/trace/orcl2_diag_5931.trc
189 05-14-08 18:41 diag/rdbms/orcl2/orcl2/trace/orcl2_diag_5931.trm
1342 05-14-08 18:41 diag/rdbms/orcl2/orcl2/trace/orcl2_ora_5967.trc
773 05-14-08 18:41 diag/rdbms/orcl2/orcl2/trace/orcl2_ora_5967.trm
831 05-14-08 18:49
diag/rdbms/orcl2/orcl2/incpkg/pkg_4/seq_1/export/IPS_CONFIGURATION.dmp
338 05-14-08 18:49
diag/rdbms/orcl2/orcl2/incpkg/pkg_4/seq_1/export/IPS_PACKAGE.dmp
193 05-14-08 18:49
diag/rdbms/orcl2/orcl2/incpkg/pkg_4/seq_1/export/IPS_PACKAGE_INCIDENT.dmp
1094 05-14-08 18:49
diag/rdbms/orcl2/orcl2/incpkg/pkg_4/seq_1/export/IPS_PACKAGE_FILE.dmp
234 05-14-08 18:49
diag/rdbms/orcl2/orcl2/incpkg/pkg_4/seq_1/export/IPS_PACKAGE_HISTORY.dmp
6004 05-14-08 18:49
diag/rdbms/orcl2/orcl2/incpkg/pkg_4/seq_1/export/IPS_FILE_METADATA.dmp
214 05-14-08 18:49
diag/rdbms/orcl2/orcl2/incpkg/pkg_4/seq_1/export/IPS_FILE_COPY_LOG.dmp
1273 05-14-08 18:49
diag/rdbms/orcl2/orcl2/incpkg/pkg_4/seq_1/export/DDE_USER_ACTION_DEF.dmp
1813 05-14-08 18:49
diag/rdbms/orcl2/orcl2/incpkg/pkg_4/seq_1/export/DDE_USER_ACTION_PARAMETER_DEF.dmp
204 05-14-08 18:49
diag/rdbms/orcl2/orcl2/incpkg/pkg_4/seq_1/export/DDE_USER_ACTION.dmp
198 05-14-08 18:49
diag/rdbms/orcl2/orcl2/incpkg/pkg_4/seq_1/export/DDE_USER_ACTION_PARAMETER.dmp
353 05-14-08 18:49
diag/rdbms/orcl2/orcl2/incpkg/pkg_4/seq_1/export/DDE_USER_INCIDENT_TYPE.dmp
163 05-14-08 18:49
diag/rdbms/orcl2/orcl2/incpkg/pkg_4/seq_1/export/DDE_USER_INCIDENT_ACTION_MAP.dmp
614 05-14-08 18:49
diag/rdbms/orcl2/orcl2/incpkg/pkg_4/seq_1/export/INCIDENT.dmp
357 05-14-08 18:49
diag/rdbms/orcl2/orcl2/incpkg/pkg_4/seq_1/export/INCCKEY.dmp
202 05-14-08 18:49
diag/rdbms/orcl2/orcl2/incpkg/pkg_4/seq_1/export/INCIDENT_FILE.dmp
406 05-14-08 18:49
diag/rdbms/orcl2/orcl2/incpkg/pkg_4/seq_1/export/PROBLEM.dmp
710 05-14-08 18:49
diag/rdbms/orcl2/orcl2/incpkg/pkg_4/seq_1/export/HM_RUN.dmp
843 05-14-08 18:49 diag/rdbms/orcl2/orcl2/hm/HMREPORT_HM_RUN_21.hm
708 05-14-08 18:49 diag/rdbms/orcl2/orcl2/hm/HMREPORT_HM_RUN_41.hm
207 05-14-08 18:49
diag/rdbms/orcl2/orcl2/incpkg/pkg_4/seq_1/export/EM_USER_ACTIVITY.dmp
62624 05-14-08 18:49
diag/rdbms/orcl2/orcl2/incpkg/pkg_4/seq_1/config.xml 489 05-14-07 18:49
diag/rdbms/orcl2/orcl2/incpkg/pkg_4/seq_1/metadata.xml
9508 05-14-08 18:49
diag/rdbms/orcl2/orcl2/incpkg/pkg_4/seq_1/manifest_4_1.xml
0 05-05-08 04:00 diag/rdbms/orcl2/orcl2/alert/
0 05-05-08 04:00 diag/rdbms/orcl2/orcl2/cdump/
0 05-14-08 18:49 diag/rdbms/orcl2/orcl2/hm/
0 05-14-08 18:41 diag/rdbms/orcl2/orcl2/incident/
0 05-14-08 18:45 diag/rdbms/orcl2/orcl2/incpkg/
0 05-13-08 22:51 diag/rdbms/orcl2/orcl2/ir/
0 05-14-08 18:41 diag/rdbms/orcl2/orcl2/lck/
0 05-05-08 04:00 diag/rdbms/orcl2/orcl2/metadata/
0 05-14-08 18:41 diag/rdbms/orcl2/orcl2/stage/
0 05-14-08 18:41 diag/rdbms/orcl2/orcl2/sweep/
0 05-14-08 18:41 diag/rdbms/orcl2/orcl2/trace/
489 05-14-08 18:49 metadata.xml
-------- -------
3838080 44 files
[root@apps001 tmp]#

Log a SR and upload this zip file to Oracle Support for diagnose and resolution.

IPS in Summary


$ adrci
adrci> help ips
adrci> show incident
( For example above command show incident No 9817 for ORA-600 [XYZ] )

adrci> ips create package incident 9817
<= ( it will give package No.)
adrci> ips create package incident 9817
Created package 4 based on incident id 9817, correlation level typical

adrci> ips add incident 9817 package 4
Added incident 9817 to package 4

adrci>

adrci>>ips add file
/u01/app/oracle/diag/rdbms/orcl2/orcl2/trace/alert_orcl2.log package 4

Added file /u01/app/oracle/diag/rdbms/orcl2/orcl2/trace/alert_orcl2.log to
package 4

adrci>>ips generate package 4 in /tmp
Generated package 4 in file /tmp/ORA600kci_20080814184516_COM_1.zip, mode
complete
adrci>>



Health Monitor (HM)

Health Monitor run diagnostic checks on various components of the database. Health Monitor checks
examine various components of the database, including files, memory, transaction integrity, metadata,
and process usage. In order to collect more data after critical error (incident) , oracle invoke health
monitoring implicitly. If need end-user can also run health monitoring procedure manually

Health Monitor checks run in two ways:

  • Reactive: The fault diagnosability infrastructure can invoke Health Monitor checks automatically
    in response to critical errors.
  • Manual: DBA can manually run Health Monitor health checks Manually

Kinds of health monitoring

Please look at the V$HM_CHECK view , it will list all Health monitoring checks

SQL> desc v$hm_check
Name Null? Type
--------- -------- -------
ID NUMBER
NAME VARCHAR2(64)
CLSID NUMBER
CLS_NAME VARCHAR2(15)
FLAGS NUMBER
INTERNAL_CHECK VARCHAR2(1)
OFFLINE_CAPABLE VARCHAR2(1)
DESCRIPTION VARCHAR2(64)

SQL> select name from v$hm_check ;

NAME
-------------------------
HM Test Check
Database Cross Check
Data Block Check
Redo Check
Logical Block Check
Table Check
Table-Index Cross Check
Table Row Check
Table-Index Row Mismatch
Transaction Check
Undo Segment Check
All Control Files Check
CF Member Check
All Datafiles Check
Single Datafile Check
Log Group Check
Log Group Member Check
Archived Log Check
Redo Revalidation Check
IO Revalidation Check
Block IO Revalidation Check
Txn Revalidation Check
Failure Simulation Check
Database Dictionary Check

25 rows selected.

High Lighted HM , we going to run Manually in next step

Health monitoring report

The checker generates a report of its execution in XML and stores the reports in ADR.
You can view these reports using either V$HM_RUN, DBMS_HM, ADRCI, or Enterprise Manager.

Run Health Check Manually

SQL> exec dbms_HM.RUN_CHECK('Database Dictionary Check');

PL/SQL procedure successfully completed.

$adrci
adrci>SHOW HM_RUN

ADR Home = /u01/app/oracle/diag/rdbms/orcl2/orcl2:

*************************************************************************

----------------------------------------------------------

RUN_ID 1
RUN_NAME HM_RUN_1
CHECK_NAME Database Cross Check
NAME_ID 2
MODE 2
START_TIME 2008-08-05 04:01:56.783059 +05:30
RESUME_TIME
END_TIME 2008-08-08 04:02:04.007178 +05:30
MODIFIED_TIME 2008-08-08 04:02:04.007178 +05:30
TIMEOUT 0
FLAGS 0
STATUS 5
SRC_INCIDENT_ID 0
NUM_INCIDENTS 0
ERR_NUMBER 0
REPORT_FILE

----------------------------------------------------------

RUN_ID 21

RUN_NAME HM_RUN_21

CHECK_NAME Database Dictionary Check
NAME_ID 24
MODE 0
START_TIME 2008-08-13 23:09:43.831573 +05:30
RESUME_TIME
END_TIME 2008-08-13 23:09:47.713191 +05:30
MODIFIED_TIME 2008-08-14 00:03:01.470031 +05:30
TIMEOUT 0
FLAGS 0
STATUS 5
SRC_INCIDENT_ID 0
NUM_INCIDENTS 0
ERR_NUMBER 0
REPORT_FILE /u01/app/oracle/diag/rdbms/orcl2/orcl2/hm/HMREPORT_HM_RUN_21.hm

2 rows fetched

Create HM Report

adrci>>CREATE REPORT HM_RUN HM_RUN_21

You can create and view Health Monitor checker reports using the ADRCI utility. Make sure that Oracle environment variables are set properly, The ADRCI utility starts and displays its prompt as shown above.
You then enter the SHOW HM_RUN command to list all the checker runs registered in the ADR repository. Locate the checker run for which you want to create a report and note the checker run name using the corresponding RUN_NAME field. you can generate the report using the CREATE REPORT HM_RUN command. You view the report using the SHOW REPORT HM_RUN command or by running dbms_hm.get_run_report on sql prompt

View HM generated reports on SQL prompt

SQL> set long 1000000
SQL> select dbms_hm.get_run_report('HM_RUN_21') from dual;

DBMS_HM.GET_RUN_REPORT('HM_RUN_21')

--------------------------------------------------------------------------------
................

View HM generated reports on adrci prompt

adrci>>show report hm_run HM_RUN_21

View HM generated reports on OS level ( In ADR repository )

$cd $ADR_HOME/hm
$ pwd
/u01/app/oracle/diag/rdbms/orcl2/orcl2/hm

$ ls

HMREPORT_HM_RUN_21.hm

$ more HMREPORT_HM_RUN_21.hm

Conclusion

That was just a quick glance at some of the new features in Oracle Database 11g Release 1. Now I'm off to read the documentation in full so I can take advantage of all the new stuff.

2 comments:

  1. Hi Virag,

    I am not sure which version you are on but 11GR1 fails with

    SQL> exec dbms_HM.RUN_CHECK('Database Dictionary Check');
    BEGIN dbms_HM.RUN_CHECK('Database Dictionary Check'); END;

    *
    ERROR at line 1:
    ORA-51001: check [Database Dictionary Check] not found in HM catalog
    ORA-06512: at "SYS.DBMS_HM", line 191
    ORA-06512: at line 1


    but works with

    SQL> exec dbms_HM.RUN_CHECK('DB Structure Integrity Check');

    PL/SQL procedure successfully completed.

    ReplyDelete
  2. sorry the working one is

    SQL> exec dbms_HM.RUN_CHECK(' Dictionary Integrity Check');

    PL/SQL procedure successfully completed.

    ReplyDelete

Google