Collecting Diagnostics from Oracle Hosts for Delphix Support Cases (KBA1763)
KBA
KBA# 1763
This article provides steps for collecting diagnostic information from Oracle hosts to assist with Delphix support cases, including:
- General Server Information
- Alert Log and Trace Files
- Listener Log and Status
- OPatch Information
- Delphix Toolkit Logs
- Oracle Session Details
- Oracle System State Dump
- Process Run State
- Upload Logs to Your Support Case
General Server Information
Create a directory in which all the files can be collected before they get compressed and uploaded to Delphix.
mkdir <case number>
mkdir <case number>
For example:
mkdir /tmp/12345
Copy the system messages file to the temporary directory. For example,
Linux:
cp /var/log/messages /tmp/12345
Solaris:
cp /var/adm/messages /tmp/12345
Alert Log and Trace Files
Find the location of your alert log. For example,
SQL> SELECT VALUE FROM V$DIAG_INFO WHERE NAME = 'Diag Trace'; VALUE -------------------------------------------------------------------------------- /u01/app11/ora11204/diag/rdbms/rh74db01/rh74db01/trace
Navigate to this directory and copy the alert log file for your database to the temporary directory. For example,
cd /u01/app11/ora11204/diag/rdbms/rh74db01/rh74db01/trace cp alert*.log /tmp/12345
Copy any trace files with your particular error number to the temporary directory. For example,
find . -mtime -1 -type f -name '*.trc' -exec grep -l "ORA-00060" {} \; -exec cp {} /tmp/12345 \;
Listener Log and Status
Find all of the listeners on the server.
ps -ef|grep tnslsnr
For each listener, capture the status and services to the temporary directory. For example,
lsnrctl status LISTENER > /tmp/12345/LISTENER_status lsnrctl service LISTENER > /tmp/12345/LISTENER_service
For each listener, get the listener log directory. For example,
lsnrctl status LISTENER | grep "Log File" | sed "s/alert\/log.xml/trace/g"
For each listener, copy the log to the temporary directory. For example,
cp /u01/app12/ora12201/diag/tnslsnr/rhel74-ora-src/listener/trace/listener*.log /tmp/12345
OPatch information
For the appropriate database Oracle Home:
- Set ORACLE_HOME to your database home. If you are unsure about which ORACLE_HOME it should be, you can check the Configuration tab in the UI, for example
- Set the ORACLE_HOME using whatever method is appropriate for your host, for example
. oraenv ORCL
- Collect the OPatch information and copy it to the temporary directory. For example,
$ORACLE_HOME/OPatch/opatch lsinventory > /tmp/12345/rdbms_opatch.lis
If you have Grid Infrastructure running on this server:
- Set ORACLE_HOME to your Grid Infrastructure home
- Collect the OPatch information and copy it to the temporary directory. For example,
$ORACLE_HOME/OPatch/opatch lsinventory > /tmp/12345/gi_opatch.lis
Delphix Toolkit Logs
Copy the connector and snapsync directories in the toolkit directory to the temporary directory.
The following snapsync directory may not exist in your environment.
For example,
tar cvf /tmp/12345/connector.tar /var/opt/delphix/Toolkit/Delphix_564d736a_5edb_8bf7_f2c1_9be79bc6d2dc_delphix_host/log/connector tar cvf /tmp/12345/snapsync.tar /var/opt/delphix/Toolkit/Delphix_564d736a_5edb_8bf7_f2c1_9be79bc6d2dc_delphix_host/log/snapsync
Oracle Session Details
This is especially important when collecting data for slow or hanging jobs.
- Connect to the Oracle instance ( or CDB instance ) in question as sysdba
- Collect the output of. Ensure the spool output directory is set correctly.
set pagesize 1000 set linesize 400 set long 130 col sid for 99999999999 col event format A40 col username format A10 col program for A10 col osuser for A20 col server for A10 col sql_text for A130 col p1text for A15 col p2text for A15 col p3text for A15 col ospid for A10 spool /tmp/12345/session_detail.out select s.sid, s.serial#, s.username, p.pid as oracle_pid, p.spid as ospid, s.state, sw.event, s.p1text, s.p1, s.p2text, s.p2, s.p3text, s.p3, sw.seconds_in_wait, substr(s.program,1,30) as program, substr(q.sql_text,1,130) as sql_text, q.sql_id from v$session s inner join v$session_wait sw on s.sid = sw.sid inner join v$process p on s.paddr = p.addr left outer join v$sqltext q on q.address = s.sql_address and q.hash_value = s.sql_hash_value and q.PIECE = 0 where status = 'ACTIVE' and s.username is not null order by seconds_in_wait desc; spool off
Oracle System State Dump
This is especially important when collecting data for slow or hanging jobs.
- Connect to the Oracle instance (or CDB instance) in question as sysdba.
- Execute the following commands.
SQL> oradebug setmypid SQL> oradebug tracefile_name SQL> oradebug unlimit SQL> oradebug dump systemstate 267
- Copy the generated trace file ( This will be output from oradebug tracefile_name command ) to diagnostics directory /tmp/12345.
Capturing two system state dumps a minute or so apart may also be required to determine if processes are moving or hung.
Process Run State
This is especially important when collecting data for slow or hanging jobs.
- Collect list of running process
- Identify and collect jstack and pstack for each Delphix client process.
Linux ( as the user owning the process )
ps -ef --forest > /tmp/12345/ps.out ps -ef | grep java | grep Delphix oracle 16846 1 3 22:09 ? 00:00:04 /work/Delphix_COMMON_564d8f2c_a863_3ee5_6144_5fed34066879_oracle_host/java/jdk/bin/java -ea -XX:-UseVMInterruptibleIO -Ddelphix.host.os=unix -Ddelphix.toolkit.base.dir=/work -Ddelphix.max.worker=16 -Djava.io.tmpdir=/work/Delphix_564d8f2c_a863_3ee5_6144_5fed34066879_oracle_host/tmp -jar /work/Delphix_COMMON_564d8f2c_a863_3ee5_6144_5fed34066879_oracle_host/client/dsp/client.jar ^ for each process found, perform the following actions. /work/Delphix_COMMON_564d8f2c_a863_3ee5_6144_5fed34066879_oracle_host/java/jdk/bin/jstack 16846 > /tmp/12345/jstack.16846.out - If the command above fails, you can try with -F (force) as root or via sudo but it should be used as last resort only: /work/Delphix_COMMON_564d8f2c_a863_3ee5_6144_5fed34066879_oracle_host/java/jdk/bin/jstack -F 16846 > /tmp/12345/jstack.16846.out pstack 16846 > /tmp/12345/pstack.16846.out gcore -a -o /tmp/12345/core_dump.16846 16846
Solaris ( as the user owning the process )
ptree > /tmp/12345/ptree.out ps auxw | grep java | grep Delphix oracle 16846 1 3 22:09 ? 00:00:04 /work/Delphix_COMMON_564d8f2c_a863_3ee5_6144_5fed34066879_oracle_host/java/jdk/bin/java -ea -XX:-UseVMInterruptibleIO -Ddelphix.host.os=unix -Ddelphix.toolkit.base.dir=/work -Ddelphix.max.worker=16 -Djava.io.tmpdir=/work/Delphix_564d8f2c_a863_3ee5_6144_5fed34066879_oracle_host/tmp -jar /work/Delphix_COMMON_564d8f2c_a863_3ee5_6144_5fed34066879_oracle_host/client/dsp/client.jar ^ for each process found, perform the following actions. /work/Delphix_COMMON_564d8f2c_a863_3ee5_6144_5fed34066879_oracle_host/java/jdk/bin/jstack 16846 > /tmp/12345/jstack.16846.out - If the command above fails, you can try with -F (force) as root or via pfexec/sudo but it should be used as last resort only: /work/Delphix_COMMON_564d8f2c_a863_3ee5_6144_5fed34066879_oracle_host/java/jdk/bin/jstack -F 16846 > /tmp/12345/jstack.16846.out pstack 16846 > /tmp/12345/pstack.16846.out
AIX (as root or via sudo):
jStack: - Check for active dsp clients: root@aix:/work-> ps -ef | grep client.jar dlpxqaor 13636656 1 0 03:08:06 - 0:33 /work/Delphix_COMMON_448cff503608_6867d856d61b_4630_host/java/jdk/bin/java -ea -XX:-UseVMInterruptibleIO -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/work -Dio.netty.allocator.maxOrder=7 -Djdk.tls.acknowledgeCloseNotify=true -Dcom.ibm.tools.attach.enable=no -javaagent:/work/Delphix_COMMON_448cff503608_6867d856d61b_4630_host/client/dsp/libs/com.delphix.common/agent-1.0.0.jar -Ddelphix.host.os=unix -Ddelphix.toolkit.base.dir=/work -Ddelphix.max.worker=16 -Djava.io.tmpdir=/work/Delphix_448cff503608_6867d856d61b_4630_host/tmp -jar /work/Delphix_COMMON_448cff503608_6867d856d61b_4630_host/client/dsp/client.jar root 34608182 41620696 0 03:35:05 pts/0 0:00 grep client.jar - Record current date/time: root@aix:/work-> date Wed Feb 22 03:35:07 PST 2023 - Send signal 3 to create a jstack like file: root@aix:/work-> kill -3 13636656 - Check for any javacore* files in the specified -XX:HeapDumpPath=/work folder created after the date/time above: root@aix:/work-> ls -la javacore* -rw-r--r-- 1 dlpxqaor oinstall 1177544 Feb 22 03:35 javacore.20230222.033515.13636656.0004.txt <-- jstack file output - You may want to collect more than one jstack to see if threads are changing. Full heap dump: - Generate full heap dump via gencore: root@aix:/work-> gencore 13636656 /work/client.dmp - Check the size of the file (it shouldn't be 0!): root@aix:/work-> ls -la client.dmp -rw-r--r-- 1 dlpxqaor oinstall 757698322 Feb 22 03:39 client.dmp - Generate a zip file of the dump jextract should be available from our toolkit: /work/Delphix_COMMON_448cff503608_6867d856d61b_4630_host/java/jdk/jre/bin/jextract client.dmp - Check the zip file: root@aix:/work-> ls -la client.dmp.zip -rw-r--r-- 1 root system 82645767 Feb 22 03:42 client.dmp.zip What to send: Collect all javacore* and client.dmp.zip files and upload them for analysis.
Uploading Logs To Your Support Case
Compress the temporary directory before uploading it. For example,
tar cvf /tmp/12345.tar /tmp/12345 gzip /tmp/12345.tar
Files of less than 50MB in size can be attached directly to the case, using the link contained in emails regarding your open case. Files larger than this size can be attached using https://upload.delphix.com, referencing your case number when prompted.
How to construct the toolkit directory path
The toolkit directory name is Delphix version dependent, as how it is built changed in 6.0.2.0
pre-6.0.2.0
single instance: /<toolkit_directory>/Delphix_<engineuuid>_<delphix_osuser>_host/log/connector /<toolkit_directory>/Delphix_<engineuuid>_<delphix_osuser>_host/log/snapsync or cluster: /<toolkit_directory>/Delphix_<engineuuid>_<delphix_osuser>_cluster/log/connector /<toolkit_directory>/Delphix_<engineuuid>_<delphix_osuser>_cluster/log/snapsync
this can be pieced together from the following:
cat misc/appliance-uuid toolkit_directory: select dh.toolkit_directory from dlpx_host dh, dlpx_host_environment dhe where dhe.name = <environment_name> and dh.host_id = split_part(dhe.host,'-',2)::int; delphix_osuser: (really only relevant if the environment has multiple users) select deu.user_name from dlpx_environment_user deu, dlpx_host_environment dhe where dhe.name=<environment_name> and dhe.host_environment_id = split_part(deu.environment,'-',2)::int;
Example: for environment name lx_dr_oltp
cat misc/appliance-uuid 42209a14-6072-ccfa-fabf-a6697b523e8a hercules=> select dh.toolkit_directory hercules-> from dlpx_host dh, dlpx_host_environment dhe hercules-> where dhe.name = 'lx_dr_oltp' and hercules-> dh.host_id = split_part(dhe.host,'-',2)::int; toolkit_directory ----------------------- /home/delphix/toolkit (1 row) hercules=> select deu.user_name hercules-> from dlpx_environment_user deu, dlpx_host_environment dhe hercules-> where dhe.name='lx_dr_oltp' and hercules-> dhe.host_environment_id = split_part(deu.environment,'-',2)::int; user_name ----------- delphix giving us: /home/delphix/toolkit/delphix_42209a14_6072_ccfa_fabf_a6697b523e8a_delphix/log
6.0.2.0+
single instance: /<toolkit_directory>/Delphix_<engineuuidstub>_<serveruuidstub>_host-user-id_host/log/connector /<toolkit_directory>/Delphix_<engineuuidstub>_<serveruuidstub>_host-user-id_host/log/snapsync or cluster: /<toolkit_directory>/Delphix_<engineuuidstub>_<serveruuidstub>_host-user-id_cluster/log/connector /<toolkit_directory>/Delphix_<engineuuidstub>_<serveruuidstub>_host-user-id_cluster/log/snapsync
this can be pieced together from the following:
engineuuidstub: cut -d'-' -f5 < misc/appliance-uuid example output: cat misc/appliance-uuid | cut -d'-' -f5 4c3487c97f3c
for single instance environments, this query will provide the relevant information:
select dh.host_address, dh.toolkit_directory, split_part(dh.delphix_uuid,'-',5) as uuidstub, deu.host_user_id, deu.user_name from dlpx_host dh, dlpx_host_environment dhe, dlpx_environment_user deu where dhe.name=<environment_name> and deu.environment ~ 'ENVIRONMENT' and dhe.host_environment_id = split_part(deu.environment,'-',2)::int and dh.host_id = split_part(dhe.host,'-',2)::int;
example output:
host_address | toolkit_directory | uuidstub | host_user_id | user_name ------------------------------------------+-------------------+--------------+--------------+----- jenky-ora-tgt.dcol2.delphix.com | /var/tmp | 5bdbb773a03e | 4 | oracle jenky-ora-tgt.dcol2.delphix.com | /var/tmp | 5bdbb773a03e | 5 | davidj the toolkit directory would then be: /var/tmp/Delphix_4c3487c97f3c_5bdbb773a03e_4_host (for environment user oracle) /var/tmp/Delphix_4c3487c97f3c_5bdbb773a03e_5_host (for environment user davidj)
For a cluster environment, this query will provide the relevant information:
select ocn.name as node_name, dh.host_address, dh.toolkit_directory, split_part(dh.delphix_uuid,'-',5) as uuidstub, deu.host_user_id, deu.user_name as user from dlpx_host dh, dlpx_environment_user deu, orcl_cluster oc, orcl_cluster_node ocn where oc.name=<cluster_environment_name> and deu.environment ~ 'CLUSTER' and oc.cluster_id = split_part(ocn.cluster,'-',2)::int and oc.cluster_id = split_part(deu.environment,'-',2)::int and dh.host_id = split_part(ocn.host,'-',2)::int;
example output:
node_name | host_address | toolkit_directory | uuidstub | host_user_id | user -----------+--------------------------+-------------------+--------------+--------------+--------- sdrac1 | sdrac1.dcol1.delphix.com | /work | 5e10982a4965 | 1 | oracle sdrac2 | 10.43.76.58 | /work | c222acd463bb | 1 | oracle the toolkit directories would then be: node sdrac1, host_address sdrac1.dcol1.delphix.com /work/Delphix_4c3487c97f3c_5e10982a4965_1_cluster (for environment user oracle) node sdrac2, host_address 10.43.76.58 /work/Delphix_4c3487c97f3c_c222acd463bb_1_cluster (for environment user oracle)