Delphix

TB091 Oracle Data Blocks May Be Corrupted with dNFS

 

 

Alert Type

Data Corruption

Impact

Oracle Virtual Databases (VDBs) running on Delphix use the Network File System (NFS) to store and access Oracle files such as tablespace data files, control files, and redo logs. In rare circumstances, writes to Oracle 19 database files over NFS with Direct NFS (dNFS) enabled may corrupt database blocks. Over time, multiple blocks and multiple databases could be affected.

Once a data block is corrupted, Oracle database queries, DML commands, or DDL commands that touch it may fail with an ORA-01578 error. This may disrupt applications using affected databases and could result in lost data.

Contributing Factors

The following table lists the Delphix Engine versions that exhibit the issue:

Major Release    All Sub Releases
6.0              6.0.5.0, 6.0.6.0, 6.0.6.1, 6.0.7.0

The issue only occurs with VDBs running Oracle Database Release 19 versions. Delphix Engines deployed on ESX, Google Cloud (GCP), and Oracle Cloud (OCI) are impacted. Note that Delphix Engines deployed on AWS and Azure are not impacted. 

Versions later than Oracle Database Release 19 have not been tested and are not yet supported with Delphix.

The issue can only occur when using the Direct NFS (dNFS) feature with Oracle Database (RDBMS) software.

  • Both NFSv3 and NFSv4 versions of NFS are susceptible when used with dNFS

The issue is more likely to occur on Oracle 19 VDBs with a write-intensive workload. 
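
The contributing factors above can be combined into a single check. The following is a minimal sketch, not a Delphix tool: the `VdbConfig` type, its field names, and the platform labels are hypothetical, and the version and platform lists are copied from this bulletin.

```python
from dataclasses import dataclass

# Values copied from this bulletin; update them if the bulletin is revised.
AFFECTED_ENGINE_VERSIONS = {"6.0.5.0", "6.0.6.0", "6.0.6.1", "6.0.7.0"}
AFFECTED_PLATFORMS = {"ESX", "GCP", "OCI"}  # AWS and Azure are not impacted


@dataclass
class VdbConfig:
    """Hypothetical description of one VDB and the engine hosting it."""
    engine_version: str   # Delphix Engine version, e.g. "6.0.6.1"
    platform: str         # where the engine is deployed, e.g. "ESX" or "AWS"
    oracle_major: int     # Oracle Database major release, e.g. 19
    dnfs_enabled: bool    # True if the VDB runs with Direct NFS


def is_susceptible(cfg: VdbConfig) -> bool:
    """Return True only when every contributing factor is present."""
    return (
        cfg.engine_version in AFFECTED_ENGINE_VERSIONS
        and cfg.platform.upper() in AFFECTED_PLATFORMS
        and cfg.oracle_major == 19
        and cfg.dnfs_enabled
    )
```

All four conditions must hold at once. Workload intensity is left out because it affects only the likelihood of corruption, not whether the VDB is susceptible.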

 

Symptoms

Affected VDB alert logs may contain one or more messages similar to the following:

Hex dump of (file <file number>, block <block number>) in trace file /oracle/admin/product/19c/diag/rdbms/dbname/DBNAME/trace/DBNAME_j000_28697.trc
Corrupt block relative dba: 0x9e0ad97c (file <file number>, block <block number>)
Fractured block found during user buffer read
Data in bad block:
type: 6 format: 2 rdba: 0x9e0ad97c
last change scn: 0x0000.0dc8.e39dfdf1 seq: 0x2 flg: 0x06
spare3: 0x0
consistency value in tail: 0x0018000a
check value in block header: 0x91e5
computed block checksum: 0x32f6

or

Errors in file /oracle/admin/product/19c/diag/rdbms/dbname/DBNAME/trace/DBNAME_j001_2178552.trc
ORA-01578: ORACLE data block corrupted (file # <file number>, block #<block number>)
ORA-01110: data file <file number>: '<file path name>'

One or more Oracle DBWR trace files may be generated containing errors similar to the following:

[97398168994] kgnfs_flushmsg: CH OUT of ORDER SEND m->order 21839973 ch->order 21842199 ch 0x68651098
[97398169061] kgnfs_flushmsg: CH OUT of ORDER SEND m->order 21839974 ch->order 21842199 ch 0x68651098
[97398169069] kgnfs_flushmsg: CH OUT of ORDER SEND m->order 21839975 ch->order 21842199 ch 0x68651098
[97398169076] kgnfs_flushmsg: CH OUT of ORDER SEND m->order 21839976 ch->order 21842199 ch 0x68651098
[97398169083] kgnfs_flushmsg: CH OUT of ORDER SEND m->order 21839977 ch->order 21842199 ch 0x68651098
[97398169090] kgnfs_flushmsg: CH OUT of ORDER SEND m->order 21839978 ch->order 21842199 ch 0x68651098
[97398169097] kgnfs_flushmsg: CH OUT of ORDER SEND m->order 21839979 ch->order 21842199 ch 0x68651098
[97398169104] kgnfs_flushmsg: CH OUT of ORDER SEND m->order 21839980 ch->order 21842199 ch 0x68651098
[97398169111] kgnfs_flushmsg: CH OUT of ORDER SEND m->order 21839981 ch->order 21842199 ch 0x68651098
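
To triage many VDBs, the message signatures shown above can be counted in alert logs and DBWR trace files. This is a minimal sketch: the regular expressions are derived only from the samples in this bulletin, so real messages may vary, and the function and key names are hypothetical.

```python
import re
from collections import Counter

# Signatures taken from the sample messages in this bulletin.
SYMPTOM_PATTERNS = {
    "ora_01578": re.compile(r"ORA-01578: ORACLE data block corrupted"),
    "fractured_block": re.compile(r"Fractured block found"),
    "dnfs_out_of_order": re.compile(r"kgnfs_flushmsg: CH OUT of ORDER SEND"),
}


def count_symptoms(lines):
    """Count how many lines match each symptom signature.

    `lines` is any iterable of strings, such as an open text file.
    """
    counts = Counter()
    for line in lines:
        for name, pattern in SYMPTOM_PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts
```

A file can be passed directly, for example `count_symptoms(open(path, errors="replace"))`, where `path` points to an alert log or trace file. A nonzero count only indicates that the file needs closer inspection. It does not confirm this specific issue.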

Running the Oracle dbv command can identify one or more corrupted data blocks, for example:

$ dbv userid=... file=<file name> blocksize=8192

DBVERIFY: Release 19.0.0.0.0 - Production on Fri, Apr 30 11:21:52 2021

Copyright (c) 1982, 2019, Oracle and/or its affiliates. All rights reserved. 

DBVERIFY - Verification starting : FILE = <file name>
Page <block number> is influx - most likely media corrupt
Corrupt block relative dba: 0xa628d086 (file <file number>, block <block number>)
Fractured block found during dbv:
Data in bad block:
type: 6 format: 2 rdba: 0xa628d086
last change scn: 0x0000.0332.8d105da7 seq: 0x2 flg: 0x00
spare3: 0x0
consistency value in tail: 0x78124205
check value in block header: 0x0
block checksum disabled

...

DBVERIFY - Verification complete

...
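
When dbv is run against many data files, the corrupt-block lines can be extracted from the captured output. This is a minimal sketch that assumes the output follows the "Corrupt block relative dba" line format shown above, with numeric file and block values in place of the placeholders. The function name is hypothetical.

```python
import re

# Matches lines such as:
#   Corrupt block relative dba: 0xa628d086 (file <n>, block <n>)
CORRUPT_BLOCK_RE = re.compile(
    r"Corrupt block relative dba: (0x[0-9a-fA-F]+) \(file (\d+), block (\d+)\)"
)


def corrupt_blocks(dbv_output: str):
    """Return a list of (rdba, file_number, block_number) tuples."""
    return [
        (rdba, int(file_no), int(block_no))
        for rdba, file_no, block_no in CORRUPT_BLOCK_RE.findall(dbv_output)
    ]
```

An empty list means no line matched this pattern. That alone does not prove the file is clean, so the full dbv summary should still be reviewed.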

Relief/Workaround

Disable use of the dNFS feature for susceptible Oracle 19 VDBs. 

Resolution

The issue is fully resolved in Delphix software release 6.0.8.0. 

Alternatively, applying the Oracle patch for bug 32931941 also resolves the issue.

Additional Information

When running Oracle and using NFS for storage of Oracle Database files, there are two options for the NFS client that will be used by Oracle.  One is to use the host operating system's (for example, Linux, AIX, or Oracle Solaris) NFS client.  The other is to use the NFS client built into the Oracle database itself. This built-in client is referred to as Direct NFS (dNFS). 

The performance of dNFS can surpass the performance of the host operating system's NFS client in many circumstances, depending on the specific OS platform and database workload. The dNFS client can utilize more TCP/IP network connections when communicating with the NFS server, potentially enabling a higher degree of parallelization. 

The issue described in this bulletin results from an interoperability problem between the Oracle dNFS client and the NFS server in Delphix. It occurs when the Delphix NFS server responds to an NFS write request with a message to retry the operation later. When the dNFS client retries the operation, it clobbers a different in-progress write request, which corrupts the block.

How to Tell if an Oracle Database is Configured with dNFS

When dNFS is enabled, the alert log for the VDB contains a message similar to:

Oracle instance running with ODM: Oracle Direct NFS ODM Library Version 6.0
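
The alert log check above can be scripted. This is a minimal sketch that assumes the banner text matches the sample line in this bulletin. The function name is hypothetical.

```python
import re

# Banner written to the alert log when the instance starts with dNFS enabled.
DNFS_BANNER_RE = re.compile(r"Oracle Direct NFS ODM Library Version\s+(\S+)")


def dnfs_odm_version(alert_log_lines):
    """Return the ODM library version from the last dNFS banner, or None.

    `alert_log_lines` is any iterable of strings, such as an open text file.
    """
    version = None
    for line in alert_log_lines:
        match = DNFS_BANNER_RE.search(line)
        if match:
            version = match.group(1)  # keep the most recent banner seen
    return version
```

Because an alert log accumulates entries across instance restarts, a match shows only that dNFS was enabled at some startup recorded in the file. The current state should be confirmed against the running instance.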

It's also possible to query dNFS-related views in running VDBs, for example:

SQL> select count(*) from v$dnfs_files;
COUNT(*)
--------
    1013
SQL> select count(*) from v$dnfs_servers;
COUNT(*)
--------
       2

The presence of rows in the dNFS-related views shown above demonstrates that dNFS is actively in use.