Delphix

Resolving "Stale File Handle" Error on Linux Systems (KBA1037)

 

 

How to Resolve Stale File Handle on Linux System

Network, environmental, or other issues can sometimes disconnect an NFS mount from the server, which results in stale NFS file handles. Symptoms include hanging commands and errors such as "NFS server not responding" or "stale file handle" in the output of various OS commands (df -h, mount, ls).
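Because commands against a stale mount can hang, it may help to probe a mount point under a time limit before investigating further. The following is a minimal sketch, not part of the Delphix procedure: the function name probe_mount and the 5-second default are assumptions, and it relies on the coreutils timeout and stat utilities being available. On some kernels a process blocked on a hard NFS mount may not respond to the termination signal, in which case the probe itself could still hang.

```shell
#!/bin/sh
# probe_mount PATH [SECONDS]   (hypothetical helper, not a Delphix tool)
# Runs stat under a time limit so a hung NFS mount is less likely to block
# the shell. Prints one of: ok, stale, timeout, error.
probe_mount() {
    path=$1
    limit=${2:-5}
    # LC_ALL=C keeps error messages in English so the match below is stable.
    # Only stderr is captured; stdout of stat is discarded.
    out=$(LC_ALL=C timeout "$limit" stat -- "$path" 2>&1 >/dev/null)
    rc=$?
    if [ "$rc" -eq 0 ]; then
        echo ok
    elif [ "$rc" -eq 124 ]; then
        # timeout exits with 124 when the time limit was reached
        echo timeout
    else
        case $out in
            *"Stale file handle"*) echo stale ;;
            *) echo error ;;
        esac
    fi
}
```

For example, probe_mount /mnt/provision/exampleVDB/datafile 10 would report the state of that mount point after waiting at most 10 seconds.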

VDB provision or refresh activities may also fail as a result of a stale file handle. The job failure details will include the stale file handle indicator, as shown in the example below:

REQUIRED.CURRENT_USER=<Delphix OS user>
#####DELPHIX_END_DATA#####
#####DELPHIX_START_ERROR#####
ERROR_CODE=104 ERROR : User "<Delphix OS User>" could not unmount "/mnt/provision/<VDB name>/datafile" 
ERROR : Details : umount.nfs: /mnt/provision/<VDB name>/datafile: Stale file handle; 
#####DELPHIX_END_ERROR#####
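If the job failure details have been saved to a file, the affected mount points can be pulled out of the text. This is a small sketch that assumes the error lines follow the umount.nfs format shown above; the function name stale_paths and the file name job_error.txt are hypothetical.

```shell
#!/bin/sh
# stale_paths (hypothetical helper): reads job error text on stdin and prints
# each path that umount.nfs reported with "Stale file handle", one per line.
stale_paths() {
    sed -n 's/.*umount\.nfs: \(.*\): Stale file handle.*/\1/p'
}
```

Usage, assuming the error text was saved as job_error.txt: stale_paths < job_error.txt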

Applicable Delphix Versions

Major Release All Sub Releases
ALL ALL

Resolution

Depending on the Linux version, patch level, and the condition of the operating system, this issue may be resolved on Linux systems (kernel 2.4.11 or later is required) using the following procedure, without downtime.

File system mount and unmount operations are typically restricted to privileged users, so the following process should be attempted while logged in as the Delphix OS user, who should already have the appropriate permissions for the VDB file systems. Otherwise, the steps can be attempted as root or as any other privileged user.

Note:

If the open-file checks do not produce output resembling the examples below, you may skip to Step 5 and attempt to unmount the file systems.

The following example uses a dataset named exampleVDB to walk through this process.

  1. Ensure no processes are accessing the mount point(s) by running lsof and searching for the stale mount point:

# lsof|grep /mnt/provision/exampleVDB
oracle 2891 oracle 19r REG 0,22 68165632 12 /mnt/provision/exampleVDB/datafile/u01/app/oracle/oradata/orcl/undotbs01.dbf (192.168.2.131:/domain0/group-37/oracle_db_container-75/oracle_timeflow-123/datafile)
oracle 2891 oracle 20r REG 0,22 68165632 12 /mnt/provision/exampleVDB/datafile/u01/app/oracle/oradata/orcl/undotbs01.dbf (192.168.2.131:/domain0/group-37/oracle_db_container-75/oracle_timeflow-123/datafile)
oracle 2891 oracle 21r REG 0,22 68165632 12 /mnt/provision/exampleVDB/datafile/u01/app/oracle/oradata/orcl/undotbs01.dbf (192.168.2.131:/domain0/group-37/oracle_db_container-75/oracle_timeflow-123/datafile)
oracle 2891 oracle 22r REG 0,22 68165632 12 /mnt/provision/exampleVDB/datafile/u01/app/oracle/oradata/orcl/undotbs01.dbf (192.168.2.131:/domain0/group-37/oracle_db_container-75/oracle_timeflow-123/datafile)
oracle 2891 oracle 23r REG 0,22 68165632 12 /mnt/provision/exampleVDB/datafile/u01/app/oracle/oradata/orcl/undotbs01.dbf (192.168.2.131:/domain0/group-37/oracle_db_container-75/oracle_timeflow-123/datafile)
  2. Check what the process is doing by running ps -ealf | grep <PID>.
    In this example, PID 2891 is accessing the NFS share:

# ps -ealf|grep 2891
0 D oracle 2891 1 0 80 0 - 155699 rpc_wa 04:21 ? 00:00:00 ora_mmnl_exampleVDB
0 S root 4818 3270 0 80 0 - 25814 pipe_w 08:46 pts/0 00:00:00 grep 2891
  3. Check the NFS mounts using mount -l -t nfs | grep <VDB name>.
    In this example, the active Oracle instance, exampleVDB, is still accessing this NFS share:
# mount -l -t nfs | grep exampleVDB
192.168.2.131:/domain0/group-37/oracle_db_container-75/oracle_timeflow-123 on /mnt/provision/exampleVDB type nfs (rw,nosuid,bg,hard,rsize=1048576,wsize=1048576,vers=3,nointr,timeo=600,tcp,noacl,port=2049,addr=192.168.2.131)
192.168.2.131:/domain0/group-37/oracle_db_container-75/oracle_timeflow-123/datafile on /mnt/provision/exampleVDB/datafile type nfs (rw,nosuid,bg,hard,rsize=1048576,wsize=1048576,vers=3,nointr,timeo=600,tcp,noacl,port=2049,addr=192.168.2.131)
192.168.2.131:/domain0/group-37/oracle_db_container-75/oracle_timeflow-123/archive on /mnt/provision/exampleVDB/archive type nfs (rw,nosuid,bg,hard,rsize=1048576,wsize=1048576,vers=3,nointr,timeo=600,tcp,noacl,port=2049,addr=192.168.2.131)
192.168.2.131:/domain0/group-37/oracle_db_container-75/oracle_timeflow-123/external on /mnt/provision/exampleVDB/external type nfs (rw,nosuid,bg,hard,rsize=1048576,wsize=1048576,vers=3,nointr,timeo=600,tcp,noacl,port=2049,addr=192.168.2.131)
192.168.2.131:/domain0/group-37/oracle_db_container-75/oracle_timeflow-123/temp on /mnt/provision/exampleVDB/temp type nfs (rw,nosuid,bg,hard,rsize=1048576,wsize=1048576,vers=3,nointr,timeo=600,tcp,noacl,port=2049,addr=192.168.2.131)
  4. Next, stop the instance and any other processes accessing the mount point.
    In this example, this is done using sqlplus:

[oracle@centos65-tgt ~]$ sqlplus / as sysdba
SQL*Plus: Release 11.2.0.1.0 Production on Tue May 19 01:55:04 2015
Copyright (c) 1982, 2009, Oracle. All rights reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
SQL> shutdown abort;
ORACLE instance shut down.
SQL> Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
[oracle@centos65-tgt ~]$
  • Confirm that the mount point is no longer busy by repeating Step 1:
[root@centos65-tgt ~]# lsof|grep /mnt/provision/exampleVDB
[root@centos65-tgt ~]#
  5. After confirming that no processes are using the mount point, obtain a list of the mount points using mount -l -t nfs | grep <VDB> | cut -d ' ' -f 3:
# mount -l -t nfs | grep exampleVDB | cut -d ' ' -f 3
/mnt/provision/exampleVDB
/mnt/provision/exampleVDB/datafile
/mnt/provision/exampleVDB/archive
/mnt/provision/exampleVDB/external
/mnt/provision/exampleVDB/temp
  6. Now proceed with the umount using the lazy and force options (umount -lf).
    The mount points are those returned by the mount command in Step 5.
# umount -lf /mnt/provision/exampleVDB
# umount -lf /mnt/provision/exampleVDB/datafile
# umount -lf /mnt/provision/exampleVDB/archive
# umount -lf /mnt/provision/exampleVDB/external
# umount -lf /mnt/provision/exampleVDB/temp

or

# for mp in $(mount -l -t nfs | grep exampleVDB | cut -d ' ' -f 3); do umount -lf "$mp"; done
  7. Confirm that the file systems are unmounted using mount -l -t nfs | grep <VDB>. The command should return no output.
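
The listing and unmount steps above can be sketched as a pair of shell functions. This is an illustration under stated assumptions rather than a supported Delphix script: the function names vdb_mount_points and unmount_vdb and the DRY_RUN variable are hypothetical, mount points are assumed to contain no spaces, and child mount points are ordered before their parent, which is a choice made in this sketch and not a requirement stated in the procedure.

```shell
#!/bin/sh
# vdb_mount_points NAME (hypothetical helper): reads `mount -l -t nfs` output
# on stdin and prints the mount points (third field) whose path contains NAME.
# Longer paths are printed first, so child mounts come before their parent.
vdb_mount_points() {
    awk -v name="$1" '$2 == "on" && index($3, name) { print length($3), $3 }' |
        sort -rn | cut -d ' ' -f 2-
}

# unmount_vdb NAME (hypothetical helper): runs umount -lf on every matching
# mount point. DRY_RUN defaults to 1, in which case the commands are only
# printed; set DRY_RUN=0 to actually unmount.
unmount_vdb() {
    mount -l -t nfs | vdb_mount_points "$1" | while IFS= read -r mp; do
        if [ "${DRY_RUN:-1}" = 1 ]; then
            echo "umount -lf $mp"
        else
            umount -lf "$mp"
        fi
    done
}
```

Running unmount_vdb exampleVDB first prints the commands it would issue; DRY_RUN=0 unmount_vdb exampleVDB then performs the unmounts, after which the check in Step 7 can be repeated.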

If this procedure is not successful in resolving the stale file handle, the OS administrator should be engaged to investigate other options, or the host may be rebooted.