Resolving "Stale File Handle" Error on Linux Systems (KBA1037)
KBA# 1037: How to Resolve Stale File Handle on Linux System
Network, environmental, or other issues can sometimes disconnect an NFS mount from its server, leaving stale NFS file handles on the client. The symptoms are hanging commands and errors such as "NFS server not responding" or "Stale file handle" in the output of OS commands such as df -h, mount, and ls.
VDB provision or refresh activities may also fail as a result of a stale file handle. The job failure details will include the stale file handle indicator, as shown in the example below:
REQUIRED.CURRENT_USER=<Delphix OS user>
#####DELPHIX_END_DATA#####
#####DELPHIX_START_ERROR#####
ERROR_CODE=104
ERROR : User "<Delphix OS User>" could not unmount "/mnt/provision/<VDB name>/datafile"
ERROR : Details : umount.nfs: /mnt/provision/<VDB name>/datafile: Stale file handle;
#####DELPHIX_END_ERROR#####
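When several jobs have failed, it can help to pull the affected paths out of the error text rather than reading each message. The following is a minimal sketch, assuming the job failure details have been saved to a file or are piped on standard input. The function name stale_paths is hypothetical and is not part of any Delphix tooling.

```shell
#!/bin/sh
# Hypothetical helper (not a Delphix command): print each path that
# umount.nfs reported with "Stale file handle" in text read from stdin.
stale_paths() {
    # Matches lines such as:
    #   ERROR : Details : umount.nfs: /mnt/provision/X/datafile: Stale file handle;
    sed -n 's/.*umount\.nfs: \(.*\): Stale file handle.*/\1/p' | sort -u
}
```

For example, `stale_paths < job_error.txt` (where job_error.txt is an assumed file name) would print one affected mount path per line.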
Applicable Delphix Versions
Major Release    All Sub Releases
ALL              ALL
Resolution
Depending on the Linux version and patch level, and on the condition of the operating system, this issue may be resolved on Linux systems without downtime using the following procedure. The lazy unmount option used below requires kernel 2.4.11 or later.
Because file system mount and unmount operations are typically restricted to privileged users, the following process should be attempted while logged in as the Delphix OS user, as this user should already have the appropriate permissions for the VDB file systems. Otherwise, the steps can be attempted by root or any other privileged user.
In the following example, a dataset named exampleVDB is used to walk through this process.
- Ensure no processes are accessing the mount point(s) by running lsof and searching for the stale mount point:
# lsof | grep /mnt/provision/exampleVDB
oracle 2891 oracle 19r REG 0,22 68165632 12 /mnt/provision/exampleVDB/datafile/u01/app/oracle/oradata/orcl/undotbs01.dbf (192.168.2.131:/domain0/group-37/oracle_db_container-75/oracle_timeflow-123/datafile)
oracle 2891 oracle 20r REG 0,22 68165632 12 /mnt/provision/exampleVDB/datafile/u01/app/oracle/oradata/orcl/undotbs01.dbf (192.168.2.131:/domain0/group-37/oracle_db_container-75/oracle_timeflow-123/datafile)
oracle 2891 oracle 21r REG 0,22 68165632 12 /mnt/provision/exampleVDB/datafile/u01/app/oracle/oradata/orcl/undotbs01.dbf (192.168.2.131:/domain0/group-37/oracle_db_container-75/oracle_timeflow-123/datafile)
oracle 2891 oracle 22r REG 0,22 68165632 12 /mnt/provision/exampleVDB/datafile/u01/app/oracle/oradata/orcl/undotbs01.dbf (192.168.2.131:/domain0/group-37/oracle_db_container-75/oracle_timeflow-123/datafile)
oracle 2891 oracle 23r REG 0,22 68165632 12 /mnt/provision/exampleVDB/datafile/u01/app/oracle/oradata/orcl/undotbs01.dbf (192.168.2.131:/domain0/group-37/oracle_db_container-75/oracle_timeflow-123/datafile)
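Because lsof prints one line per open file descriptor, the same process can appear many times. As a rough sketch, the output can be reduced to one line per process. The function name holders is hypothetical, and the column positions assume the default lsof layout (COMMAND, PID, USER first).

```shell
#!/bin/sh
# Hypothetical helper: read lsof output on stdin and print the unique
# "COMMAND PID USER" triples for lines that mention the given mount point.
holders() {
    mount_point=$1
    # index() does a plain substring match, so the path is not treated as a regex.
    awk -v mp="$mount_point" 'index($0, mp) { print $1, $2, $3 }' | sort -u
}
```

A possible invocation is `lsof | holders /mnt/provision/exampleVDB`, which for the output above would collapse the five lines into a single entry for PID 2891.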
- Check what each PID is doing using
ps -ealf | grep <PID>
In this example we see that PID 2891 is accessing this NFS share:
# ps -ealf | grep 2891
0 D oracle 2891 1 0 80 0 - 155699 rpc_wa 04:21 ? 00:00:00 ora_mmnl_exampleVDB
0 S root 4818 3270 0 80 0 - 25814 pipe_w 08:46 pts/0 00:00:00 grep 2891
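In the long ps format, the second column is the process state, and D means uninterruptible sleep. A D-state process whose wait channel begins with rpc, as in the output above, is likely blocked on an NFS request. The sketch below filters ps output for such processes. The function name blocked_procs is hypothetical, and the field positions assume the 15-column `ps -ealf` layout shown above.

```shell
#!/bin/sh
# Hypothetical helper: read `ps -ealf` output on stdin and print the PID,
# wait channel, and command of every process in state D.
blocked_procs() {
    # Field 2 = state (S), field 4 = PID, field 11 = WCHAN, field 15+ = CMD.
    awk '$2 == "D" {
        printf "%s %s", $4, $11
        for (i = 15; i <= NF; i++) printf " %s", $i
        print ""
    }'
}
```

Running `ps -ealf | blocked_procs` lists every blocked process on the host, not only those tied to one VDB, so the result should still be compared with the lsof output.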
- Check the NFS mounts using
mount -l -t nfs | grep <VDB name>
In this example the active Oracle instance, exampleVDB, is still accessing this NFS share.
# mount -l -t nfs | grep exampleVDB
192.168.2.131:/domain0/group-37/oracle_db_container-75/oracle_timeflow-123 on /mnt/provision/exampleVDB type nfs (rw,nosuid,bg,hard,rsize=1048576,wsize=1048576,vers=3,nointr,timeo=600,tcp,noacl,port=2049,addr=192.168.2.131)
192.168.2.131:/domain0/group-37/oracle_db_container-75/oracle_timeflow-123/datafile on /mnt/provision/exampleVDB/datafile type nfs (rw,nosuid,bg,hard,rsize=1048576,wsize=1048576,vers=3,nointr,timeo=600,tcp,noacl,port=2049,addr=192.168.2.131)
192.168.2.131:/domain0/group-37/oracle_db_container-75/oracle_timeflow-123/archive on /mnt/provision/exampleVDB/archive type nfs (rw,nosuid,bg,hard,rsize=1048576,wsize=1048576,vers=3,nointr,timeo=600,tcp,noacl,port=2049,addr=192.168.2.131)
192.168.2.131:/domain0/group-37/oracle_db_container-75/oracle_timeflow-123/external on /mnt/provision/exampleVDB/external type nfs (rw,nosuid,bg,hard,rsize=1048576,wsize=1048576,vers=3,nointr,timeo=600,tcp,noacl,port=2049,addr=192.168.2.131)
192.168.2.131:/domain0/group-37/oracle_db_container-75/oracle_timeflow-123/temp on /mnt/provision/exampleVDB/temp type nfs (rw,nosuid,bg,hard,rsize=1048576,wsize=1048576,vers=3,nointr,timeo=600,tcp,noacl,port=2049,addr=192.168.2.131)
- Next, stop the instance and any other processes accessing the mount point. In this example this is done using sqlplus:
[oracle@centos65-tgt ~]$ sqlplus / as sysdba
SQL*Plus: Release 11.2.0.1.0 Production on Tue May 19 01:55:04 2015
Copyright (c) 1982, 2009, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
SQL> shutdown abort;
ORACLE instance shut down.
SQL> Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
[oracle@centos65-tgt ~]$
- Confirm the mount point is no longer busy by repeating the lsof check from the first step:
[root@centos65-tgt ~]# lsof | grep /mnt/provision/exampleVDB
[root@centos65-tgt ~]#
- After confirming no processes are using the mount point, obtain a list of the mount points by using
mount -l -t nfs | grep <VDB> | cut -d ' ' -f 3
# mount -l -t nfs | grep exampleVDB | cut -d ' ' -f 3
/mnt/provision/exampleVDB
/mnt/provision/exampleVDB/datafile
/mnt/provision/exampleVDB/archive
/mnt/provision/exampleVDB/external
/mnt/provision/exampleVDB/temp
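A plain grep on the VDB name also matches any other dataset whose name merely contains it (for instance a hypothetical exampleVDB2). Where that is a concern, the mount point column can be matched exactly instead. The following is a sketch only. The function name vdb_mounts is hypothetical, and it assumes the "source on mountpoint type nfs (options)" layout shown above, with no spaces in paths.

```shell
#!/bin/sh
# Hypothetical helper: read `mount -l -t nfs` output on stdin and print only
# the mount points equal to, or located under, the given base directory.
vdb_mounts() {
    base=${1%/}   # strip one trailing slash, if present
    awk -v base="$base" '
        $2 == "on" && ($3 == base || index($3, base "/") == 1) { print $3 }
    '
}
```

Used as `mount -l -t nfs | vdb_mounts /mnt/provision/exampleVDB`, this would print the same five mount points listed above.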
- Now unmount each mount point using the lazy (-l) and force (-f) options:
umount -lf
Note that the mount points are those obtained from the mount command in the previous step.
# umount -lf /mnt/provision/exampleVDB
# umount -lf /mnt/provision/exampleVDB/datafile
# umount -lf /mnt/provision/exampleVDB/archive
# umount -lf /mnt/provision/exampleVDB/external
# umount -lf /mnt/provision/exampleVDB/temp
or
# for l in $(mount -l -t nfs | grep exampleVDB | cut -d ' ' -f 3); do umount -lf "$l"; done
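Before running a forced unmount in a loop, it may be worth previewing exactly which commands will be issued. The sketch below prints the commands by default and only executes them when asked. The function name lazy_unmount_all and the DRY_RUN variable are hypothetical conventions introduced here for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read mount points (one per line) on stdin.
# By default it only prints the umount commands; set DRY_RUN=0 to run them.
lazy_unmount_all() {
    while IFS= read -r mp; do
        [ -n "$mp" ] || continue            # skip blank lines
        if [ "${DRY_RUN:-1}" = "0" ]; then
            umount -lf "$mp"                # lazy + force, as in the step above
        else
            printf 'umount -lf %s\n' "$mp"  # preview only
        fi
    done
}
```

For instance, `mount -l -t nfs | grep exampleVDB | cut -d ' ' -f 3 | lazy_unmount_all` prints the five commands for review, and prefixing the last command with `DRY_RUN=0` executes them.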
- Confirm the mounts are no longer present; the following command should return no output:
mount -l -t nfs | grep <VDB>
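If this check is to be scripted, the empty-output condition can be turned into an exit status. This is a minimal sketch under the same assumptions as the command above (a simple substring match on the VDB name). The function name mounts_cleared is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read `mount -l -t nfs` output on stdin.
# Returns 0 when no line mentions the VDB name; otherwise prints the
# remaining mounts and returns 1.
mounts_cleared() {
    vdb=$1
    # grep exits non-zero when nothing matches; "|| true" keeps that from
    # being treated as a failure here.
    remaining=$(grep -F -- "$vdb" || true)
    if [ -n "$remaining" ]; then
        printf '%s\n' "$remaining"
        return 1
    fi
    return 0
}
```

A typical use would be `mount -l -t nfs | mounts_cleared exampleVDB && echo "all unmounted"`.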
If this procedure does not resolve the stale file handle, the OS administrator should be engaged to investigate other options, or the host may be rebooted.