Delphix

Resolving "stale file handle" on Linux systems

Problem

Network, environmental, or other issues can cause an NFS mount to become disconnected from its server, leaving stale NFS file handles. Commands that touch the mount then hang, and errors such as "NFS server not responding" or "stale file handle" appear in the output of OS commands such as df -h, mount, and ls.

Resolution

The following procedure can often resolve this issue without rebooting the host, although results vary with the Linux version, patch level, and state of the operating system. The lazy unmount (umount -l) used in step 6 requires kernel 2.4.11 or later. If the open-file checks in the steps below do not match the examples, you may still attempt to unmount the filesystems by skipping to step 5.
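The 2.4.11 requirement corresponds to the lazy unmount (umount -l) option. As a hedged sketch, the hypothetical helper version_at_least below compares the first three numeric components of uname -r with a required version and ignores distribution suffixes such as -1160.el7.x86_64:

```shell
# version_at_least REQUIRED [ACTUAL]
# Hypothetical helper: succeeds (exit 0) if ACTUAL >= REQUIRED, comparing only the
# first three numeric components. ACTUAL defaults to the running kernel (uname -r).
version_at_least() {
  awk -v req="$1" -v act="${2:-$(uname -r)}" 'BEGIN {
    split(req, r, "[^0-9]+")            # e.g. 2.4.11 -> 2, 4, 11
    split(act, a, "[^0-9]+")            # e.g. 3.10.0-1160.el7 -> 3, 10, 0, ...
    for (i = 1; i <= 3; i++) {
      if (a[i] + 0 > r[i] + 0) exit 0   # actual is newer
      if (a[i] + 0 < r[i] + 0) exit 1   # actual is older
    }
    exit 0                              # equal in the first three components
  }'
}

if version_at_least 2.4.11; then
  echo "kernel $(uname -r) supports lazy unmount (umount -l)"
else
  echo "kernel $(uname -r) is older than 2.4.11" >&2
fi
```

Any currently supported distribution ships a much newer kernel, so in practice this check only matters on very old systems.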

1. Identify any processes accessing the stale mount point(s) by running lsof and filtering for the mount point:

# lsof|grep /mnt/provision/Vorc8BA
oracle 2891 oracle 19r REG 0,22 68165632 12 /mnt/provision/Vorc8BA/datafile/u01/app/oracle/oradata/orcl/undotbs01.dbf (192.168.2.131:/domain0/group-37/oracle_db_container-75/oracle_timeflow-123/datafile)
oracle 2891 oracle 20r REG 0,22 68165632 12 /mnt/provision/Vorc8BA/datafile/u01/app/oracle/oradata/orcl/undotbs01.dbf (192.168.2.131:/domain0/group-37/oracle_db_container-75/oracle_timeflow-123/datafile)
oracle 2891 oracle 21r REG 0,22 68165632 12 /mnt/provision/Vorc8BA/datafile/u01/app/oracle/oradata/orcl/undotbs01.dbf (192.168.2.131:/domain0/group-37/oracle_db_container-75/oracle_timeflow-123/datafile)
oracle 2891 oracle 22r REG 0,22 68165632 12 /mnt/provision/Vorc8BA/datafile/u01/app/oracle/oradata/orcl/undotbs01.dbf (192.168.2.131:/domain0/group-37/oracle_db_container-75/oracle_timeflow-123/datafile)
oracle 2891 oracle 23r REG 0,22 68165632 12 /mnt/provision/Vorc8BA/datafile/u01/app/oracle/oradata/orcl/undotbs01.dbf (192.168.2.131:/domain0/group-37/oracle_db_container-75/oracle_timeflow-123/datafile)
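Because grep matches the string anywhere on the line, it would also match a longer name such as a hypothetical /mnt/provision/Vorc8BA2. The sketch below, built around a hypothetical pids_on_mount helper, reduces lsof output read from standard input to the unique PIDs that hold a file at or under an exact mount point:

```shell
# pids_on_mount MOUNT_POINT
# Hypothetical helper: reads lsof output on stdin and prints each PID (column 2)
# that has a file open at MOUNT_POINT or anywhere beneath it, once per PID.
pids_on_mount() {
  awk -v mp="$1" '
    NR == 1 && $1 == "COMMAND" { next }           # skip the lsof header line
    {
      # scan the fields for a path equal to, or below, the mount point
      for (i = 3; i <= NF; i++)
        if ($i == mp || index($i, mp "/") == 1) { print $2; break }
    }' | sort -un
}

# Example: lsof | pids_on_mount /mnt/provision/Vorc8BA
```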

2. Check what each process is doing with ps -ealf | grep <PID>.

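In this example, PID 2891 (the Oracle background process ora_mmnl_Vorc8BA) is accessing the NFS share. The D in the second column shows the process is in uninterruptible sleep, and its wait channel (rpc_wa) indicates it is waiting in the kernel's RPC layer for the NFS server: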

# ps -ealf|grep 2891
0 D oracle 2891 1 0 80 0 - 155699 rpc_wa 04:21 ? 00:00:00 ora_mmnl_Vorc8BA
0 S root 4818 3270 0 80 0 - 25814 pipe_w 08:46 pts/0 00:00:00 grep 2891
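The grep above also matches its own process (the second line of output). As a hedged, Linux-specific alternative, the hypothetical proc_state helper reads /proc/<PID>/status, /proc/<PID>/wchan, and /proc/<PID>/comm directly and prints one line per PID:

```shell
# proc_state PID...
# Hypothetical helper: prints "PID STATE WCHAN COMMAND" for each PID using /proc.
# STATE "D" means uninterruptible sleep, typical of processes stuck on a hard NFS mount.
proc_state() {
  for pid in "$@"; do
    [ -r "/proc/$pid/status" ] || { echo "$pid: no such process" >&2; continue; }
    state=$(awk '/^State:/ { print $2; exit }' "/proc/$pid/status")
    wchan=$(cat "/proc/$pid/wchan" 2>/dev/null)   # may read "0" when hidden by the kernel
    comm=$(cat "/proc/$pid/comm" 2>/dev/null)
    echo "$pid ${state:--} ${wchan:--} ${comm:--}"
  done
}

# Example: proc_state 2891
```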

3. Check the NFS mounts using mount -l -t nfs | grep <VDB name>

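In this example, the Oracle instance Vorc8BA is still active and accessing these NFS shares. The mounts use the hard option, so requests to an unresponsive server are retried indefinitely, which is why processes hang instead of returning an error: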

# mount -l -t nfs | grep Vorc8BA
192.168.2.131:/domain0/group-37/oracle_db_container-75/oracle_timeflow-123 on /mnt/provision/Vorc8BA type nfs (rw,nosuid,bg,hard,rsize=1048576,wsize=1048576,vers=3,nointr,timeo=600,tcp,noacl,port=2049,addr=192.168.2.131)
192.168.2.131:/domain0/group-37/oracle_db_container-75/oracle_timeflow-123/datafile on /mnt/provision/Vorc8BA/datafile type nfs (rw,nosuid,bg,hard,rsize=1048576,wsize=1048576,vers=3,nointr,timeo=600,tcp,noacl,port=2049,addr=192.168.2.131)
192.168.2.131:/domain0/group-37/oracle_db_container-75/oracle_timeflow-123/archive on /mnt/provision/Vorc8BA/archive type nfs (rw,nosuid,bg,hard,rsize=1048576,wsize=1048576,vers=3,nointr,timeo=600,tcp,noacl,port=2049,addr=192.168.2.131)
192.168.2.131:/domain0/group-37/oracle_db_container-75/oracle_timeflow-123/external on /mnt/provision/Vorc8BA/external type nfs (rw,nosuid,bg,hard,rsize=1048576,wsize=1048576,vers=3,nointr,timeo=600,tcp,noacl,port=2049,addr=192.168.2.131)
192.168.2.131:/domain0/group-37/oracle_db_container-75/oracle_timeflow-123/temp on /mnt/provision/Vorc8BA/temp type nfs (rw,nosuid,bg,hard,rsize=1048576,wsize=1048576,vers=3,nointr,timeo=600,tcp,noacl,port=2049,addr=192.168.2.131)

4. Next, stop the instance and any other processes accessing the mount point.

In this example, the instance is stopped with SQL*Plus:

[oracle@centos65-tgt ~]$ sqlplus / as sysdba
SQL*Plus: Release 11.2.0.1.0 Production on Tue May 19 01:55:04 2015
Copyright (c) 1982, 2009, Oracle. All rights reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
SQL> shutdown abort;
ORACLE instance shut down.
SQL> Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
[oracle@centos65-tgt ~]$
Confirm the mount point is no longer busy by repeating step 1; lsof should now return no output:
[root@centos65-tgt ~]# lsof|grep /mnt/provision/Vorc8BA
[root@centos65-tgt ~]#
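lsof can itself block on an unresponsive NFS server (its -b option exists for this reason). A hedged alternative, using a hypothetical mount_busy helper, lists the /proc/<PID>/fd and /proc/<PID>/cwd symlinks with ls, which reads link targets from the kernel and, as an assumption worth verifying on your system, should not need to contact the NFS server. It only sees processes whose /proc entries are readable, so run it as root:

```shell
# mount_busy MOUNT_POINT
# Hypothetical helper: prints the PID of every readable process that has an open
# file, or its working directory, at or under MOUNT_POINT, and exits 0 if any were
# found (1 otherwise). Memory-mapped files (/proc/*/maps) are not checked.
mount_busy() {
  # -d on cwd lists the symlink itself instead of entering the (possibly hung) directory
  { ls -l /proc/[0-9]*/fd; ls -ld /proc/[0-9]*/cwd; } 2>/dev/null |
    awk -v mp="$1" '
      /^\/proc\/[0-9]+\/fd:$/ { split($0, p, "/"); pid = p[3]; next }  # "/proc/PID/fd:" header
      match($0, /\/proc\/[0-9]+\/cwd -> /) {                           # "/proc/PID/cwd -> DIR"
        s = substr($0, RSTART + 6); pid = substr(s, 1, index(s, "/") - 1)
      }
      {
        i = index($0, " -> "); if (i == 0) next   # keep only symlink lines
        t = substr($0, i + 4)                     # the symlink target path
        if (t == mp || index(t, mp "/") == 1) seen[pid] = 1
      }
      END { for (q in seen) { print q; found = 1 }; exit !found }'
}

# Example: mount_busy /mnt/provision/Vorc8BA || echo "mount point is not busy"
```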

5. After confirming that no processes are using the mount point, obtain a list of the mount points with mount -l -t nfs | grep <VDB> | cut -d ' ' -f 3:

# mount -l -t nfs | grep Vorc8BA | cut -d ' ' -f 3
/mnt/provision/Vorc8BA
/mnt/provision/Vorc8BA/datafile
/mnt/provision/Vorc8BA/archive
/mnt/provision/Vorc8BA/external
/mnt/provision/Vorc8BA/temp
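The cut pipeline depends on the mount output layout and on grep matching a substring. A hedged alternative, using a hypothetical nfs_mounts_under helper, reads /proc/mounts (device, mount point, type, options, ...) and prints only nfs or nfs4 mount points equal to or below the given path, deepest first so each child precedes its parent. It assumes mount points contain no spaces, which /proc/mounts would show escaped as \040:

```shell
# nfs_mounts_under PREFIX [MOUNTS_FILE]
# Hypothetical helper: prints nfs/nfs4 mount points at or below PREFIX, deepest first.
# Reads /proc/mounts by default. Exits 0 if at least one mount was found, 1 otherwise.
nfs_mounts_under() {
  awk -v mp="$1" '
    ($3 == "nfs" || $3 == "nfs4") && ($2 == mp || index($2, mp "/") == 1) {
      n++; path[n] = $2; depth[n] = split($2, parts, "/")   # depth = path components
    }
    END {
      # stable insertion sort: deeper mount points first
      for (i = 2; i <= n; i++) {
        p = path[i]; d = depth[i]
        for (j = i - 1; j >= 1 && depth[j] < d; j--) {
          path[j + 1] = path[j]; depth[j + 1] = depth[j]
        }
        path[j + 1] = p; depth[j + 1] = d
      }
      for (i = 1; i <= n; i++) print path[i]
      exit (n == 0)
    }' "${2:-/proc/mounts}"
}
```

Its exit status also fits step 7, for example: nfs_mounts_under /mnt/provision/Vorc8BA || echo "no NFS mounts remain".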

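6. Unmount each mount point obtained in step 5 using the lazy (-l) and force (-f) options: umount -lf <mount point>. The lazy option detaches the filesystem from the directory tree immediately and cleans up the remaining references once it is no longer busy; the force option is intended for unreachable NFS servers. Unmount the child mount points before their parent:

# umount -lf /mnt/provision/Vorc8BA/temp
# umount -lf /mnt/provision/Vorc8BA/external
# umount -lf /mnt/provision/Vorc8BA/archive
# umount -lf /mnt/provision/Vorc8BA/datafile
# umount -lf /mnt/provision/Vorc8BA

or, as a single loop (sort -r lists each child before its parent):

# for l in $(mount -l -t nfs | grep Vorc8BA | cut -d ' ' -f 3 | sort -r); do umount -lf "$l"; done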
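Before running the loop for real, it can help to preview the commands. The hypothetical lazy_unmount_all wrapper below reads mount points (children first) on standard input, prints the umount commands when the assumed DRY_RUN=1 variable is set, and otherwise runs umount -lf on each and reports failures:

```shell
# lazy_unmount_all
# Hypothetical wrapper: reads mount points, one per line and children before parents,
# on stdin. With DRY_RUN=1 it only prints the commands; otherwise it runs umount -lf.
# Returns non-zero if any umount failed.
lazy_unmount_all() {
  rc=0
  while IFS= read -r mp; do
    [ -n "$mp" ] || continue                 # ignore blank lines
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "umount -lf $mp"
    elif ! umount -lf "$mp"; then
      echo "failed to unmount $mp" >&2
      rc=1
    fi
  done
  return "$rc"
}

# Preview, then run as root without DRY_RUN:
#   mount -l -t nfs | grep Vorc8BA | cut -d ' ' -f 3 | sort -r | DRY_RUN=1 lazy_unmount_all
```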

7. Confirm that the mounts are gone with mount -l -t nfs | grep <VDB>; the command should return no output:

# mount -l -t nfs | grep Vorc8BA

If this procedure does not resolve the stale file handle, engage the OS administrator to investigate other options, or reboot the host.
