Sunday, July 3, 2011

Get out of NFS mount hang

When an NFS share is mounted without the 'intr' option, it has a habit of hanging if the server is not responding. This is how Linux NFS works: the kernel keeps retrying the request and does not return.

If the NFS server is down and you run 'df -h', it hangs while listing the NFS mount and does not respond to any signals, because it is stuck inside the kernel.
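One way to check a suspect mount without tying up your own shell is to run the probe in the background and only wait a limited time for it. The following is a minimal sketch, not a definitive tool: the function name probe_with_deadline is made up for this example, and it assumes a POSIX shell. If the probe is stuck in the kernel it stays stuck in the background; the point is only that your shell gets control back.

```shell
#!/bin/sh
# Hypothetical helper: run a command in the background and stop waiting
# for it after a deadline.
# Usage: probe_with_deadline SECONDS COMMAND [ARGS...]
# Prints "ok" if the command finished in time, "stuck" otherwise.
probe_with_deadline() {
    deadline=$1
    shift
    # Discard the probe's output so it cannot block on our terminal or pipe.
    "$@" >/dev/null 2>&1 &
    probe_pid=$!
    elapsed=0
    while [ "$elapsed" -lt "$deadline" ]; do
        # kill -0 sends no signal; it only tests whether the process exists.
        if ! kill -0 "$probe_pid" 2>/dev/null; then
            echo ok
            return 0
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    if kill -0 "$probe_pid" 2>/dev/null; then
        echo stuck
        return 1
    fi
    echo ok
}
```

For example, 'probe_with_deadline 5 df -h /tmp/test' gives up after about five seconds instead of hanging the terminal.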

The simplest solution is to force-unmount the mount point:

XXX:/tmp # umount -f /tmp/test
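If the forced unmount fails, a lazy unmount (umount -l) is a common fallback. The sketch below tries both in order; force_unmount is a hypothetical helper name, it must be run as root, and it assumes the Linux umount command, which supports both -f and -l.

```shell
#!/bin/sh
# Hypothetical helper: try a forced unmount first, then a lazy unmount.
# Usage: force_unmount MOUNT_POINT
force_unmount() {
    mount_point=$1
    if umount -f "$mount_point"; then
        echo "forced unmount of $mount_point succeeded"
    elif umount -l "$mount_point"; then
        # -l detaches the mount now and cleans up once it is no longer busy.
        echo "lazy unmount of $mount_point succeeded"
    else
        echo "could not unmount $mount_point" >&2
        return 1
    fi
}
```

It would be called as 'force_unmount /tmp/test'.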

But if you just want your hung process to return without removing your mount, you can plumb the server IP (add it as an alias address) on the local host. In this example the mounts look like this:

XXX:/tmp # df
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/sda1             14413312  12154924   1526228  89% /
udev                   8187884       336   8187548   1% /dev
/dev/sda5             23711000   5215016  17291516  24% /opt
/dev/sda6              5676464   2786656   2601444  52% /var
/dev/sda7              5676464   1496004   3892096  28% /tmp

XXX:/tmp # mount
/dev/sda1 on / type ext3 (rw,acl,user_xattr)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
debugfs on /sys/kernel/debug type debugfs (rw)
udev on /dev type tmpfs (rw)
devpts on /dev/pts type devpts (rw,mode=0620,gid=5)
/dev/sda5 on /opt type ext3 (rw,acl,user_xattr)
/dev/sda6 on /var type ext3 (rw,acl,user_xattr)
/dev/sda7 on /tmp type ext3 (rw,acl,user_xattr)
10.10.10.10:/test on /tmp/test type nfs (rw,addr=10.10.10.11)

Plumbing the server IP 10.10.10.11 (the addr= value in the mount output above) on the local host gives the NFS client the impression that it is talking to the server. Once you start an NFS server on the local host, the client can talk to it, find out that the share is not exported by this server, and return.
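The address to plumb can be read from the mount output rather than typed by hand. This is a rough sketch that assumes 'mount' prints lines in the format shown above, with an addr= entry among the options; nfs_mount_addr is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print the addr= option of the NFS mount at a given
# mount point. Reads `mount`-style lines on standard input.
# Usage: mount | nfs_mount_addr MOUNT_POINT
nfs_mount_addr() {
    awk -v mp="$1" '
        $2 == "on" && $3 == mp && $4 == "type" && $5 ~ /^nfs/ {
            opts = $6
            gsub(/[()]/, "", opts)          # strip the surrounding parentheses
            n = split(opts, parts, ",")     # one array entry per mount option
            for (i = 1; i <= n; i++) {
                if (parts[i] ~ /^addr=/) {
                    print substr(parts[i], 6)   # text after "addr="
                    exit
                }
            }
        }'
}
```

With the mount output above, 'mount | nfs_mount_addr /tmp/test' prints 10.10.10.11.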

On the client machine, start the NFS service and plumb the server IP:
XXX:# /etc/init.d/nfs start
XXX:# ifconfig lo:0 10.10.10.11

Now any process hanging while reading data from NFS should return from the kernel.

4 comments:

  1. Hi Kumar,

    I tried your approach - as soon as I run the ifconfig command, the client hangs.

    Am I doing something wrong?

  2. This wouldn't work. The best way to solve this is to reboot the server.

  3. Try umount -l if the -f option doesn't work.

  4. umount -l worked for me. thanks!
