nfs not responding

Sander Brandenburg sander.brandenburg at esaturnus.com
Tue Jun 10 14:33:30 CEST 2014


We used to have a similar problem on NFSv3 with FAI 4.1.1 (roughly once in
several hundred installs).

I've appended to this email the patch I currently use to address this
issue (with FAI 4.1.1 as base). I suspected a dangling process or
mount, which is why I introduced a lot of debugging code (I wanted to
determine whether the NFS client hung at the umount -a or at the reboot
itself). I thought the only significant thing this fixed was removing the -i
from the reboot command, which I reported upstream.
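
One way to tell whether the hang happens at the umount step or at the reboot
itself is to bracket each step with log lines. The following is only a minimal
sketch: the helper name trace_step and the TRACE_LOG variable are hypothetical
and not part of FAI, and the log file is assumed to live somewhere that does
not itself depend on NFS.

```shell
# Hypothetical helper, not part of FAI: run a command and record when it
# starts and when it returns. A hang then shows up as a "start" line that
# has no matching "end" line.
# Assumes TRACE_LOG points at a writable file that is not on NFS.
trace_step() {
    label=$1
    shift
    printf '%s start %s\n' "$(date +%s)" "$label" >> "$TRACE_LOG"
    "$@"
    rc=$?
    printf '%s end %s rc=%s\n' "$(date +%s)" "$label" "$rc" >> "$TRACE_LOG"
    return "$rc"
}
```

It could be called as `trace_step umount-proc umount "$FAI_ROOT/proc"`. A step
that ends in `exec reboot` never returns, so only its "start" line would be
written.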

However, it could very well be that my debugging changes have actually
contributed to the fix, as the -i flag was removed in FAI-4.2 and you're
still experiencing a hang (assuming it's the same problem).
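
The patch below unmounts everything under $FAI_ROOT in reverse-sorted order, so
that nested mounts are released before their parents. As a standalone sketch of
that ordering (the function name list_mounts_under is hypothetical, and it
assumes `mount`-style input lines of the form "<dev> on <dir> type <fs> (...)"
with no spaces in the mount point):

```shell
# Hypothetical helper, not part of FAI: read `mount`-style lines on stdin
# and print the mount points at or below the directory given as $1,
# deepest paths first.
# Unlike a plain grep, this matches only on the mount point field and only
# on a whole path prefix, so /target does not match /targetfoo.
list_mounts_under() {
    root=${1%/}
    awk -v root="$root" '
        $3 == root || index($3, root "/") == 1 { print $3 }
    ' | LC_ALL=C sort -r
}

# Possible use (not executed here):
#   mount | list_mounts_under "$FAI_ROOT" | while read -r dir; do
#       umount "$dir"
#   done
```

Sorting in the C locale keeps the order byte-wise, so a parent path always
sorts after the longer paths beneath it once the order is reversed.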

-Sander

--- a/lib/subroutines
+++ b/lib/subroutines
@@ -527,7 +527,7 @@
     cd /
     sync

-    killall -q sshd udevd
+    killall -qw sshd udevd rsyslogd

     cdromdevice=$(awk '/ name:/ {print $3}' /proc/sys/dev/cdrom/info)
     # Verify whether the installation is from a fai-cd image, and whether it's actually mounted (instead of NFS mounted, for instance)
@@ -565,14 +565,26 @@
         # never reached, because chroot will reboot the machine
         die "Internal error when calling /tmp/rebootCD." >&2
     fi
-    umount $FAI_ROOT/proc
-    umount -arf 2>/dev/null
+
+    # Dump state of fai-client so we can reverse-engineer what goes wrong after a NFS hang
+    pstree -Apl > $FAI_ROOT/var/log/fai/pstree.log
+    lsof -n > $FAI_ROOT/var/log/fai/lsof.log
+
+    for dir in $(mount | grep $FAI_ROOT | awk '{print $3}' | LC_ALL=C sort -r); do
+        umount $dir
+    done
+
+    if mount | grep -q $FAI_ROOT; then
+        echo "dangling mounts:"
+        mount | grep $FAI_ROOT
+        sleep 10
+    fi

     # reboot or halt?
     if [ "$flag_halt" -gt "0" ]; then
-        exec halt -dfip;
+        exec halt -dfip
     else
-        exec reboot -dfi;
+        exec reboot -df
     fi

 }



On Tue, Jun 10, 2014 at 2:41 AM, Peter Keller <psilord at cs.wisc.edu> wrote:

> Hello,
>
> I have a question:
>
> Sometimes, maybe 2% of the time, when FAI 4.2 finishes installing and is
> shutting down to reboot, I get into a state where messages are logged to
> the screen about NFS not responding, and then ok, and then not responding,
> and then ok, and so on. They repeat every 5 minutes or so. The machine
> stays in this state and never actually reboots causing a manual interrupt
> in the automated install. The NFS server, AFAICT, was ok the whole time.
> The faiserver is a wheezy machine and I'm not using nfs 4.
>
> Has anyone ever seen this before?
>
> Thank you.
>
> -pete
>



-- 
Sander Brandenburg
Director of Technology

eSATURNUS
T. +32 16 40 12 82
www.esaturnus.com