nfs not responding
Sander Brandenburg
sander.brandenburg at esaturnus.com
Tue Jun 10 14:33:30 CEST 2014
We used to have a similar problem on NFSv3 with FAI 4.1.1 (roughly once
every few hundred installs).
I've appended to this email the patch I'm currently using to address this
issue (against FAI 4.1.1). I suspected a dangling process or mount, which is
why I added a lot of debugging code (I wanted to determine whether the NFS
client hung at the umount -a or at the reboot itself). I thought the only
significant fix was removing the -i flag from the reboot command (-i shuts
down all network interfaces just before the reboot, while the client is
still running from its NFS root), which I reported upstream.
However, my debugging changes may well have contributed to the fix: the -i
flag was removed in FAI 4.2, and you're still seeing a hang (assuming it's
the same problem).
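One low-tech way to tell whether the client hangs at the umount or at the
reboot itself is to print a numbered marker before each step to a device
that does not depend on NFS; the last marker visible on a hung machine then
shows which step blocked. A minimal sketch, assuming /dev/console is still
writable at that point; mark, MARK_OUT and the FAI-DEBUG prefix are
hypothetical names, not part of FAI:

```shell
#!/bin/sh
# Hypothetical debugging helper (not part of FAI).
# Default target is the console, which keeps working when NFS does not.
MARK_OUT=${MARK_OUT:-/dev/console}
mark_step=0

# mark MESSAGE...: append "FAI-DEBUG <n> <HH:MM:SS>: MESSAGE" to $MARK_OUT
mark() {
    mark_step=$((mark_step + 1))
    printf 'FAI-DEBUG %d %s: %s\n' "$mark_step" "$(date +%H:%M:%S)" "$*" >> "$MARK_OUT"
}
```

In the reboot path you would call, for example, mark "before umount" ahead
of the unmount loop and mark "before reboot" just before exec reboot -df.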
-Sander
--- a/lib/subroutines
+++ b/lib/subroutines
@@ -527,7 +527,7 @@
cd /
sync
- killall -q sshd udevd
+ killall -qw sshd udevd rsyslogd
cdromdevice=$(awk '/ name:/ {print $3}' /proc/sys/dev/cdrom/info)
# Verify whether the installation is from a fai-cd image, and whether it's actually mounted (instead of NFS mounted, for instance)
@@ -565,14 +565,26 @@
# never reached, because chroot will reboot the machine
die "Internal error when calling /tmp/rebootCD." >&2
fi
- umount $FAI_ROOT/proc
- umount -arf 2>/dev/null
+
+ # Dump state of fai-client so we can reverse-engineer what goes wrong after a NFS hang
+ pstree -Apl > $FAI_ROOT/var/log/fai/pstree.log
+ lsof -n > $FAI_ROOT/var/log/fai/lsof.log
+
+ for dir in $(mount | grep $FAI_ROOT | awk '{print $3}' | LC_ALL=C sort -r); do
+ umount $dir
+ done
+
+ if mount | grep -q $FAI_ROOT; then
+ echo "dangling mounts:"
+ mount | grep $FAI_ROOT
+ sleep 10
+ fi
# reboot or halt?
if [ "$flag_halt" -gt "0" ]; then
- exec halt -dfip;
+ exec halt -dfip
else
- exec reboot -dfi;
+ exec reboot -df
fi
}
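A variant of the unmount loop in the patch, as a standalone sketch: it
matches mount points on a path prefix instead of a substring grep (so
/target2 is not caught when unmounting /target), quotes the paths, unmounts
deepest first, and reports anything still mounted. The function names are
hypothetical, and it assumes mount prints "DEV on DIR type FS (OPTS)" and
that no mount point contains whitespace:

```shell
#!/bin/sh
# Hypothetical helpers (not part of FAI).

# list_mounts_under ROOT: read mount-style lines on stdin and print the
# mount points equal to ROOT or below ROOT/, deepest first.
list_mounts_under() {
    lmu_root=${1%/}   # "/target/" and "/target" are treated alike
    # $3 is the mount point; prefix match on "ROOT/" avoids "/target2"
    awk -v root="$lmu_root" '$3 == root || index($3, root "/") == 1 { print $3 }' |
        LC_ALL=C sort -r   # a child sorts after its parent, so -r puts it first
}

# umount_tree ROOT: unmount everything at or below ROOT, then return 1
# and list the leftovers if anything is still mounted.
umount_tree() {
    ut_root=$1
    mount | list_mounts_under "$ut_root" | while read -r dir; do
        umount "$dir" || echo "umount $dir failed" >&2
    done
    ut_left=$(mount | list_mounts_under "$ut_root")
    if [ -n "$ut_left" ]; then
        echo "dangling mounts:" >&2
        echo "$ut_left" >&2
        return 1
    fi
}
```

On the client this would be called as umount_tree "$FAI_ROOT" in place of
the for loop and the dangling-mounts check.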
On Tue, Jun 10, 2014 at 2:41 AM, Peter Keller <psilord at cs.wisc.edu> wrote:
> Hello,
>
> I have a question:
>
> Sometimes, maybe 2% of the time, when FAI 4.2 finishes installing and is
> shutting down to reboot, I get into a state where messages are logged to
> the screen about NFS not responding, and then ok, and then not responding,
> and then ok, and so on. They repeat every 5 minutes or so. The machine
> stays in this state and never actually reboots causing a manual interrupt
> in the automated install. The NFS server, AFAICT, was ok the whole time.
> The faiserver is a wheezy machine and I'm not using nfs 4.
>
> Has anyone ever seen this before?
>
> Thank you.
>
> -pete
>
--
Sander Brandenburg
Director of Technology
eSATURNUS
T. +32 16 40 12 82
www.esaturnus.com
More information about the linux-fai mailing list