<div dir="ltr">We used to have a similar problem on NFSv3 with FAI 4.1.1 (once in 100s of installs). <div><br></div><div>I've appended to this email what I've currently patched to address this issue (with FAI-4.1.1 as base). I was suspecting a dangling process or mount, this is why I introduced a lot of debugging code (I wanted to determine whether the NFS client hung at the umount -a or reboot itself). I thought the only significant thing this fixed was removing the -i from the reboot command, which I reported to upstream.</div>

<div><br></div><div>However, it could very well be that my debugging changes have actually contributed to the fix, as the -i flag was removed in FAI-4.2 and you're still experiencing a hang (assuming it's the same problem). </div>

<div><br></div><div>-Sander<br></div><div><br></div><div>--- a/lib/subroutines<br></div><div><div><div>+++ b/lib/subroutines</div><div>@@ -527,7 +527,7 @@<br></div><div>     cd /</div><div>     sync</div><div> </div><div>

-    killall -q sshd udevd</div><div>+    killall -qw sshd udevd rsyslogd</div><div> </div><div>     cdromdevice=$(awk '/ name:/ {print $3}' /proc/sys/dev/cdrom/info)</div><div>     # Verify whether the installation is from a fai-cd image, and whether it's actually mounted (instead of NFS mounted, for instance)</div>

<div>@@ -565,14 +565,26 @@</div><div>         # never reached, because chroot will reboot the machine</div><div>         die "Internal error when calling /tmp/rebootCD." >&2</div><div>     fi</div><div>-    umount $FAI_ROOT/proc</div>

<div>-    umount -arf 2>/dev/null</div><div>+</div><div>+    # Dump state of fai-client so we can reverse-engineer what goes wrong after a NFS hang</div><div>+    pstree -Apl > $FAI_ROOT/var/log/fai/pstree.log</div>

<div>+    lsof -n > $FAI_ROOT/var/log/fai/lsof.log</div><div>+</div><div>+    for dir in $(mount | grep $FAI_ROOT | awk '{print $3}' | LC_ALL=C sort -r); do</div><div>+        umount $dir</div><div>+    done</div>

<div>+</div><div>+    if mount | grep -q $FAI_ROOT; then</div><div>+        echo "dangling mounts:"</div><div>+        mount | grep $FAI_ROOT</div><div>+        sleep 10</div><div>+    fi</div><div> </div><div>     # reboot or halt?</div>

<div>     if [ "$flag_halt" -gt "0" ]; then</div><div>-        exec halt -dfip;</div><div>+        exec halt -dfip</div><div>     else</div><div>-        exec reboot -dfi;</div><div>+        exec reboot -df</div>

<div>     fi</div><div> </div><div> }</div></div><div><br></div></div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Tue, Jun 10, 2014 at 2:41 AM, Peter Keller <span dir="ltr"><<a href="mailto:psilord@cs.wisc.edu" target="_blank">psilord@cs.wisc.edu</a>></span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hello,<br>

<br>

I have a question:<br>

<br>

Sometimes, maybe 2% of the time, when FAI 4.2 finishes installing and is<br>

shutting down to reboot, I get into a state where messages are logged to<br>

the screen about NFS not responding, and then ok, and then not responding,<br>

and then ok, and so on. They repeat every 5 minutes or so. The machine<br>

stays in this state and never actually reboots causing a manual interrupt in<br>

the automated install. The NFS server, AFAICT, was ok the whole time.<br>

The faiserver is a wheezy machine and I'm not using nfs 4.<br>

<br>

Has anyone ever seen this before?<br>

<br>

Thank you.<br>

<br>

-pete<br>

</blockquote></div><br><br clear="all"><div><br></div>-- <br><div dir="ltr"><div>Sander Brandenburg</div><div>Director of Technology</div><div><br></div><div>eSATURNUS</div><div>T. +32 16 40 12 82</div><a href="http://www.esaturnus.com" target="_blank">www.esaturnus.com</a><div>

<img src="https://www.google.com/a/esaturnus.com/images/logo.gif?alpha=1"><br></div></div>

</div>