FAI failure behavior

Holger Parplies wfai at parplies.de
Tue Jun 3 01:22:23 CEST 2014


Hi,

[re-sending to the list; somehow missed that on the first attempt]

Peter Keller wrote on 2014-05-29 14:51:09 -0500 [FAI failure behavior]:
> Hello,
> 
> I'm using FAI 4.0.6 to install some clusters, but occasionally I mess
> up and use the wrong/bad disk configuration. FAI bails out and drops me to
> a prompt. This is expected behavior, EXCEPT, something ends up getting
> written to the MBR/GPT anyway on the bootable drive even though the
> disk setup was wrong for it. [...]
> 
> So, I request a feature:
> 
> For any bootable drive, if FAI fails to set up the disk configuration for
> it, can it automatically zero out the first 512 bytes of the drive?

what a horrible idea!

Supposing the *right* disk configuration specifies to *preserve* parts of the
disk contents, those would now have been logically deleted. I don't know why
or what gets "written to the MBR/GPT anyway", but there's at least a chance
parts of the partition table (MBR; don't know about GPT) are still intact;
zeroing it out can only make things worse. If anything, the contents should be
saved before any modification and restored in case of an error.

For your situation, zeroing *might* be a solution, but it is not for the
general case.

> So when the reboot happens, the BIOS will see there is no valid MBR and
> fall down to the network installation method?

The *correct* solution is to select network boot first, then hard disk boot.
FAI will at least try to do what you want: if no installation is scheduled,
fall through to the next BIOS boot method (at least if you did your
'fai-chboot -o default' like you're supposed to ;-). For me, that doesn't work
for Virtualbox hosts, but for real machines it usually works just fine. Of
course, your experience may vary.

> Also, is there some kind of a "failed to install" hook that already exists
> where I can jam a script?

How, exactly, would you define "failed to install"? There are many ways an
installation can "fail". All my installations, for instance, end with a
message "ERRORS found in log files.", but most of the time they're fine.
I'd love to clean that up, but it's not always practicable.

If you truely want to follow this route, you might try using a partition hook.
Someone suggested a way to write a "post-hook" quite some time ago:

	#!/bin/bash
	# this is hooks/partition.YOURCLASS
	task_partition
	# perhaps save return code here
	skiptask partition
	# your code goes here

After calling task_partition, you can probably check $? for an error code.
I haven't tried that out, though.

Regards,
Holger


More information about the linux-fai mailing list