FAI, NVME, UEFI

Marc Hoppins marc.hoppins at eset.com
Thu Apr 28 12:10:19 CEST 2022


Hi all,

We recently took delivery of Dell C6520 systems. Each one comes with:

2 x 256 GB NVMe boot devices (RAID)
2 x 8 TB SSDs

1 x 1G embedded NIC
2 x 25G integrated NICs

For identification, we will call these SRV01 to SRV08.

To say we are having problems is a slight understatement. Initially we could not get these up at all without altering the BIOS settings. UEFI boot mode had to be enabled in the BIOS for the various devices to be recognised, which meant that GRUB_EFI was needed for FAI. We also had to disable the 1G NIC, because FAI wanted to create a bond between the 1G NIC and one of the 25G NICs, so we then had to change the boot setting of that 25G NIC to PXE.

Our end user was adamant that we use UBU16, as he did not want to break any of his development work or risk something no longer working as before. So we FAI’d the servers to UBU16 and all was fine. However, a decision was then made to move to UBU18, so we chose to re-FAI them rather than do an in-place upgrade. Of the 8 servers, 6 installed UBU18 but then started throwing ‘Diskfilter writes are not supported’ errors. This results in NIC failures/issues on subsequent boots.

On one of the remaining two, the iDRAC decided to crap itself, and all we can now do (until some living person presses the big red tit) is power-cycle the server; no keyboard input is possible at all.

A further decision was then made to FAI them back to UBU16, since that had succeeded on all servers the first time, and then do a release upgrade. One has completed successfully; the server with the non-responsive KVM is still unresponsive.

Our disk_config sets:
1G for /boot/efi
the remainder as LVM2

On the failed servers we see:

dhcp PREINIT eth0 up
dhcp FAIL
RTNETLINK answers: file exists

…and then it drops to a dracut prompt. So FAI seems to find only one of the two 25G NICs (SOMETIMES!) and thus fails. This is odd, because on a subsequent retry on SRV01 with UBU18, eth0 did not appear but eth1 did, yet the FAI install continued.

As for RTNETLINK, what file exists? Doesn’t FAI start with a clean slate?

So, for all 8 attempted UBU16 installs, the results are:

SRV01 - success
SRV02 - success
SRV03 - no KVM
SRV04 - fail, then boots to previous OS (UBU18) with DISKFILTER WRITES error
SRV05 - fail, then boots to previous OS (UBU18) with DISKFILTER WRITES error
SRV06 - success
SRV07 - success
SRV08 - success

All hardware is the same, and all BIOS settings are the same. I have retried the failed installs several times and they all return the same errors.

SRV01 - I tried a re-FAI using UBU18; eth0 did not appear in the messages but eth1 did. UBU18 then completed, but we again ended up with the DISKFILTER error and no bond. I tried again and eth0 appeared, as did eth1, yet I saw DHCP FAIL. This again resulted in the DISKFILTER error, and I am of the opinion that FAI’s detection of devices is an extremely hit-and-miss affair, following some strange sequential mis-logic.

SRV01 - I went back to UBU16 and all went without mishap.
SRV01 - I tried UBU20 and this also appeared to work fine.
Finally, I went back to UBU16, as we are unable to use UBU20 at this time, and everything installed without issue.

If anyone has had consistent success with this hardware, these Ubuntu versions and FAI, I’d appreciate some hints.

Thanks

Marc


More information about the linux-fai mailing list