FAI 2.4 upgrade problems

Justin Doiel jdoiel at engr.uark.edu
Thu Feb 27 21:46:58 CET 2003


OK, one more time. :)

On Thu, Feb 27, 2003 at 11:29:27AM +0100, Felix Kühling wrote:
> On Wed, 26 Feb 2003 14:12:19 -0600
> Justin Doiel <jdoiel at engr.uark.edu> wrote:
> 
> > 
> > Heya.
> > 
> > I'm presently going through the same pain, so here's my two cents. :)
> > 
> > On Wed, Feb 26, 2003 at 06:39:21PM +0100, Felix Kühling wrote:
> > > Hi,
> > > 
> > > I used FAI 2.3.4 with Woody and DHCP. On the upgrade to 2.4 I'm
> > > experiencing 4 serious problems that prevent me from completing an
> > > installation successfully.
> > > 
> > > 1. I had trouble making a boot floppy. The newly ext2-formatted floppy
> > > was automagically mounted as vfat and when trying to rmdir lost+found
> > > make-fai-bootfloppy aborted. I fixed it by adding "-t ext2" to the mount
> > > command line.
> > 
> > I didn't run into this one at all. My boot floppies work perfectly.
> 
> I'm not really sure it's related to the upgrade. Could it just be a
> strange floppy? Anyway, the fix was rather simple once I had figured it
> out :)
> 
> > > 2. I can't get a shell after install or sysinfo (didn't try other
> > > actions). No matter what I do (pressing <RETURN> or ctrl-c) it always
> > > reboots. Somehow I managed to get a shell after messing around a bit in
> > > fai_end, but I didn't really understand how.
> > 
> > I'm also having this problem, no matter what, it reboots.
> > I THINK it may have something to do with me removing tcsh from my setup, but am not sure.
> 
> I never used tcsh.
> 
> > > I added a line "/sbin/sulogin" in fai_end. But the shell behaved quite
> > > strangely. There was no prompt and backspace repeated deleted characters
> > > on the terminal. After I made some mistake that killed the shell init
> > > entered runlevel 2 and I got a usable shell.
> > > 
> > > 3. The logfiles cannot be saved on the install server. I get two error
> > > messages about rcmd problems. Sorry, I forgot the exact words. Maybe
> > > this is related to the next one.
> > > 
> > 
> > What method did you use? I used the SSH method, installed an SSH client on the root
> > filesystem, and had to manually copy in a /root/.ssh/known_hosts with my server in it.
> > I still get an error message, but it works.
> 
> I use rsh. I didn't change anything in the list of packages in nfsroot
> during the upgrade to 2.4.
> 
> > 
> > <snip>
> > # extra packages which will be installed into nfsroot
> > # add lvm, raidtools2 only if needed
> > NFSROOT_PACKAGES="expect pump ssh"
> > </snip>
> 
> NFSROOT_PACKAGES="ssh expect reiserfsprogs dpkg-dev rsh-client"
> 
> > 
> > I ALSO had to read through the scripts before finding out that there is a LOGSERVER
> > variable that needs to be set in fai.conf, e.g.:
> > 
> > <snip>
> > # /boot/fai;chmod g+w /boot/fai. If the variable is undefined, this
> > # feature is disabled
> > LOGUSER=faimaster
> > LOGSERVER=ageruka
> > # use ssh or rsh for copying log files to user fai and for changing
> > # tftp symbolic link
> > #FAI_REMOTESH=rsh
> > #FAI_REMOTECP=rcp
> > FAI_REMOTESH=ssh
> > FAI_REMOTECP=scp
> > </snip>
> 
> LOGUSER=fai
> # use ssh or rsh for copying log files to user fai and for changing
> # tftp symbolic link
> FAI_REMOTESH=rsh
> FAI_REMOTECP=rcp
> 
> Hmm, that could be the problem. I never had a LOGSERVER= line in my
> fai.conf. Not with 2.3.4 and not in the maintainer version of 2.4. But
> it used to work with 2.3.4 anyway.
> 
For additional confusion: without the LOGSERVER= line, mine SOMETIMES worked and sometimes
failed trying to resolve "" as the loghost. Mucho weirdness.
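
A guard along these lines would at least make an empty LOGSERVER fail loudly instead of trying to resolve "". This is only a sketch: log_target is a made-up helper name, not something FAI ships, and it assumes LOGUSER and LOGSERVER have already been sourced from fai.conf.

```shell
# Sketch only: log_target is a hypothetical helper, not part of FAI.
# Prints "user@host" for the remote copy, or fails if either variable is empty.
log_target() {
    if [ -z "${LOGUSER:-}" ] || [ -z "${LOGSERVER:-}" ]; then
        echo "LOGUSER or LOGSERVER is not set in fai.conf, not saving logs" >&2
        return 1
    fi
    echo "${LOGUSER}@${LOGSERVER}"
}
```

With LOGUSER=faimaster and LOGSERVER=ageruka it prints faimaster@ageruka. With LOGSERVER empty it returns non-zero, so the caller can skip the $FAI_REMOTECP step.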

> > 
> > <NOTE>LOGUSER=faimaster is a local change, I don't like naming users after services.</NOTE>
> > 
> > > 4. The DHCP information doesn't make it into environment variables. I
> > > tried the dhclient -lf /dev/null command line as in get-boot-info
> > > manually in the shell, but it didn't output anything to stdout. If I
> > > understand get-boot-info correctly dhclient is *supposed* to output
> > > variable definitions for all DHCP parameters. They are redirected to
> > > /tmp/fai/bootlog and sourced later by task_confdir.
> > > 
> > 
> > I don't have that problem here. :P
> 
> Do you use DHCP or BOOTP? After reading a few scripts and manpages I
> have an idea what the problem could be. I saw a dhclient-perl script in
> .../nfsroot/sbin which seems to translate dhcp options into shell
> variables. It gets called by dhclient through /sbin/dhclient-script. But
> there is this line which makes me worry:
> 
> # exit if no data is available
> exit 0 unless $ENV{new_option_170};
> 
> As suggested in the new documentation I had removed the option_17x
> options from my dhcpd.conf and used the new FAI_LOCATION variable in
> fai.conf and class/LAST.var for setting FAI_ACTION. But here it looks as
> if at least option_170 is still needed. Is this a bug in the script or
> the documentation?
> 
> I haven't tested this theory yet. Maybe later today. I'll let you know
> what I find.
> 
I'm not using option 170; my setup is a little strange...

I already have a DHCP server on my network, so I just modified its configuration instead of setting up another one.

Here's the configuration section that has to do with my FAI machine:

group {
        use-host-decl-names on;

        host ageruka {
                hardware ethernet 00:80:C8:B9:9A:19;
                fixed-address 192.168.0.15;
        }
        host nijuuichi {
                hardware ethernet 00:40:F4:15:29:CA;
                next-server 192.168.0.15;
                option root-path "/usr/lib/fai/nfsroot/";
                fixed-address 192.168.0.16;
        }
}

ageruka ((you) flying?) is the FAI master server, nijuuichi (21) is the FAI install client.
I had to turn on use-host-decl-names so that the DHCP server would give the client the
hostname specified in dhcpd.conf. next-server and option root-path are the only
differences between the server and client entries.

I THINK dhcpd may be translating "option root-path" into the 170 option you're looking for.
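
One way to check rather than guess would be to dump what dhclient actually hands to its script. A rough sketch, assuming dhclient exports the lease options as new_* environment variables, as the dhclient-perl line quoted above suggests. dump_dhcp_vars and has_option_170 are made-up names, not part of FAI or dhclient. As far as I know root-path is standard DHCP option 17, so it may well show up as new_root_path rather than new_option_170, which is exactly what this would confirm.

```shell
# Sketch only: these helper names are hypothetical, not part of FAI or dhclient.

# List everything exported for the new lease (variable names start with new_).
dump_dhcp_vars() {
    env | grep '^new_' | sort
}

# Succeed only if the variable that dhclient-perl tests for is non-empty.
has_option_170() {
    [ -n "${new_option_170:-}" ]
}
```

Called temporarily from the dhclient script in the nfsroot, dump_dhcp_vars would show whether root-path, the hostname and the IP address arrive at all, and has_option_170 would show whether dhclient-perl is going to exit early.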

For the record, this is a new cluster install. My old one is based on bootp, rather custom floppies, and much more hackery. :)
(I use a staggered development/production cycle, so my bootp/old FAI cluster is still in service on the same network.)

> > > Thus IPADDR ends up undefined and as a result my 01alias doesn't add
> > > most of the classes (NETWORK, LILO, BOOT, ...). Eventually I get an
> > > unbootable system without a (simple) way to get a shell after the
> > > installation and no logfiles on the install server. That makes debugging
> > > real fun! ;-)
> > > 
> > 
> > Hmm, I don't have that one either.
> 
> Well, it's a consequence of the DHCP variable problem and specific to
> the way I detect the configuration (by IPADDR).
> 
> > > If you need any more details or have patches for me to test, just let me
> > > know. The most important first step is probably to get the shell working
> > > after the installation.
> > > 
> > > Regards,
> > >    Felix
> > 
> > Good luck, hope I helped instead of confusing.
> > 
> > Justin Doiel <jdoiel at engr.uark.edu>
> > 
> 
> Felix
> 
>                __\|/__    ___     ___     ___
> __Tschüß_______\_6 6_/___/__ \___/__ \___/___\___You can do anything,___
> _____Felix_______\?/\ \_____\ \_____\ \______U___just not everything____
>   fxkuehl at gmx.de    >o<__/   \___/   \___/        at the same time!

You know, it'd be nice if someone would look at this exchange and tell both of us what's wrong with our shells after install. :)

Justin Doiel <jdoiel at engr.uark.edu>


