FAI success story

Venkata venkata at cs.uno.edu
Tue Feb 3 23:53:55 CET 2004


Hello all. Here's some information on the success I had using FAI to install
a 72-node Beowulf cluster at Louisiana State University New Orleans.

----------------------------------------------------------
1. Set up a regular Debian (woody) box, then ran "apt-get dist-upgrade" to
move it to testing (sarge). This machine acts as the Beowulf cluster head
node. Its configuration is:
   * Dual Xeon 2.2 GHz
   * 2 GB RAM
   * 2 x 36 GB Ultra160 SCSI disks
   * 100 Mbps 3Com 3c59x NIC / 1000 Mbps Broadcom Tigon3 NIC

2. Installed fai 2.5.1 and fai-kernels 1.5.3 on this box to make it the
install server for the internal Beowulf subnet (eth1 is the Broadcom
Tigon3 NIC).

3. Set up the dhcp3 server to serve the FAI install image. A snippet of
the file is below:
   # dhcpd.conf for fai
   # replace FAISERVER with the name of your install server

   # deny unknown-clients;
   option dhcp-max-message-size 2048;
   use-host-decl-names on;
   #always-reply-rfc1048 on;

   filename "/boot/fai/installimage";

   # the server from which to load the initial boot file if different
   # from server-name
   #next-server FAISERVER;

   subnet 192.168.1.0 netmask 255.255.255.0 {
      server-name "master";
      default-lease-time 6000;
      max-lease-time 6000;
      option subnet-mask 255.255.255.0;
      option broadcast-address 192.168.1.255;
      option domain-name-servers 192.168.1.1;
      option routers 192.168.1.1;
      option domain-name "linux.beowulf";
      option nis-domain "gumbo";
      option nis-servers 192.168.1.1;
      option root-path "/usr/lib/fai/nfsroot";
   }

   # one host block per node (the MAC address shown is a placeholder)
   host gumbo01 {
      hardware ethernet   00:00:00:00:00:00;
      fixed-address       gumbo01;
      option host-name    "gumbo01";
   }
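With 72 nodes, writing a host block per node by hand is tedious and error-prone. The following is a minimal shell sketch, assuming the nodes' MAC addresses have been collected into a text file (one per line, in node order) and that the gumbo01, gumbo02, ... naming continues the gumbo01 example above. The function name gen_hosts, the input file nodes.macs, and the dhcpd.conf path in the usage line are assumptions, not part of the original setup. Because fixed-address is given a hostname here, each gumboNN name also has to resolve on the server (for example via /etc/hosts) when dhcpd reads its configuration.

```shell
#!/bin/sh
# Sketch: emit one dhcpd.conf "host" block per node, naming the nodes
# gumbo01, gumbo02, ... in the order their MAC addresses appear on stdin.
# ASSUMPTION: stdin holds one MAC address per line (hypothetical input).
gen_hosts() {
    n=0
    # "|| [ -n "$mac" ]" also handles a final line with no trailing newline
    while read -r mac || [ -n "$mac" ]; do
        [ -z "$mac" ] && continue            # skip blank lines
        n=$((n + 1))
        name=$(printf 'gumbo%02d' "$n")      # zero-padded: gumbo01 .. gumbo72
        printf 'host %s {\n' "$name"
        printf '   hardware ethernet   %s;\n' "$mac"
        printf '   fixed-address       %s;\n' "$name"
        printf '   option host-name    "%s";\n' "$name"
        printf '}\n\n'
    done
}

# Usage (hypothetical file names; review the output before appending):
#   gen_hosts < nodes.macs >> /etc/dhcp3/dhcpd.conf
```

Blank lines in the input are skipped rather than consuming a node number, so a stray empty line does not shift the numbering of the remaining nodes.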

4. Set up NIS on the master node using the information in the NIS HOWTO.
No problems here.

5. Configured the partitioning for the client nodes and for the NFS file
server (a separate machine from the master node, which serves home
directories and user applications). The partitioning for the nodes is:

  # filename: GUMBO_IDE
  disk_config hda
  primary  swap      2048  rw
  primary  /         4096  rw,errors=remount-ro ;-j ext3
  primary  /scratch  0-    rw,errors=remount-ro ;-j ext3

Each node is a 2 GHz Pentium 4 with 1 GB of RAM and a 20 GB disk. The
file server is a dual 1.4 GHz Xeon Dell PowerEdge 2550 (2 GB RAM)
connected to a Dell PowerVault 220S with about 0.5 TB of total storage.
The file server was configured similarly to the nodes, but its RAID file
system was configured manually after the FAI install. Its disk
partitioning was as follows:

  # filename: GUMBO_FILESERV
  disk_config sda
  primary  swap  8192  rw
  primary  /     0-    rw,errors=remount-ro ;-j ext3

6. All the nodes use the autofs automounter to mount shared applications
and user home directories; the automounter maps are read from NIS. The
following files were added to the FAI custom files directory:

  # filename: FILES_HOME (auto.home)
  +auto.home

  # filename: FILES_AUTO (auto.master)
  +auto.master

  # filename: RSH_FILE (hosts.equiv)
  # This file contains a list of the hostnames of all the nodes and is
  # duplicated on every node to support passwordless rsh access.
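Since hosts.equiv is just one hostname per line, it can be generated instead of typed. A minimal sketch, assuming the nodes are named gumbo01 through gumbo72 (the numbering scheme is extrapolated from the gumbo01 entry in dhcpd.conf; make_hosts_equiv is a hypothetical helper name). The destination in the usage line is also an assumption, modeled on the /fai/files/etc/ntp.conf/NTP_FILE path used in step 8.

```shell
#!/bin/sh
# Sketch: print the node list for hosts.equiv, one hostname per line.
# ASSUMPTION: nodes are named gumbo01 .. gumboNN (zero-padded to 2 digits).
make_hosts_equiv() {
    count=$1
    i=1
    while [ "$i" -le "$count" ]; do
        printf 'gumbo%02d\n' "$i"
        i=$((i + 1))
    done
}

# Usage (destination path assumed, following the layout used for ntp.conf):
#   make_hosts_equiv 72 > /fai/files/etc/hosts.equiv/RSH_FILE
```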

7. For the package selection, I mostly stuck with the default set for a
Beowulf node as defined in the FAI samples, but I added the following
packages:

  # filename: GUMBO
  PACKAGES install
  ganglia-monitor
  gmetad
  libganglia1
  libganglia1-dev
  xlibmesa-dev
  xlibmesa3
  tk8.3
  tk8.4
  tk8.4-dev
  tcl8.3-dev
  tcl8.4-doc
  python-dev
  python-doc
  ssh
  ntp

8. In the scripts directory, I added the following lines to the LAST script:

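 # NIS SPECIFIC HACKS
 # point ypbind at the NIS server on the master node
 # (<<- strips leading tabs, so indent the here-document with tabs only)
 cat > "$target/etc/yp.conf" <<-EOF
 ypserver 192.168.1.1
 EOF

 rmdir "$target/etc/network/if-up.d"

 # HACK TO FIX X11 DIR PERMISSIONS
 chmod 755 "$target/usr/X11R6/bin"

 # SET UP NTP.CONF FILE
 cp /fai/files/etc/ntp.conf/NTP_FILE "$target/etc/ntp.conf"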

NIS, X11, and NTP would not work without these hacks. Here's the ntp.conf
file:

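   # disable the panic threshold so ntpd still sets the clock
   # even when the initial offset is very large
   tinker panic 0
   logfile /var/log/ntpd
   driftfile /var/lib/ntp/ntp.drift
   #broadcastclient yes
   # synchronize to the head node
   server master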

9. I used the mkdebmirror and debmirror scripts to create a partial
(testing/sarge) mirror on the master node. After that, I edited fai.conf
(attached to this mail) and ran fai-setup. This is where I ran into a
problem: libdetect0 is not part of sarge, so I had to remove all
references to it from make-fai-nfsroot. I believe it has been replaced by
discover in sarge. After that, fai-setup ran fine (apart from the usual
apt complaints).
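Before editing make-fai-nfsroot, it helps to list every line that mentions libdetect0 so that none is missed. A small sketch; find_pkg_refs is a hypothetical helper, and the install location of make-fai-nfsroot is not given in this post, so the usage line looks it up with command -v rather than hard-coding a path.

```shell
#!/bin/sh
# Sketch: print each line (with its line number) of a file that mentions
# a package name as a whole word, so every reference can be reviewed and
# removed by hand.
find_pkg_refs() {
    # -n: prefix line numbers; -w: whole-word matches only;
    # -F: treat the package name as a fixed string, not a regex
    grep -n -w -F -- "$1" "$2"
}

# Usage:
#   find_pkg_refs libdetect0 "$(command -v make-fai-nfsroot)"
```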

10. I booted each node using boot floppies from www.rom-o-matic.net. This
is where I ran into major trouble, but that had more to do with the
current state of sarge than with FAI. Many packages (especially GNOME 2)
have broken dependencies in sarge, and when they failed to install, the
entire install was aborted. I worked around this by studying the logs
after each install and removing the broken packages from the package
list(s). It took quite a few installs to get everything right, but once
all the broken packages were removed, there were no problems.
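The trial-and-error loop described above (read the logs, delete the offending package, reinstall) can be partly scripted. A minimal sketch of the deletion step only: drop_pkg is a hypothetical helper, and the package name and path in the usage line are placeholders. It assumes the package lists use the one-package-per-line layout of the GUMBO file in step 7, and it does not try to parse the install logs, whose format is not described here.

```shell
#!/bin/sh
# Sketch: remove a package name from one or more FAI package list files.
# A line is dropped only if its sole field is exactly the package name,
# so removing "tk8.4" keeps "tk8.4-dev", and the "PACKAGES install"
# header (two fields) is never touched. awk compares the name as a plain
# string, so the dots in names like tk8.4 are not regex wildcards.
drop_pkg() {
    pkg=$1
    shift
    for f in "$@"; do
        tmp=$(mktemp) || return 1
        if awk -v p="$pkg" '!(NF == 1 && $1 == p)' "$f" > "$tmp"; then
            cat "$tmp" > "$f"    # rewrite in place, keeping f's permissions
        fi
        rm -f "$tmp"
    done
}

# Usage (placeholder package name and path):
#   drop_pkg some-broken-package package_config/GUMBO
```

Writing through cat instead of mv keeps the original file's ownership and permissions, which mktemp would otherwise reset to owner-only.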
-------------------------------------------------------------------------------- 


I think that's about it regarding the FAI install. As I said before, FAI
itself didn't cause many issues, but the state of the sarge packages led
to some headaches. As a side note, we use a Raritan KVM system to access
the console of each node. This is nice because both video and keyboard
signals are carried over regular CAT 5 cable, so management is easy (no
need to connect keyboards and screens to the nodes). Let me know if you
need any more information. Thanks.

--Venkata



More information about the linux-fai mailing list