Debian FAI 2.4beta installs Research Beowulf Cluster successfully

Frank Oliver Pfeiffer FrankO.Pfeiffer at gmx.de
Thu Feb 13 17:46:26 CET 2003


A Success Story: Research Cluster

published:
http://www.beowulf-underground.org/index.html

The department of Theoretische Physik (Universität des Saarlandes,
Germany) has successfully installed a Beowulf cluster using only
Debian GNU/Linux 3.0 (woody) and the beta version 2.4 of the Fully
Automatic Installation (FAI) package. Most of our time went into
installing and configuring the server and creating a local Debian
mirror. Installing and configuring the 20 nodes with FAI took less
than one hour.


Recently, we bought a Beowulf cluster from megware consisting of one
server and 20 nodes. Each machine has two Intel Xeon 2.4GHz processors
on a Supermicro P4DSE motherboard and 1GB of memory. The server has
two 73GB SCSI hard disks, two 100Mbps Ethernet cards, one 1Gbps
Ethernet card, a graphics card and a CD-ROM drive. Each of the 20
nodes has one 40GB IDE hard disk, two 100Mbps Ethernet cards and a
graphics card. The 1Gbps Ethernet card is connected to a 1Gbps module
of a 3Com switch, which is connected to the nodes. The second 100Mbps
Ethernet card of each machine serves an additional service network,
also connected to a 3Com switch. A disk drive is not needed, though
each machine of our cluster has one.

The system runs Debian GNU/Linux 3.0 (woody) with the recent kernel
2.4.20, which supports dual-processor motherboards and the 1Gbps
Ethernet card. On the server we set up a software RAID1 system across
the two identical SCSI hard disks. We wanted to run kernel 2.4.20 on
each node and to install the nodes automatically, with a tool similar
to JumpStart from Solaris. The Fully Automatic Installation (FAI)
Debian package 2.4beta made this possible, since it supports kernel
2.4.20, which older versions did not. So we tested FAI 2.4beta, and it
worked! Here we benefited from our experience with an earlier
installation: two years ago we built our first Beowulf cluster, with
single-CPU machines running Debian potato and FAI 2.2.2. Installing
the nodes via FAI requires creating a local mirror and setting up the
FAI configuration space, DHCP, TFTP and NFS.
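
As a rough illustration of the DHCP part of this setup, the following
Python sketch generates per-node "host" entries in the syntax of the
ISC DHCP server. This is only a sketch under assumptions: the post
does not say which DHCP server was used, and the node names, MAC
addresses, IP addresses and the boot file name "pxelinux.0" are
hypothetical placeholders, not values from our cluster.

```python
# Sketch: emit ISC dhcpd "host" entries for PXE-booting cluster nodes.
# All node names, MAC addresses, IP addresses and the boot file name are
# hypothetical placeholders, not values from the cluster described above.

def dhcp_host_entry(name, mac, ip, server_ip, boot_file="pxelinux.0"):
    """Return one dhcpd.conf host block for a node that boots via PXE."""
    return (
        f"host {name} {{\n"
        f"  hardware ethernet {mac};\n"
        f"  fixed-address {ip};\n"
        f"  next-server {server_ip};\n"   # TFTP server holding the boot file
        f'  filename "{boot_file}";\n'
        f"}}\n"
    )


def dhcp_entries(macs, subnet="192.168.1", server_host=1, first_node_host=101):
    """Build host blocks for node01, node02, ... from a list of MAC addresses."""
    server_ip = f"{subnet}.{server_host}"
    blocks = []
    for i, mac in enumerate(macs):
        name = f"node{i + 1:02d}"
        ip = f"{subnet}.{first_node_host + i}"
        blocks.append(dhcp_host_entry(name, mac, ip, server_ip))
    return "\n".join(blocks)


if __name__ == "__main__":
    # Placeholder MAC addresses for 20 nodes.
    example_macs = [f"00:00:5e:00:53:{i:02x}" for i in range(1, 21)]
    print(dhcp_entries(example_macs))
```

The output would be pasted into (or included from) the DHCP server
configuration on the install server. The real MAC addresses have to be
collected from the nodes first.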

On the server we created the home directories and installed TFTP, NFS,
NIS, NTP, the X Window System, a programming environment, the queuing
system DQS and MPI. To delegate the installation and configuration of
the nodes to FAI, we created classes and copied /etc/hosts from the
server to /usr/lib/fai/nfsroot/etc/ so that it would be distributed to
the nodes. During the installation of the nodes, FAI created the hard
disk partitions, configured the network, created the NIS home, set up
the MBR, lilo booting and NTP, installed all packages and copied
/etc/hosts to the nodes.

We set up the second Ethernet card, DQS and MPI manually; for DQS and
MPI this simply meant copying "/etc/dqs/conf_file",
"/etc/dqs/resolve_file" and "/etc/mpich/machines.LINUX" to the nodes.
This could also have been done more comfortably with FAI tools such as
the "fcopy" or "rshall" commands (which belong to the FAI package), or
by running shell or cfengine scripts during the FAI installation
process.

During the installation of the whole cluster, most of our time was
spent (1) setting up the software RAID1 system, (2) finding the
correct parameters for the "mkdebmirror" script (see
http://www.informatik.uni-koeln.de/fai/ by Thomas Lange) to create a
local Debian mirror and (3) getting DHCP to work with the PXE
protocol. The solutions to these problems can be found on our homepage
but should also be published elsewhere. The simultaneous installation
of the 20 nodes via FAI ran perfectly and took less than half an hour.
We spent another half an hour manually configuring the remaining
tasks, which we had not delegated to FAI but could have. A single node
can be installed and configured within 15 minutes. Installing 20 nodes
simultaneously is therefore much faster than installing them one after
another, and the time needed depends strongly on the speed of the
network connection between the nodes and the local mirror.
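
The timings above can be put side by side with a little arithmetic.
The Python sketch below uses only the figures from this report and
assumes that nodes installed one after another would each need the
full 15 minutes; since "less than half an hour" is an upper bound, the
resulting speedup is a lower bound.

```python
# Figures taken from the report above; all times in minutes.
NODES = 20
SINGLE_NODE_MIN = 15        # one node installed and configured on its own
PARALLEL_INSTALL_MIN = 30   # upper bound for installing all 20 nodes at once


def sequential_minutes(nodes, per_node):
    """Total time if the nodes were installed one after another."""
    return nodes * per_node


def speedup(nodes, per_node, parallel_total):
    """How many times faster the simultaneous installation is."""
    return sequential_minutes(nodes, per_node) / parallel_total


if __name__ == "__main__":
    print(sequential_minutes(NODES, SINGLE_NODE_MIN))             # 300
    print(speedup(NODES, SINGLE_NODE_MIN, PARALLEL_INSTALL_MIN))  # 10.0
```

So installing the nodes one by one would have taken about five hours,
against less than half an hour for all 20 at once.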

In summary, we really profited from using FAI 2.4beta to install a
dual Xeon Beowulf cluster. FAI could just as well stand for Fast
Automatic Installation.


