Configuration Management and Monitoring of a Debian Etch Beowulf Cluster

Farid Behnia behnia at gmail.com
Fri Aug 31 08:48:58 CEST 2007


Hi,

I've put together a simple 2-node cluster using Debian etch, OpenMPI, FAI
and Cfengine.
I'm looking for ideas that can help me build a better self-healing
cluster. Right now I'm writing rule files for cfengine and would appreciate
any input on sample files and important configurations that need to be made
for the cluster's health. (Such files are site-specific, but I'm sure I can
get good hints out of them.)

I'd also be glad to hear whether you have a monitoring system in mind
that can cooperate with cfengine in the maintenance job. I've looked briefly
into Ganglia and Nagios so far. It seems Ganglia is mostly meant for large
(groups of) clusters and focuses on hardware resources. Nagios seems
better suited to my job, but the gurus on the cfengine mailing list believe
that cfenvd and cfexecd can provide equal monitoring and recovery capability
(in terms of response time).
What's your take on either approach?

Thanks in advance to anyone sharing their experience.