Use ganglia to monitor DRBL (Diskless Remote Boot in Linux) based cluster

Another helmer cluster with 52 cores, 104 GB of RAM and 3 TB of disk space has been assembled. To monitor the master and slave nodes, I installed ganglia on the master node. In a nutshell, ganglia consists of three parts:
1. gmond: A process running on the slave nodes. On Ubuntu, run

sudo apt-get install ganglia-monitor

to install it. Its configuration file is /etc/gmond.conf, and the associated service is ganglia-monitor.
2. gmetad: A process running on the master that collects the statistics sent by the gmond processes on the slave nodes. On Ubuntu, it is installed with the ganglia-webfrontend package:

sudo apt-get install ganglia-webfrontend

Its configuration file is /etc/gmetad.conf, and the associated service is gmetad.
3. A web UI: The web front end is contained in the same package as gmetad. It displays the collected data.
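
To check which gmond sources gmetad is set up to poll, one can list the active data_source lines in its configuration file. The following is only a minimal sketch: the helper name list_data_sources is made up for illustration, and it assumes the usual `data_source "name" host ...` line format.

```shell
#!/bin/sh
# list_data_sources FILE: print the active (uncommented) data_source lines
# of a gmetad configuration file. The function name is hypothetical.
list_data_sources() {
    grep -E '^[[:space:]]*data_source[[:space:]]' "$1"
}

# Example, using the path mentioned in this post:
#   list_data_sources /etc/gmetad.conf
```

Lines that start with “#” are skipped, so only the sources gmetad actually uses are shown.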

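Since the DRBL-based slave nodes have no hard drive, all the services must be started from the master.
1. The first step is to add the commented-out line “#mcast_if = eth1” to both the udp_send_channel and udp_recv_channel sections of the gmond configuration file on the master: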

/* Feel free to specify as many udp_send_channels as you like.  Gmond
   used to only support having a single channel */
udp_send_channel {
  mcast_join = 239.2.11.71
  #mcast_if = eth1
  port = 8649
  ttl = 1
}

/* You can specify as many udp_recv_channels as you like as well. */
udp_recv_channel {
  mcast_join = 239.2.11.71
  #mcast_if = eth1
  port = 8649
  bind = 239.2.11.71
} 

The master node uses eth1 to communicate with the slaves, whereas the slaves use eth0 to communicate with the master. So the trick here is to keep the “#” in place, so that the slaves do not get the master-only eth1 setting, and then run

sudo drblpush -c /etc/drbl/drblpush.conf

2. With all slaves up and running, run the following commands:

sudo drbl-client-service ganglia-monitor on
sudo drbl-doit "/etc/init.d/ganglia-monitor start"

3. Done. The result looks like this:

[Screenshot: Ganglia monitoring the DRBL-based cluster]
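
Besides the web UI, a quick way to confirm that the slaves are reporting is to count the hosts in the XML that gmond serves. This is a rough sketch, not part of the original setup: count_hosts is a hypothetical helper name, and the usage line assumes that nc is installed and that gmond accepts TCP connections on port 8649 (the same port number as in the channels above).

```shell
#!/bin/sh
# count_hosts: read gmond's XML dump on stdin and print how many
# <HOST ...> elements it contains. The function name is hypothetical.
count_hosts() {
    grep -o '<HOST ' | wc -l | tr -d ' '
}

# Example (assumes nc is available and gmond listens on TCP port 8649):
#   nc localhost 8649 | count_hosts
```

If the number is lower than the number of nodes in the cluster, some gmond processes are probably not running or not reaching the master.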

Remember: every time before you run

sudo drblpush -c /etc/drbl/drblpush.conf

keep the “#” in front of “mcast_if = eth1”, and uncomment it only when you start ganglia on the master node.
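
Since it is easy to forget this step, the toggling can be scripted. The two helpers below are a minimal sketch under my own assumptions: the function names are hypothetical, and they simply add or strip the “#” in front of every mcast_if line of the file they are given.

```shell
#!/bin/sh
# Hypothetical helpers that toggle the "#" in front of mcast_if in a gmond
# configuration file. Each rewrites FILE through a temporary copy and then
# writes the result back, so the file keeps its owner and permissions.

# comment_mcast_if FILE: put "#" in front of every uncommented mcast_if line.
comment_mcast_if() {
    sed 's/^\([[:space:]]*\)mcast_if/\1#mcast_if/' "$1" > "$1.tmp" &&
        cat "$1.tmp" > "$1" && rm -f "$1.tmp"
}

# uncomment_mcast_if FILE: strip the "#" from every "#mcast_if" line.
uncomment_mcast_if() {
    sed 's/^\([[:space:]]*\)#[[:space:]]*mcast_if/\1mcast_if/' "$1" > "$1.tmp" &&
        cat "$1.tmp" > "$1" && rm -f "$1.tmp"
}
```

The idea would be to run comment_mcast_if on the gmond configuration file before drblpush and uncomment_mcast_if afterwards, then restart ganglia-monitor on the master. Both need write access to the file, so run them as root. Lines such as mcast_join are left untouched, and running either helper twice has no further effect.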

2 thoughts on “Use ganglia to monitor DRBL (Diskless Remote Boot in Linux) based cluster”

  1. This tool is exactly what I was looking for to monitor my own renderfarm (also inspired by the helmer idea). I had to google a bit to get it working properly, especially the web front end. Some Apache configuration needs to be done (Ubuntu 12.04) to get the web graphs working. Other than that, very useful post!
