Ganglia confusion

Note: This is left here for historical reference. After having so much grief with Ganglia, when a friend pointed out collectd in the Fall of 2006, I jumped to it in a heartbeat. In my experience, collectd was ten times easier to get going and keep going. It's new software and the web interface is hardly sophisticated, but at least I can figure out how to configure it and I'm going to trust the web interface will only get better. You might find my notes on collectd of interest.



Subject: Re: [Ganglia-general] Ganglia confusion
Date: Mon, 14 Aug 2006 11:37:39 -0400
To: ganglia-general@lists.sourceforge.net

Terry Gliedt wrote:

> I've been trying to get 3.0.2 installed and working and after spinning my head far too many times, I've decided I don't really understand the larger picture of how ganglia works.

I've been struggling on and off for months trying to get Ganglia to work. I just could not get into my head HOW this works. Last week I finished a marathon effort trying all sorts of combinations on a tiny two-node cluster and then, when it worked, grew that to a larger cluster. By George, I think I've got it finally.

The following is what I needed to know to install Ganglia. Maybe there is someone else as boneheaded as I was who needs to know this too. This is my understanding of what's going on and how it all ties together, and it may or may not be accurate. Please feel free to correct this.

Ganglia consists of several pieces of software (gmond, gmetad and some PHP code for a web interface). Gmond should be run on every node of the cluster. Its role is only to collect data, save it in memory and sometimes make it available:

  • Collect information about the local node and send this to other nodes
  • Possibly receive information from other nodes
  • Possibly return node summary information as XML

The summary data is managed by gmetad, which asks for the information and then saves it. The web interface reads this data and gives us the information on the cluster/grid. The configuration files gmond.conf and gmetad.conf control the two daemons.

GMOND

There are only a few sections in gmond.conf which are really important:

  • udp_send_channel
  • udp_recv_channel
  • tcp_accept_channel

All nodes must have a udp_send_channel section. This tells gmond where to send the data it has collected about the local node (even if that is to itself). You can configure this to broadcast the information or send it to a particular host and port. If you specify a particular host, you probably want all nodes to send data to the same place. You can also have each node send the same information to more than one place for redundancy.

At least one node must have a udp_recv_channel section. Data received by this section forms a snapshot of the state of all nodes. You can configure this to receive the data via broadcast or receive it on a particular IP interface and port. More than one node could be receiving the same data. You can use the 'deaf' keyword in the 'globals' section to disable this section, even if it is defined.

For Ganglia to really be useful, at least one node which has udp_recv_channel defined must have a tcp_accept_channel section also. This section describes a particular IP interface and port where a query can be sent. Gmond will return an XML string of the summary information it has collected. This interface is the one gmetad will talk to.
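
One way to see what a tcp_accept_channel returns is to connect to it and read until gmond closes the connection. The following is a minimal Python sketch and not part of Ganglia. The function names are made up for illustration, and it assumes the reply uses the usual GANGLIA_XML / CLUSTER / HOST / METRIC layout with NAME and VAL attributes.

```python
import socket
import xml.etree.ElementTree as ET


def fetch_gmond_xml(host, port, timeout=5.0):
    """Connect to a gmond tcp_accept_channel and return the raw XML bytes.

    gmond sends its XML dump as soon as a client connects and then closes
    the connection, so no request has to be written first.
    """
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(4096)
            if not data:  # an empty read means gmond closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)


def hosts_and_metrics(xml_bytes):
    """Return {host name: {metric name: value string}} from a gmond dump."""
    root = ET.fromstring(xml_bytes)
    result = {}
    for host in root.iter("HOST"):
        # Values are kept as strings; gmond reports the type separately.
        metrics = {m.get("NAME"): m.get("VAL") for m in host.iter("METRIC")}
        result[host.get("NAME")] = metrics
    return result
```

Calling hosts_and_metrics(fetch_gmond_xml("192.168.1.2", 8650)), with the address and port replaced by wherever your tcp_accept_channel listens, should list every node that gmond has heard from. If a node is missing here, gmetad and the web interface will not see it either.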

GMETAD

The role of this daemon is simply to ask for summary information from gmond and then save it. The data being saved is used by the web interface. The keyword 'data_source' specifies the host where tcp_accept_channel is defined and its port. The keyword 'rrd_rootdir' specifies the path to a directory where the data is saved.

WEB APPLICATION

The web application uses a configuration file, conf.php. Important variables in this file are '$ganglia_ip' and '$ganglia_port', which specify the node/port where tcp_accept_channel is defined. The variable '$rrds' specifies the path to the data saved by gmetad. The simplest setup would have the web server running gmetad.

SUMMARY

Ganglia can probably be set up in an infinite number of ways, but to get me started I did the following (note that for other reasons I cannot use broadcast, so I specified addresses explicitly):

  • Every node runs gmond. Each defines udp_send_channel to send data to 192.168.1.2 (port 8649)
  • 192.168.1.2 defines udp_recv_channel (port 8649) and tcp_accept_channel (port 8650). This means 192.168.1.2 is the only node which has all three sections defined.
  • The web server runs gmetad and pulls data from 192.168.1.2 (on a public interface) using port 8650. The data is written locally and is accessed by the web application.
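
The layout above can be written down as configuration text. The Python sketch below only prints the relevant fragments, not complete files. The keyword names inside the sections (host, port) and the data_source / rrd_rootdir line formats follow the 3.0-era sample files as best understood here, so check them against the files shipped with your version. The cluster name "my cluster" and the rrd path are placeholders, not values from the set-up described above.

```python
COLLECTOR = "192.168.1.2"  # node that receives data from all the others
UDP_PORT = 8649            # gmond-to-gmond traffic
TCP_PORT = 8650            # port gmetad polls for the XML summary


def gmond_channels(is_collector):
    """Return the channel sections of gmond.conf for one node."""
    # Every node, including the collector itself, sends to the collector.
    sections = [
        "udp_send_channel {\n"
        f"  host = {COLLECTOR}\n"
        f"  port = {UDP_PORT}\n"
        "}\n"
    ]
    if is_collector:
        # Only the collector listens for data and answers XML queries.
        sections.append(f"udp_recv_channel {{\n  port = {UDP_PORT}\n}}\n")
        sections.append(f"tcp_accept_channel {{\n  port = {TCP_PORT}\n}}\n")
    return "\n".join(sections)


def gmetad_conf(cluster_name="my cluster", rrd_rootdir="/var/lib/ganglia/rrds"):
    """Return the two gmetad.conf lines discussed above (placeholder defaults)."""
    return (
        f'data_source "{cluster_name}" {COLLECTOR}:{TCP_PORT}\n'
        f'rrd_rootdir "{rrd_rootdir}"\n'
    )


if __name__ == "__main__":
    print("gmond.conf on ordinary nodes:\n")
    print(gmond_channels(is_collector=False))
    print("gmond.conf on 192.168.1.2:\n")
    print(gmond_channels(is_collector=True))
    print("gmetad.conf on the web server:\n")
    print(gmetad_conf())
```

Keeping the address and ports in one place makes it harder for the sender, receiver and gmetad settings to drift apart, which is the kind of mismatch that is easy to introduce when editing each file by hand.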