Fabric Management
A summary of the Birmingham Grid Fabric Management.
Introduction
Most grid sites use some form of fabric management software to install, configure and monitor their nodes and services. The Birmingham nodes are managed with cfengine; other sites use Quattor or Puppet.
Services not included
The UI installations and BlueBEAR Worker Nodes are not managed by cfengine. They are tarball installations and must be installed manually. Instructions for managing these installations can be found here:
BlueBEAR WN
Local UI
Structure
The cfengine head node is epgmo1.ph.bham.ac.uk. This node keeps a master record of all the relevant configuration files, which are then copied out to all nodes under the control of cfengine. epgmo1 also maintains a repository of binaries and scripts that are copied out to the relevant nodes.
The Birmingham definition works by mapping physical machine names onto roles within cfengine (e.g.
epgse1.ph.bham.ac.uk
is currently mapped onto the
dpm_head_node
role). This simplifies the process of redeploying a machine. For example, if the Site BDII services are moved onto a new machine, only one line in the cfengine configuration files should need to be edited.
Node Initialisation
When a node is installed via kickstart, cfengine should also be installed automatically. The kickstart script will attempt to download the files
cfservd.conf
and
update.conf
from the web server running on
epgmo1.ph.bham.ac.uk/
using wget. These files are stored locally on epgmo1 in
/var/www/html/config
, and configure the local installation of cfengine to recognise epgmo1 as the master node. It should then be possible to configure the new node from epgmo1 using the
cfrun
command.
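The bootstrap step described above can be sketched roughly as follows. This is only an illustration, not the actual kickstart script: the http scheme, the /config URL path (guessed from the /var/www/html/config directory), the destination directory and the function names are all assumptions.

```shell
#!/bin/bash
# Sketch of the bootstrap step a kickstart post-install script might perform.
# ASSUMPTIONS: the http scheme, the /config URL path and the destination
# directory are illustrative guesses, not taken from the real kickstart file.
MASTER="epgmo1.ph.bham.ac.uk"
BASE_URL="http://${MASTER}/config"      # assumed URL layout
DEST="${DEST:-/var/cfengine/inputs}"    # assumed destination directory

# Print the download URL for one bootstrap file.
bootstrap_url() {
    printf '%s/%s\n' "$BASE_URL" "$1"
}

# Fetch both bootstrap files. With DRY_RUN=1 the wget commands are only
# printed, so the function can be tried without network access.
fetch_bootstrap() {
    local f
    for f in cfservd.conf update.conf; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "wget -q -O ${DEST}/${f} $(bootstrap_url "$f")"
        else
            wget -q -O "${DEST}/${f}" "$(bootstrap_url "$f")" || return 1
        fi
    done
}
```

Running `DRY_RUN=1 fetch_bootstrap` prints the two wget commands without downloading anything, which is a convenient way to check the URLs before a real installation.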
More information about kickstarting and configuring nodes can be found in the
LocalGridCookbook.
Configuration files
The main configuration file is called
cfagent.conf
and can be found in
epgmo1:/var/cfengine/inputs
. This file currently defines the main
actionsequence
, and a mapping of physical machine names to cfengine roles.
The first file to be imported is
epgmo1:/var/cfengine/inputs/imports/classes.conf
. This file defines a set of cfengine classes, which are used to steer the other configuration files. For example, it defines the
glexec_wn
class, consisting of all nodes that are marked as worker nodes. It also contains a list of commands, such as
restart_maui
, that can be used to execute particular actions on specific nodes.
The second file to be imported is
epgmo1:/var/cfengine/inputs/imports/global.conf
. This file defines actions that are applied to all nodes (such as copying iptables rules out to nodes, or editing
/etc/hosts.allow
to enable ssh connections from the local system).
Finally, there are a number of role-specific configuration files that are used to install and configure specific gLite services. For example, configuration details specific to the APEL node are defined in
epgmo1:/var/cfengine/inputs/imports/apel.conf
. There is more information about these specific roles below.
Modules
There are a number of module files (written mainly in bash) located in
epgmo1:/var/cfengine/inputs/modules
. These scripts generally execute actions that are too cumbersome to complete in cfengine. For example, the firewall is managed from the
iptables
module, and system backups are managed from the
backup
module.
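As an illustration of what such a module can look like, here is a minimal bash sketch. It is hypothetical: the real iptables and backup modules are not reproduced on this page, and the function name, class names and marker-file idea are invented for the example. It relies on the cfengine 2 convention that a module reports back through its standard output.

```shell
#!/bin/bash
# Hypothetical module sketch; NOT the site's real backup module.
# cfengine 2 reads a module's standard output: a line starting with "+"
# defines a class, and a line of the form "=name=value" sets a variable.

# Report the state of a backup marker file as a cfengine class.
#   $1  path to the marker file (hypothetical)
#   $2  age threshold in days, passed to "find -mtime +N" (default 1)
check_backup_age() {
    local marker="$1" max_days="${2:-1}"
    if [ ! -e "$marker" ]; then
        echo "+backup_missing"
    elif [ -n "$(find "$marker" -mtime +"$max_days" 2>/dev/null)" ]; then
        # find printed the path, so the file is older than the threshold
        echo "+backup_stale"
    else
        echo "+backup_ok"
    fi
}
```

A configuration file could then react to a class such as backup_stale, keeping the awkward date arithmetic in bash and out of the cfengine configuration.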
File Repository
Some nodes require extra binaries and large files that are not always available over the net or are not easily created from within cfengine. In these cases, cfengine copies the relevant files onto the relevant nodes from a repository available in
epgmo1:/var/cfengine/inputs/repo
. There is a directory in the repo folder for each role. Some roles have subdirectories, e.g.
alice_vobox/
contains subdirectories for the
bb_alice_vobox
and the
twin_alice_vobox
.
There is also a
repo/general
directory for files that need to be copied onto all nodes (ganglia config files for example).
Known Limitations and Problems
-- ChristopherCurtis - 08 Oct 2009