Fabric Management

A summary of the Birmingham Grid Fabric Management.

Introduction

Most grid sites use fabric management software to install, configure and monitor their nodes and services. The Birmingham nodes are managed with cfengine; other sites use tools such as Quattor or Puppet.

Services not included

The UI installations and BlueBEAR Worker Nodes are not managed by cfengine. They consist of tarball installations, and must be installed manually. You can find instructions for managing these installations here:

BlueBEAR WN

Local UI

Structure

The cfengine head node is epgmo1.ph.bham.ac.uk. This node keeps a master record of all the relevant configuration files, which are then copied out to all nodes under the control of cfengine. epgmo1 also maintains a repository of binaries and scripts that are copied out to the nodes that need them.

The Birmingham configuration works by mapping physical machine names onto roles within cfengine (e.g. epgse1.ph.bham.ac.uk is currently mapped onto the dpm_head_node role). This simplifies the process of redeploying a machine. For example, if the Site BDII services are to be moved onto a new machine, only one line in the cfengine configuration files should need to be edited.
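The role mapping can be pictured as a simple lookup table. The following Python sketch is not cfengine syntax and is not taken from the Birmingham configuration. Only the epgse1 to dpm_head_node pair comes from the text above; the `ROLE_OF` table, the `site_bdii` role name and the example.org hostnames are hypothetical, chosen to illustrate why moving a service should touch a single entry.

```python
# Illustrative only: a host-to-role table in the spirit of the cfengine mapping.
# Only the epgse1 -> dpm_head_node pair comes from the text; the rest is hypothetical.
ROLE_OF = {
    "epgse1.ph.bham.ac.uk": "dpm_head_node",
    "old-bdii.example.org": "site_bdii",  # hypothetical host and role name
}


def hosts_with_role(role_of, role):
    """Return the sorted list of hosts currently assigned to a role."""
    return sorted(host for host, r in role_of.items() if r == role)


def redeploy(role_of, role, old_host, new_host):
    """Return a new mapping with `role` moved from old_host to new_host."""
    if role_of.get(old_host) != role:
        raise ValueError(f"{old_host} does not hold role {role}")
    updated = dict(role_of)
    del updated[old_host]
    updated[new_host] = role
    return updated


# Moving the (hypothetical) site_bdii role changes exactly one entry.
moved = redeploy(ROLE_OF, "site_bdii", "old-bdii.example.org", "new-bdii.example.org")
print(hosts_with_role(moved, "site_bdii"))
```

Because every other configuration file refers to the role rather than the hostname, nothing else would need to change in this model.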

Node Initialisation

When a node is installed via kickstart, cfengine should be installed automatically as well. The kickstart script attempts to download the files cfservd.conf and update.conf from the web server running on epgmo1.ph.bham.ac.uk using wget. These files are stored locally on epgmo1 in /var/www/html/config, and configure the local installation of cfengine to recognise epgmo1 as the master node. It should then be possible to configure the new node from epgmo1 using the cfrun command.
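As a rough sketch of the download step, the Python snippet below builds the wget command lines that a kickstart script might run. It does not execute anything. Two details are assumptions not stated above: that /var/www/html/config is served under the URL path /config/ over plain HTTP, and that the files are saved to /var/cfengine/inputs on the new node. The helper name `wget_commands` is hypothetical.

```python
from urllib.parse import urljoin

# Assumption: /var/www/html/config on epgmo1 is served under /config/ over HTTP.
BASE_URL = "http://epgmo1.ph.bham.ac.uk/config/"
BOOTSTRAP_FILES = ["cfservd.conf", "update.conf"]
# Assumption: destination directory on the new node; the text does not state it.
DEST_DIR = "/var/cfengine/inputs"


def wget_commands(base_url, filenames, dest_dir):
    """Build one wget command line per bootstrap file (nothing is executed)."""
    return [
        ["wget", "-O", f"{dest_dir}/{name}", urljoin(base_url, name)]
        for name in filenames
    ]


for cmd in wget_commands(BASE_URL, BOOTSTRAP_FILES, DEST_DIR):
    print(" ".join(cmd))
```

Check the real URL path and destination against the kickstart script before relying on either value.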

More information about kickstarting and configuring nodes can be found in the LocalGridCookbook.

Configuration files

The main configuration file is called cfagent.conf and can be found in epgmo1:/var/cfengine/inputs. This file currently defines the main actionsequence and a mapping of physical machine names to cfengine roles.

The first file to be imported is epgmo1:/var/cfengine/inputs/imports/classes.conf. This file defines a set of cfengine classes, which are used to steer the other configuration files. For example, it defines the glexec_wn class, consisting of all nodes that are marked as worker nodes. It also contains a list of commands, such as restart_maui, that can be used to execute particular actions on specific nodes.
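The way classes steer the rest of the configuration can be modelled as follows. This is a Python illustration of the idea, not the contents of classes.conf: the hostnames under example.org, the action descriptions and the helper names are all hypothetical. The `any` entry mirrors cfengine's built-in catch-all class.

```python
# Hypothetical class definitions: class name -> set of member hosts.
CLASSES = {
    "glexec_wn": {"wn01.example.org", "wn02.example.org"},  # hypothetical worker nodes
    "dpm_head_node": {"epgse1.ph.bham.ac.uk"},
}

# Hypothetical actions, each guarded by the class that must be defined for it to run.
ACTIONS = [
    ("any", "copy iptables rules"),
    ("glexec_wn", "install worker node packages"),
    ("dpm_head_node", "configure DPM head node"),
]


def classes_for(host, classes):
    """Return every class the host belongs to, plus the catch-all 'any'."""
    return {"any"} | {name for name, members in classes.items() if host in members}


def actions_for(host, classes, actions):
    """Select the actions whose guarding class is defined for this host."""
    active = classes_for(host, classes)
    return [action for guard, action in actions if guard in active]


print(actions_for("wn01.example.org", CLASSES, ACTIONS))
```

A host that belongs to no specific class would still receive the actions guarded by `any`, which corresponds to the settings applied to all nodes.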

The second file to be imported is epgmo1:/var/cfengine/inputs/imports/global.conf. This file defines actions that are applied to all nodes (such as copying iptables rules out to nodes, or editing /etc/hosts.allow to enable ssh connections from the local system).

Finally, there are a number of role-specific config files that are used to install and configure specific gLite services. For example, configuration details specific to the APEL node are defined in epgmo1:/var/cfengine/inputs/imports/apel.conf. There is more information about these specific roles below.

Modules

There are a number of module files (written mainly in bash) located in epgmo1:/var/cfengine/inputs/modules. These scripts generally execute actions that are too cumbersome to complete in cfengine. For example, the firewall is managed from the iptables module, and system backups are managed from the backup module.

File Repository

Some nodes require extra binaries and large files that are not always available over the net or cannot easily be created from within cfengine. In these cases, cfengine copies the files onto the relevant nodes from a repository in epgmo1:/var/cfengine/inputs/repo. There is a directory in the repo folder for each role. Some roles have subdirectories, e.g. alice_vobox/ contains subdirectories for the bb_alice_vobox and the twin_alice_vobox.

There is also a repo/general directory for files that need to be copied onto all nodes (ganglia config files for example).
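The repository layout described above can be summarised with a small path helper. This Python sketch assumes the subdirectories under alice_vobox/ are named exactly bb_alice_vobox and twin_alice_vobox, which the text suggests but does not confirm. The function name `repo_dirs` is hypothetical.

```python
from pathlib import PurePosixPath

# Repository root on epgmo1, as given in the text.
REPO_ROOT = PurePosixPath("/var/cfengine/inputs/repo")


def repo_dirs(role, variant=None):
    """Return the repo directories a node with this role would draw from:
    the shared general/ directory first, then the role directory
    (or a subdirectory of it when a role variant is given)."""
    role_dir = REPO_ROOT / role
    if variant is not None:
        role_dir = role_dir / variant
    return [REPO_ROOT / "general", role_dir]


# Assumed subdirectory name; see the note above.
for d in repo_dirs("alice_vobox", "bb_alice_vobox"):
    print(d)
```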

Known Limitations and Problems

-- ChristopherCurtis - 08 Oct 2009

Topic revision: r3 - 07 Feb 2011 - 13:31:32 - ChristopherCurtis
 