Grid Backup Policy

A discussion on backing up important data from the Grid nodes.

Requirements

The most important consideration is to make adequate backups of all relevant log files. The GridPP security incident handling policy requires that all Grid-related logs be available for 3 months and all ssh logs be available for 6 months. In addition, it is important that all server certificates are backed up, along with the yaim configuration files.

There are also specific considerations for different machines, depending on their role. PBS accounting data should be saved from the CEs. The accounting database on the MONBox should also be saved. It is important to back up the DPM database from the storage nodes as well.
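
The retention periods stated above can be checked mechanically. The following is only a minimal sketch in Python: the names RETENTION and is_expired are hypothetical, and counting a month as 31 days is an assumption chosen so that files are kept slightly longer rather than shorter than the policy requires.

```python
from datetime import datetime, timedelta

# Assumption: a "month" is counted as 31 days, which errs on the side of
# keeping files a little longer than strictly required.
DAYS_PER_MONTH = 31

# Retention periods taken from the requirements above.
RETENTION = {
    "grid": timedelta(days=3 * DAYS_PER_MONTH),  # Grid-related logs: 3 months
    "ssh": timedelta(days=6 * DAYS_PER_MONTH),   # ssh logs: 6 months
}


def is_expired(log_kind, file_mtime, now):
    """Return True if a backed-up log is older than its retention period."""
    return now - file_mtime > RETENTION[log_kind]
```

A cleanup job could call is_expired for each file under the backup directory and only remove files for which it returns True. Applying the longer 6 month period to every log would also satisfy both requirements.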

Implementation

The general strategy for backing up data is as follows:

  1. The MONBox will copy data from relevant nodes into the directory epgmo1:/backup/. This directory is NFS mounted from epgsr1:/disk/f15d/.
  2. The directory structure in epgmo1:/backup/ should mirror the actual directory structure on the node.
  3. Cfengine will be responsible for copying the relevant data from the nodes back to epgmo1. It is envisaged that a special cfrun will be made for this task, as copying some data (e.g. MySQL dumps) will take a long time. This task will be controlled by a cron job set to run at an appropriate time every day.
  4. Most files can be copied "as is", i.e. straight from the node to epgmo1. The MySQL dumps will first have to be created and then given an appropriate unique name. The dumping of the MySQL databases should be controlled via a cron job, initially distributed by cfengine.
  5. The backup directories on epgmo1 should be periodically rsync'ed to BlueBEAR. This should again be controlled by a cron job on epgmo1, which in turn should be managed by cfengine.
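
The path mirroring, dump naming and rsync steps above can be sketched as follows. This is an illustration under stated assumptions, not the deployed setup: the per-node subdirectory under /backup/, the date-stamped dump name, the example database name and the rsync destination are all hypothetical, since the page does not specify them. The function names are likewise invented for this sketch.

```python
from datetime import date
from pathlib import PurePosixPath

# Backup directory on epgmo1, as given in step 1.
BACKUP_ROOT = PurePosixPath("/backup")


def mirror_path(node, source_path):
    """Map an absolute path on a node to its mirrored backup location."""
    # Assumption: each node gets its own subdirectory named after the host,
    # below which the node's directory structure is reproduced (step 2).
    relative = PurePosixPath(source_path).relative_to("/")
    return BACKUP_ROOT / node / relative


def dump_name(database, day):
    """Build a unique file name for a MySQL dump (step 4)."""
    # Assumption: one dump per database per day, so the date makes it unique.
    return f"{database}-{day.isoformat()}.sql"


def rsync_command(destination):
    """Build the argument list for copying the backups off epgmo1 (step 5)."""
    # The destination on BlueBEAR is not given on this page, so it is passed
    # in by the caller. "-a" is rsync's archive mode.
    return ["rsync", "-a", str(BACKUP_ROOT) + "/", destination]
```

For example, mirror_path("epgce1", "/var/log/messages") gives /backup/epgce1/var/log/messages, and dump_name("accounting", date(2010, 2, 3)) gives accounting-2010-02-03.sql. The list returned by rsync_command could be passed to subprocess.run from the cron job on epgmo1.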

List of files currently backed up

Machine   Files         Life Span
epgce1    /var/log/*/   6 months
epgce2    /var/log/*/   6 months
epgce3    /var/log/*/   6 months
epgce4    /var/log/*/   6 months
epgr01    /var/log/*/   6 months
epgr02    /var/log/*/   6 months
epgr03    /var/log/*/   6 months
epgr04    /var/log/*/   6 months
epgse1    /var/log/*/   6 months
epgsr1    /var/log/*/   6 months
epgsr2    /var/log/*/   6 months
epaf17    /var/log/*/   6 months
epgmo1    /var/log/*/   6 months
Twins     /var/log/*/   6 months

-- ChristopherCurtis - 03 Feb 2010

Topic revision: r2 - 10 Mar 2010 - 12:28:30 - ChristopherCurtis
 