Setting up an Automatic Blacklisting Service

Requirements

  • Machine to run the service on
  • Working Grid UI setup on this machine
  • Ganga installed and working on this machine
  • A valid grid certificate (preferably a 'robot' one)

A Basic Blacklisting Service

A blacklisting service repeatedly submits the same test jobs to the different resources available and records the outcome. When queried by another client or user submitting a job, it returns the resources that are running the test jobs correctly. This has a number of benefits:

  • Users don't waste time submitting jobs to broken resources
  • Broken resources can be found and reported quickly
  • There is minimal manual intervention at any point
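The idea above can be sketched in a few lines of Python. This is only an illustration of the submit/record/query cycle, not the attached script: run_pass, working_resources and the run_test callback are hypothetical names, and run_test stands in for submitting a test job and waiting for its result (the real service submits grid jobs through Ganga and collects the results later).

```python
def run_pass(resources, run_test, outcomes):
    """Run the test job once on every resource and record each outcome.

    run_test is a hypothetical callback that returns True when the test
    job ran correctly; any exception is treated as a failed test.
    """
    for resource in resources:
        try:
            ok = bool(run_test(resource))
        except Exception:
            ok = False
        outcomes.setdefault(resource, []).append(ok)


def working_resources(outcomes):
    """Return the resources whose most recent test job succeeded."""
    return sorted(name for name, history in outcomes.items()
                  if history and history[-1])
```

A client would call working_resources() before submitting and restrict its job to the names returned.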

Larger experiments have very advanced blacklisting services, but there is no general one supplied through the middleware for smaller groups. However, using the functionality in Ganga, it is quite easy to write a script that runs the required tests and publishes the results in a publicly accessible place.

Blacklist Script

A basic blacklisting script is attached to this TWiki page. It can be run in the Ganga daemon to provide a list of CEs that are active. To run it, copy the blacklist.py file to where you want it to run and start it with:

ganga --daemon blacklist.py

(Note you may want to change where the gangadir is placed as it will produce quite a few files!)

You can then see the output by going to:

<gangadir>/server/server-<host>.stderr/out

This will keep submitting basic Hello World jobs to the available CEs given by your VO and record the results in a web page and a separate whitelist file. For the moment, you will need to keep the proxy updated yourself with a cron job or similar!

There are a few settings at the top of the file to tweak how this script works:

htmlSummary: Location of the summary page (e.g. web area)
whitelistFile: Location of the whitelist file
stopFile: File to check for stopping the service
archiveSize: Number of hours of job backlog to keep around
activeSize: Number of hours that are checked for completed/failed jobs
numComplete: Number of completed jobs in the last X hours before whitelisting
maxFailed: Maximum number of failed jobs before blacklisting
loopTime: Number of minutes per loop
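To show how these thresholds might interact, here is a minimal sketch of a whitelisting decision. It is my reading of the setting descriptions rather than the code in blacklist.py: the values are made up, the Result record and the whitelist function are hypothetical, and I am assuming that "last X hours" means the activeSize window and that a CE stays whitelisted as long as its failed count does not exceed maxFailed.

```python
from collections import namedtuple

# Hypothetical values; the real ones are set at the top of blacklist.py.
activeSize = 6     # hours of history checked for completed/failed jobs
numComplete = 2    # completed jobs needed in that window to whitelist a CE
maxFailed = 1      # failed jobs tolerated in that window

# One finished test job: CE name, "completed" or "failed", and the
# finish time in seconds since the epoch (an assumed record layout).
Result = namedtuple("Result", ["ce", "status", "finished"])


def whitelist(results, now, active_hours=activeSize,
              num_complete=numComplete, max_failed=maxFailed):
    """Return the sorted CEs that pass both thresholds in the active window."""
    cutoff = now - active_hours * 3600
    completed, failed = {}, {}
    for result in results:
        if result.finished < cutoff:
            continue  # older than the active window, ignore
        if result.status == "completed":
            completed[result.ce] = completed.get(result.ce, 0) + 1
        elif result.status == "failed":
            failed[result.ce] = failed.get(result.ce, 0) + 1
    return sorted(ce for ce, count in completed.items()
                  if count >= num_complete and failed.get(ce, 0) <= max_failed)
```

With this reading, a CE with no recent results is simply left off the whitelist until enough test jobs have completed again.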

The whitelist file is a newline-separated list of available CEs, which can easily be read either from local storage or from a web page and given as requirements to a job submission.
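As a rough sketch of the client side, the snippet below reads a whitelist from a local file or a URL and turns it into a JDL-style requirements expression on GlueCEUniqueID. The function names are hypothetical, the code assumes Python 3, and how the resulting expression is attached to a job depends on your submission tool, so treat that last step as something to check against your own setup.

```python
from urllib.request import urlopen


def parse_whitelist(text):
    """Return the CE names from newline-separated text, skipping blank lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def read_whitelist_file(path):
    """Read the whitelist from local storage."""
    with open(path) as handle:
        return parse_whitelist(handle.read())


def read_whitelist_url(url):
    """Read the whitelist from a web page (assumes a plain-text response)."""
    with urlopen(url) as response:
        return parse_whitelist(response.read().decode("utf-8"))


def jdl_requirements(ces):
    """Build an expression that matches any one of the given CEs.

    Returns an empty string when the whitelist is empty, so the caller
    can decide whether to submit at all.
    """
    return " || ".join('other.GlueCEUniqueID == "%s"' % ce for ce in ces)
```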

There are a few drawbacks at present which I will hopefully fix in the near future:

  • No WMS blacklisting
  • No independent CREAM submission (i.e. everything currently goes through the WMS)
  • No easy way to provide your own job/template as the job definition
  • No automatic proxy renewal

-- MarkSlater - 24 Oct 2012

Topic attachments
Attachment          Size    Date                  Who
blacklist.py.txt    4.6 K   25 Oct 2012 - 14:52   MarkSlater
Topic revision: r2 - 25 Oct 2012 - 14:52:25 - MarkSlater
 