Simplified Submission

A simplified grid submission system, based on the work conducted at the University of Glasgow.

gqsub

The University of Glasgow have spent some time developing wrapper scripts for grid job submission. These scripts have been designed to mimic the familiar torque tools qsub and qstat. These tools have been adapted for use at Birmingham.

Submitting a simple job

This example assumes that you have already applied for, and installed, a valid grid certificate. It also assumes that you are a member of a supported Virtual Organisation.

This example will submit the following simple batch script to the grid. Note that this script could just as easily be submitted to a local torque cluster.

#!/bin/bash
######################
# A simple batch script
######################

#PBS -l cput=00:30:00

date
pwd

######################
# End of script
######################

Note that this script specifies a maximum CPU time limit of 30 minutes (#PBS -l cput=00:30:00). The gqsub script requires this directive: jobs cannot be submitted to the grid without specifying a maximum CPU time.
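Because a missing CPU time directive prevents submission, it can be convenient to check for it locally first. The following is a minimal sketch, not part of the gqsub tools: the function names has_cput_limit and checked_submit are hypothetical, and the pattern only assumes the "#PBS -l cput=..." form shown in the example script.

```bash
#!/bin/bash
# Succeed if the batch script contains a "#PBS -l ... cput=<time>" directive.
# (Hypothetical helper; the accepted time formats are an assumption.)
has_cput_limit() {
    grep -Eq '^#PBS[[:space:]]+-l[[:space:]]+([^[:space:]]*,)?cput=[0-9:]+' "$1"
}

# Hypothetical wrapper: refuse to call gqsub when the directive is missing.
checked_submit() {
    local script="$1"
    if ! has_cput_limit "$script"; then
        echo "error: $script has no '#PBS -l cput=...' directive" >&2
        return 1
    fi
    gqsub "$script"
}
```

With these functions sourced into a shell, checked_submit my_batch_script.sh would behave like gqsub my_batch_script.sh, except that it stops early when no CPU time limit is declared.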

Start by setting up a Grid User Interface. This can be achieved on the Particle Physics system by simply opening an SL4/5 terminal window and typing the command source lcguisetup. On BlueBEAR this can be achieved by using the command source /apps/hep/lcgui/lcguisetup.

Before submitting your very first job, you should set up a valid grid proxy for your VO with the command voms-proxy-init --voms _your_vo_. The gqsub command will normally take care of proxies for you by using the default VO specified in ~/.gqsubrc. If this file doesn't exist, it will be created, taking the first VO listed in any existing proxy as the default.

A job can be submitted using the command gqsub _script_name_. The gqsub command will package up your job, create a JDL file automatically, and submit your job to the grid. Note that you may be asked for your grid certificate password if gqsub detects that your proxy does not have sufficient time left.

Once the job has been submitted, you can check on its progress using the gqstat command. This will print out details of the status of all submitted jobs. Once a job has completed, the stdout and stderr files will be retrieved automatically by gqstat, and deposited into a directory with a random string name (e.g. cjc_asqhknFN24352).

Using the Sandbox

Sometimes it's desirable to send files with a grid job to a worker node. Provided that the total size of the files is small (up to roughly 20 MB), they may be staged in via the grid sandbox. In order to specify an input file, the directive:

#GQSUB -W stagein=myinputfile.dat

should be added to the batch script. Note that one directive must be added for each file, so if lots of files are required then they should be transferred in a single tarball.
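The tarball approach can be sketched as follows. This is an illustrative helper rather than part of gqsub: the function name bundle_inputs and the variable SANDBOX_LIMIT_BYTES are hypothetical, and the 20 MB figure is only the guideline quoted above, checked locally.

```bash
#!/bin/bash
# Approximate sandbox guideline (about 20 MB); a local sanity check only,
# not a limit enforced by gqsub itself.
SANDBOX_LIMIT_BYTES=$((20 * 1024 * 1024))

# Usage: bundle_inputs TARBALL FILE [FILE...]
# Packs the files into one gzip-compressed tarball and prints the single
# stage-in directive that would cover all of them.
bundle_inputs() {
    local tarball="$1"; shift
    tar -czf "$tarball" "$@" || return 1
    local size
    size=$(($(wc -c < "$tarball")))
    if [ "$size" -gt "$SANDBOX_LIMIT_BYTES" ]; then
        echo "warning: $tarball is $size bytes; consider using the SE" >&2
        return 2
    fi
    echo "#GQSUB -W stagein=$tarball"
}
```

For example, bundle_inputs inputs.tar.gz a.dat b.dat prints the directive to paste into the batch script. The batch script itself would then need to unpack the archive (tar -xzf inputs.tar.gz) before using the files.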

It is also possible to transfer small output files back via the sandbox, using the directive:

#GQSUB -W stageout=myoutputfile.dat

Again, note that only small volumes of data (~20 MB) should be transferred this way. Larger volumes should be transferred via a grid Storage Element (SE).

Submitting to Other Sites

By default, jobs are submitted to the Birmingham BlueBEAR Grid Cluster. However, it is possible to submit jobs to other grid sites. In order to do this, the directive:

#GQSUB -q serv07.hep.phy.cam.ac.uk:2119/jobmanager-lcgcondor-calice

must be added to your batch script, specifying a valid Computing Element (CE). In this example, the job will be sent to Cambridge. Note that it is only possible to submit jobs to Computing Elements which support your VO. These CEs can be identified using the command:

lcg-infosites --vo voname ce

VO Software

If your VO manages software installation centrally, it will be installed in a directory specified by the environment variable $VO_XXX_SW_DIR, where XXX is the VO name (for example, $VO_ATLAS_SW_DIR). If software is not managed by the VO, it is assumed that users will manage their own software, staging it in or downloading from the SE as required.
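A job script can locate the VO software area by building the variable name from the VO name. The sketch below is illustrative: the function names vo_sw_dir_var and vo_sw_dir are hypothetical, and mapping "." and "-" in the VO name to "_" is an assumption for VO names that are not valid shell identifiers.

```bash
#!/bin/bash
# Build the variable name for a VO, e.g. "atlas" -> "VO_ATLAS_SW_DIR".
# Assumption: "." and "-" in the VO name are mapped to "_".
vo_sw_dir_var() {
    local vo
    vo=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]' | tr '.-' '__')
    printf 'VO_%s_SW_DIR\n' "$vo"
}

# Print the directory the variable points to, or fail if it is unset,
# which suggests the VO does not manage software centrally on this node.
vo_sw_dir() {
    local var
    var=$(vo_sw_dir_var "$1")
    if [ -z "${!var:-}" ]; then
        echo "error: \$$var is not set on this node" >&2
        return 1
    fi
    printf '%s\n' "${!var}"
}
```

Inside a job, something like sw=$(vo_sw_dir atlas) || exit 1 would then give a clear failure on nodes where the software area is not defined.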

Command Line Arguments

Variables may be "passed" to the batch script via the -v [arg_list] option:

gqsub -v FIRST_VAR=1,SECOND_VAR=blah my_batch_script.sh

Note that variables passed to the batch script in this way are parsed before submission, via a simple string replacement mechanism. This method has only been tested with Bash shell scripts!
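To illustrate what a pre-submission string replacement means in practice, the sketch below substitutes values into the text of a script. It is not the gqsub implementation: the function name substitute_vars is hypothetical, and the placeholder forms it replaces ($NAME and ${NAME}) are an assumption, since the exact syntax gqsub recognises is not described here.

```bash
#!/bin/bash
# Usage: substitute_vars 'NAME=value,OTHER=value' SCRIPT
# Prints the script text with $NAME and ${NAME} replaced by the given
# values. Purely textual: no quoting or escaping is applied, and values
# containing "&" may be mangled by newer bash versions.
substitute_vars() {
    local assignments="$1" script="$2"
    local text pair name value braced plain
    text=$(cat "$script") || return 1
    local IFS=','
    for pair in $assignments; do
        name=${pair%%=*}
        value=${pair#*=}
        braced='${'"$name"'}'
        plain='$'"$name"
        # Quoting the pattern makes bash match it literally.
        text=${text//"$braced"/$value}
        text=${text//"$plain"/$value}
    done
    printf '%s\n' "$text"
}
```

Because the substitution happens on the text of the script before it runs, the job never sees these names as real environment variables, which is one reason behaviour may differ in shells other than Bash.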

Known Limits

The gqsub script submits jobs to an LCG WMS. The WMS can take some time (more than 20 minutes) to update the status of a job, independent of the job execution time. The gqsub scripts will be updated in the near future to support direct submission to CREAM CEs, which have a lag time of a few seconds.

More information about the gqsub tools can be found on the University of Glasgow project pages.

gscp

Although it is possible to transfer small files between the submission and worker nodes via the Grid Sandbox, larger files should really be transferred to the local Storage Element (SE). The gscp command can help with this.

Transferring files to the Storage Element

Files stored on the grid are identified by a Logical File Name (LFN). This can in general take any form, but some VOs prefer to manage their file catalogues according to certain rules. For example, ATLAS users should follow the format lfn:/grid/atlas/users/[user_name]/some_file_name.dat.
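As a small illustration of the ATLAS naming format quoted above, an LFN can be assembled from a user name and a file name. The function name atlas_user_lfn is hypothetical, and other VOs may use different layouts.

```bash
#!/bin/bash
# Usage: atlas_user_lfn USER FILE
# Prints an LFN of the form lfn:/grid/atlas/users/USER/FILENAME,
# using only the final path component of FILE.
atlas_user_lfn() {
    local user="$1" file="$2"
    printf 'lfn:/grid/atlas/users/%s/%s\n' "$user" "$(basename "$file")"
}
```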

Assuming that you have a valid LFN, files may be copied from the local system to the SE via the gscp command:

gscp some_local_file.dat lfn:/grid/atlas/users/nobody/my_grid_file.dat

A valid proxy certificate is required before transfers can be completed. The gscp command may also be used to copy files from the SE back onto the local system:

gscp lfn:/grid/atlas/users/nobody/my_grid_file.dat some_local_file.dat

The gscp command is also available on all Grid Worker Nodes at Birmingham (both local cluster and BlueBEAR), so it can be used to copy data from the SE to a WN for the purposes of analysis. Care should be taken though! Users have very limited space in their home areas on WNs, so very large files should be copied into the /tmp directory. Users should also remember that they are responsible for deleting their own files after use!
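One way to follow this advice inside a job is to stage the file into a private directory under /tmp and always remove it afterwards. The following is a sketch under stated assumptions: the function name with_staged_file is hypothetical, and it relies only on the gscp LFN-to-local copy form shown above.

```bash
#!/bin/bash
# Usage: with_staged_file LFN COMMAND [ARGS...]
# Copies LFN from the SE into a scratch directory under /tmp, runs
# COMMAND with the local path appended as its last argument, then
# deletes the scratch directory whether or not the command succeeded.
with_staged_file() {
    local lfn="$1"; shift
    local scratch status local_copy
    scratch=$(mktemp -d /tmp/gscp_stage.XXXXXX) || return 1
    local_copy="$scratch/$(basename "$lfn")"
    if gscp "$lfn" "$local_copy"; then
        "$@" "$local_copy"
        status=$?
    else
        status=1
    fi
    rm -rf "$scratch"
    return "$status"
}
```

For example, with_staged_file lfn:/grid/atlas/users/nobody/my_grid_file.dat wc -c would report the size of the file and leave nothing behind in /tmp.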

Removing Files From the Grid

Space on the Storage Element is at a premium, so care should be taken to remove files once they are no longer required. Files may be deleted by using the command:

gscp -d lfn:/grid/atlas/users/nobody/my_grid_file.dat

Known Limits

The gscp command will not let you overwrite files on the grid by default. If a file with the same LFN already exists, the transfer will fail.
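Where an existing grid file really does need to be replaced, one workaround is to delete it first and then upload the new copy. This is a sketch only: the function name gscp_replace is hypothetical, and it assumes that a failed gscp -d (for example, because the LFN does not exist yet) can safely be ignored.

```bash
#!/bin/bash
# Usage: gscp_replace LOCAL_FILE LFN
# Deletes any existing grid file at LFN, then uploads LOCAL_FILE.
# Not atomic: if the upload fails, the old grid copy is already gone.
gscp_replace() {
    local src="$1" lfn="$2"
    if [ ! -f "$src" ]; then
        echo "error: $src not found" >&2
        return 1
    fi
    # Assumption: an error from the delete step just means "nothing to delete".
    gscp -d "$lfn" 2>/dev/null || true
    gscp "$src" "$lfn"
}
```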

-- ChristopherCurtis - 26 May 2010

Topic revision: r4 - 09 Jun 2010 - 21:32:58 - ChristopherCurtis
 