Simplified Submission
A simplified grid submission system, based on the work conducted at the
University of Glasgow.
gqsub
The University of Glasgow have spent some time developing wrapper scripts for grid job submission. These scripts have been designed to mimic the familiar torque tools qsub and qstat, and have been adapted for use at Birmingham.
Submitting a simple job
This example assumes that you have already applied for, and installed, a valid grid certificate. It also assumes that you are a member of a supported Virtual Organisation.
This example will submit the following simple batch script to the grid. Note that this script could just as easily be submitted to a local torque cluster.
#!/bin/bash
######################
# A simple batch script
######################
#PBS -l cput=00:30:00
date
pwd
######################
# End of script
######################
Note that this script specifies a maximum CPU time limit of 30 minutes (#PBS -l cput=00:30:00). Including this directive is a requirement of the gqsub script - jobs cannot be submitted to the grid without specifying a maximum CPU time.
Start by setting up a Grid User Interface. This can be achieved on the Particle Physics system by simply opening an SL4/5 terminal window and typing the command source lcguisetup. On BlueBEAR this can be achieved by using the command source /apps/hep/lcgui/lcguisetup.
Before submitting your very first job, you should set up a valid grid proxy for your VO with the command voms-proxy-init --voms _your_vo_. The gqsub command will normally take care of proxies for you by using the default VO specified in ~/.gqsubrc. If this file doesn't exist, it will be created, taking the default VO to be the first VO listed in any existing proxy.
A job can be submitted using the command gqsub _script_name_. The gqsub command will package up your job, create a JDL file automatically, and submit your job to the grid. Note that you may be asked for your grid certificate password if it detects that your proxy does not have sufficient time left.
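As a minimal end-to-end sketch (assuming an ATLAS VO and a batch script named myjob.sh; substitute your own VO and script name), the whole sequence on the Particle Physics system might look like this:
# Set up the Grid User Interface (use the BlueBEAR path on BlueBEAR instead)
source lcguisetup
# Create a proxy for your VO if gqsub cannot find a valid one
voms-proxy-init --voms atlas
# Package the script, generate the JDL and submit the job to the grid
gqsub myjob.sh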
Once the job has been submitted, you can check on its progress using the gqstat command. This will print out details of the status of all submitted jobs. Once a job has completed, the stdout and stderr files will be retrieved automatically by gqstat, and deposited into a directory with a random string name (e.g. cjc_asqhknFN243£52).
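For example (the output directory name below is illustrative; gqstat chooses its own random string):
# Print the status of all submitted jobs
gqstat
# After the job completes, the retrieved stdout/stderr appear in a new directory
ls cjc_*/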
Using the Sandbox
Sometimes it's desirable to send files with a grid job to a worker node. Providing that the total size of the files is small (~20 MB), they may be staged in via the grid sandbox. In order to specify an input file, the directive:
#GQSUB -W stagein=myinputfile.dat
should be added to the batch script. Note that one directive must be added for each file, so if lots of files are required then they should be transferred in a tar ball.
It is also possible to transfer small output files back via the sandbox, using the directive:
#GQSUB -W stageout=myoutputfile.dat
Again, note that only small volumes of data (~20 MB) should be transferred this way. Larger volumes should be transferred via a grid Storage Element (see the gscp section below).
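As an illustrative sketch (the file and script names here are hypothetical), a batch script that stages in a tarball of inputs and stages out a small results file might look like this:
#!/bin/bash
#PBS -l cput=01:00:00
#GQSUB -W stagein=inputs.tar.gz
#GQSUB -W stageout=results.dat
# Unpack the staged-in tarball on the worker node
tar xzf inputs.tar.gz
# Run the analysis, producing a small results file for the sandbox to return
./run_analysis.sh > results.dat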
Submitting to Other Sites
By default, jobs are submitted to the Birmingham BlueBEAR Grid Cluster. However, it is possible to submit jobs to other grid sites. In order to do this, the directive:
#GQSUB -q serv07.hep.phy.cam.ac.uk:2119/jobmanager-lcgcondor-calice
must be added to your batch script, specifying a valid Computing Element. In this example, the job will be sent to Cambridge. Note that it is only possible to submit jobs to Computing Elements which support your VO. These CEs can be identified using the command:
lcg-infosites --vo voname ce
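For example, the Cambridge CE above advertises a calice queue, so (assuming you are a CALICE member) you could list the CEs open to you and then point a job at one of them:
# List the Computing Elements that support the VO
lcg-infosites --vo calice ce
# Then add the chosen CE to the batch script, for instance:
#GQSUB -q serv07.hep.phy.cam.ac.uk:2119/jobmanager-lcgcondor-calice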
VO Software
If your VO manages software installation centrally, it will be installed in a directory specified by the environment variable $VO_XXX_SW_DIR, where XXX is the VO name (for example, $VO_ATLAS_SW_DIR). If software is not managed by the VO, it is assumed that users will manage their own software, staging it in or downloading it from the SE as required.
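As a hedged sketch (the setup script path inside the software area is an assumption, and will vary by VO and release), a batch script could pick up centrally installed ATLAS software like this:
#!/bin/bash
#PBS -l cput=00:30:00
# Use the VO-managed software area if it exists on the worker node
if [ -n "$VO_ATLAS_SW_DIR" ]; then
    # Hypothetical setup script; the real path depends on the VO and release
    source "$VO_ATLAS_SW_DIR/setup.sh"
fi
./run_my_analysis.sh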
Command Line Arguments
Variables may be "passed" to the batch script via the -v [arg_list] option:
gqsub -v FIRST_VAR=1,SECOND_VAR=blah my_batch_script.sh
Note that variables passed to the batch script in this way are parsed before submission, via a simple string replacement mechanism.
This method has only been tested with Bash shell scripts!
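To illustrate (the script below is hypothetical, and it is assumed here that the string replacement lets the script refer to the passed variables just as it would refer to environment variables under qsub):
#!/bin/bash
#PBS -l cput=00:30:00
# FIRST_VAR and SECOND_VAR are filled in by gqsub before submission
echo "Run number: $FIRST_VAR"
echo "Dataset tag: $SECOND_VAR"
Submitted with the gqsub -v command shown above, this job would print 1 and blah.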
Known Limits
The gqsub script submits jobs to an LCG WMS. This can take some time (>20 minutes) to update the status of a job, independent of the job execution time. The gqsub scripts will be updated in the near future to support direct submission to CREAM CEs, which have a lag time of a few seconds.
More information about the gqsub tools can be found on the University of Glasgow project pages.
gscp
Although it is possible to transfer small files between the submission and worker nodes via the Grid Sandbox, larger files should really be transferred to the local Storage Element (SE). The gscp command can help with this.
Transferring files to the Storage Element
Files stored on the grid are identified by a Logical File Name (LFN). This can in general take any form, but some VOs prefer to manage their file catalogues according to certain rules. For example, ATLAS users should follow the format lfn:/grid/atlas/users/[user_name]/some_file_name.dat.
Assuming that you have a valid LFN, files may be copied from the local system to the SE via the gscp command:
gscp some_local_file.dat lfn:/grid/atlas/users/nobody/my_grid_file.dat
A valid proxy certificate is required before transfers can be completed. The gscp command may also be used to copy files from the SE back onto the local system:
gscp lfn:/grid/atlas/users/nobody/my_grid_file.dat some_local_file.dat
The gscp command is also available on all Grid Worker Nodes at Birmingham (both local cluster and BlueBEAR), so it can be used to copy data from the SE to a WN for the purposes of analysis.
Care should be taken though! Users have very limited space in their home areas on WNs, so very large files should be copied into the /tmp directory. Users should also remember that they are responsible for deleting their own files after use!
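As an illustrative sketch for a batch script running on a worker node (the LFNs and file names are hypothetical), input data can be pulled from the SE into /tmp, processed, and the results copied back:
#!/bin/bash
#PBS -l cput=02:00:00
# Copy the input data from the SE into /tmp on the worker node
gscp lfn:/grid/atlas/users/nobody/input_data.dat /tmp/input_data.dat
# Run the analysis on the local copy
./my_analysis.sh /tmp/input_data.dat /tmp/output.dat
# Copy the results back to the SE, then tidy up after yourself
gscp /tmp/output.dat lfn:/grid/atlas/users/nobody/output.dat
rm -f /tmp/input_data.dat /tmp/output.dat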
Removing Files From the Grid
Space on the Storage Element is at a premium, so care should be taken to remove files once they are no longer required. Files may be deleted by using the command:
gscp -d lfn:/grid/atlas/users/nobody/my_grid_file.dat
Known Limits
The gscp command will not let you overwrite files on the grid by default. If a file with the same LFN already exists, the transfer will fail.
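One workaround (sketched here with an illustrative LFN) is to delete the existing entry first, as described in the section above, and then repeat the copy:
# Remove the old copy, then upload the new version
gscp -d lfn:/grid/atlas/users/nobody/my_grid_file.dat
gscp some_local_file.dat lfn:/grid/atlas/users/nobody/my_grid_file.dat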
-- ChristopherCurtis - 26 May 2010