Torque: a quick introduction

Torque provides a management system for job processing. It is heavily based on the Portable Batch System (PBS) and, before that, the Network Queueing System (NQS). You will come across the PBS acronym from time to time in documentation such as this.

Jobs are submitted to job queues, and are then scheduled to run immediately or at some later time, depending on how busy the overall system is. Users submit the jobs, but Torque decides when a job starts, chooses which worker node the job runs on, makes sure the job doesn't overstay its requested time, and manages the return of output files to the job submitter.

You, the user, provide a script consisting of commands to be processed. The script might contain just a single command, such as the name of another script (perhaps with some options or parameters) or the name of a pre-compiled binary file to be run. Or it might contain a mixture of control statements and commands. The script is submitted using the qsub command.

Simple job submission examples

Here's a simple example of job submission:

         qsub myjob

where myjob is a file you prepared earlier with a simple text-mode editor, containing the following lines:

         echo Hello World
         echo See you later

That's worth trying as a first job. A more realistic myjob file might contain:

         #PBS -l cput=7200,walltime=7200
         #PBS -j oe
         cd "$PBS_O_WORKDIR"
         gcc -o mybin mysource.c
         ./mybin

The first two lines here set job options, which could alternatively have been supplied as options on the qsub command line. The first line sets a resource limit (with the -l option, lowercase L) of 7200 seconds of processor time and 7200 seconds of wall (elapsed) time. The second line requests that error messages (standard error) be merged with the standard output in a single file. The third line changes the current directory of the job from the home directory to the directory that was current when the job was submitted. The next line compiles a C source file with the gcc command, to produce a binary file called mybin. The last line runs that binary file.
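
The same options can be given on the qsub command line instead of in #PBS lines. The following is a minimal sketch, assuming the job script is still called myjob. The variable name opts and the check for qsub are illustrative only; the check is there so the snippet does nothing harmful on a machine without Torque.

```shell
# Same options as the two #PBS lines, given on the command line instead.
opts="-l cput=7200,walltime=7200 -j oe"

if command -v qsub >/dev/null 2>&1; then
    # $opts is left unquoted on purpose so it splits into separate arguments.
    qsub $opts myjob
else
    echo "qsub not found; on a Torque system this would run: qsub $opts myjob"
fi
```

Keeping the options in the script means every submission uses the same limits, while giving them on the command line makes it easy to vary them from one run to the next.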

When the qsub command is typed in, it replies with a jobid, containing the number of the job. As soon as the job has finished running, you will find a new file in the directory, or two files if you didn't request merged errors, with name(s) by default based on the jobid. You can look at that output using the cat, more or less command. With more or less, press the space bar to page through the file and q to quit. For example:

         less myjob.o3243
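
The output file name can be worked out from the jobid. This sketch assumes that qsub's reply has the usual form of a job number followed by a dot and the server name. The value 3243.headnode is made up for illustration; on a real system you would capture the reply with something like jobid=$(qsub myjob).

```shell
jobname="myjob"
jobid="3243.headnode"          # hypothetical reply from qsub

seq=${jobid%%.*}               # drop everything from the first dot onwards
outfile="${jobname}.o${seq}"   # standard output of the job
errfile="${jobname}.e${seq}"   # standard error, only when errors are not merged

echo "standard output: $outfile"
echo "standard error:  $errfile"
```

With the merged-errors option (-j oe) only the .o file is produced.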

For more qsub options, see the manual pages for qsub, by entering the command man qsub.

Checking the status of jobs

To check the status of your jobs, use the qstat command. It can be used with various options. A qstat command on its own summarises jobs, one line per job. With the -a option (qstat -a), a similar summary is shown in an alternative format that includes some information on requested resources. With the -f option (qstat -f), it shows extensive details for those jobs, or for one particular job if you add its jobid.
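
The three forms described above look like this when typed in. This is a sketch only: the job number 3243 is a made-up example, and the check for qstat is there so the snippet is harmless on a machine without Torque.

```shell
jobid="3243"    # hypothetical job number, replace with your own

if command -v qstat >/dev/null 2>&1; then
    qstat               # summary, one line per job
    qstat -a            # alternative summary with requested resources
    qstat -f "$jobid"   # full details for one particular job
else
    echo "qstat not found; run this on a machine with Torque installed"
fi
```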

A job can be in the Queued state or the Running state, among others. For other states and options, consult the manual pages for qstat by entering the command man qstat.
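
In the one-line summary, qstat shows the state as a single letter. The helper below is a small sketch for turning that letter into words. The function name describe_state is hypothetical, and it assumes the usual Torque letters Q for queued and R for running; anything else is referred back to the manual page.

```shell
# Translate a qstat state letter into a readable description.
describe_state() {
    case "$1" in
        Q) echo "Queued" ;;
        R) echo "Running" ;;
        *) echo "Other state, see man qstat" ;;
    esac
}

describe_state Q
describe_state R
```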