Introduction To TORQUE
Work In Progress
This article is a work in progress. Chances are, information is either incomplete or just plain missing.
Overview
TORQUE is a Distributed Resource Manager for submitting and controlling jobs on Kennesaw's HPC cluster. TORQUE manages jobs that users submit to various queues on a computer system; each queue represents a group of resources with attributes suited to the jobs it runs.
The table below shows some of the most commonly used TORQUE commands:
Command | Description
---|---
`qsub` | Submit a job for processing.
`qstat` | Monitor the status of a job.
`qdel` | Terminate a job before its completion.
TORQUE includes numerous directives that specify resource requirements and other attributes for batch and interactive jobs. TORQUE directives can appear as header lines (lines that start with `#PBS`) in a batch job script or as command-line options to the `qsub` command.
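As a quick illustration of the header-line form, the sketch below writes a minimal job script (the filename `job.script` and the directive values are placeholders, not requirements of this cluster) and lists the lines TORQUE would parse as directives:

```shell
# Write a minimal job script; TORQUE treats every line that starts
# with "#PBS" as a directive. The directive values here are examples.
cat > job.script <<'EOF'
#!/bin/bash
#PBS -N demo
#PBS -l walltime=30:00
echo hello
EOF

# List the header lines TORQUE will read as directives.
grep '^#PBS' job.script
# prints:
#   #PBS -N demo
#   #PBS -l walltime=30:00
```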
TORQUE is based on the original open-source Portable Batch System (OpenPBS) project and was managed as an open-source project by Adaptive Computing, Inc. in cooperation with the TORQUE community. It is offered as a commercial product sold separately and as part of Moab Workload Manager.
For help using TORQUE to submit and manage jobs, see the Submitting and managing jobs chapter of Adaptive Computing's TORQUE Administrator Guide. For a list of TORQUE commands, see the Commands overview appendix.
Job Scripts
To run a job in batch mode on a high-performance computing system using TORQUE, prepare a job script that specifies the application you want to run and the resources required to run it, and then submit the script to TORQUE with the `qsub` command. TORQUE passes your job and its requirements to the system's job scheduler, which dispatches the job whenever the required resources become available.
A basic job script might contain just a `bash` or `tcsh` shell script. However, TORQUE job scripts most commonly have at least one executable command preceded by a list of directives that specify resources and other attributes needed to execute the command (for example, wall-clock time, the number of nodes and processors, and filenames for job output and errors). These directives appear in header lines (lines beginning with `#PBS`), which should precede any executable lines in your job script.
Additionally, your TORQUE job script (which will be executed under your preferred login shell) should begin with a line that specifies the command interpreter under which it should run.
Serial Job Example
A TORQUE job script for a serial job might look like this:
```
#!/bin/bash
#PBS -k o
#PBS -l nodes=1:ppn=1,walltime=30:00
#PBS -M barney@kennesaw.edu
#PBS -m abe
#PBS -N JobName
#PBS -j oe
#PBS -q batch
./a.out
```

In the above example, the first line indicates the script should be read using the `bash` command interpreter. Then, several header lines of TORQUE directives are included:
TORQUE directive | Description
---|---
`#PBS -k o` | Keeps the job output
`#PBS -l nodes=1:ppn=1,walltime=30:00` | Indicates the job requires one node, one processor per node, and 30 minutes of wall-clock time
`#PBS -M barney@kennesaw.edu` | Sends job-related email to barney@kennesaw.edu
`#PBS -m abe` | Sends email if the job is (`a`) aborted, when it (`b`) begins, and when it (`e`) ends
`#PBS -N JobName` | Names the job `JobName`
`#PBS -j oe` | Joins standard output and standard error
`#PBS -q batch` | Uses the job queue `batch`
The last line tells the operating system to execute `a.out` (using a single processor).
MPI Job Example
A TORQUE job script for an MPI job might look like this:
```
#!/bin/bash
#PBS -k o
#PBS -l nodes=2:ppn=6,walltime=30:00
#PBS -M barney@kennesaw.edu
#PBS -m abe
#PBS -N JobName
#PBS -j oe
#PBS -q batch
mpiexec -np 12 -machinefile $PBS_NODEFILE ~/bin/binaryname
```
As in the previous example, this script starts with a line that specifies the `bash` command interpreter, followed by several header lines of TORQUE directives:
TORQUE directive | Description
---|---
`#PBS -k o` | Keeps the job output
`#PBS -l nodes=2:ppn=6,walltime=30:00` | Indicates the job requires two nodes, six processors per node, and 30 minutes of wall-clock time
`#PBS -M barney@kennesaw.edu` | Sends job-related email to barney@kennesaw.edu
`#PBS -m abe` | Sends email if the job is (`a`) aborted, when it (`b`) begins, and when it (`e`) ends
`#PBS -N JobName` | Names the job `JobName`
`#PBS -j oe` | Joins standard output and standard error
`#PBS -q batch` | Uses the job queue `batch`
The last line in the example is the executable line. It tells the operating system to use the `mpiexec` command to execute the `~/bin/binaryname` binary on 12 processors from the machines listed in `$PBS_NODEFILE`.
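The `-np 12` value matches the resource request: with `nodes=2:ppn=6`, TORQUE writes one line per allocated processor slot into `$PBS_NODEFILE`, so the file holds 2 × 6 = 12 lines. The sketch below builds a stand-in nodefile (the node names are made up) to show how the process count can be derived from the file rather than hard-coded:

```shell
# Stand-in for the $PBS_NODEFILE that TORQUE generates for
# nodes=2:ppn=6 -- one line per processor slot, six per node.
# node01/node02 are hypothetical node names for illustration.
PBS_NODEFILE=$(mktemp)
for node in node01 node02; do
  for slot in 1 2 3 4 5 6; do
    echo "$node" >> "$PBS_NODEFILE"
  done
done

# Derive the process count from the file instead of hard-coding 12.
NP=$(( $(wc -l < "$PBS_NODEFILE") ))
echo "$NP"    # prints 12

# In a real job script, the executable line could then be written as:
#   mpiexec -np "$NP" -machinefile "$PBS_NODEFILE" ~/bin/binaryname
```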
For more about TORQUE directives, see the `qsub` manual page (enter `man qsub`).
GPU Job Example
A TORQUE job script for a serial job that requires a GPU might look like this:
```
#!/bin/bash
#PBS -k o
#PBS -l nodes=1:ppn=1:gpus=1,walltime=30:00
#PBS -M barney@kennesaw.edu
#PBS -m abe
#PBS -N JobName
#PBS -j oe
#PBS -q gpuq
module load CUDA
./a.out
```

In the above example, the first line indicates the script should be read using the `bash` command interpreter. Then, several header lines of TORQUE directives are included:
TORQUE directive | Description
---|---
`#PBS -k o` | Keeps the job output
`#PBS -l nodes=1:ppn=1:gpus=1,walltime=30:00` | Indicates the job requires one node, one processor per node, one GPU per node, and 30 minutes of wall-clock time
`#PBS -M barney@kennesaw.edu` | Sends job-related email to barney@kennesaw.edu
`#PBS -m abe` | Sends email if the job is (`a`) aborted, when it (`b`) begins, and when it (`e`) ends
`#PBS -N JobName` | Names the job `JobName`
`#PBS -j oe` | Joins standard output and standard error
`#PBS -q gpuq` | Uses the job queue `gpuq`
The second-to-last line loads the environment module for CUDA, a library for GPU programming that nearly every program using a GPU requires. The last line tells the operating system to execute `a.out` (using a single processor). The main differences from the serial example are:

- the `gpus=1` addition in the `-l` directive,
- using the `gpuq` queue instead of the `batch` queue,
- and loading the CUDA environment module.

An MPI job that required GPUs would need similar changes.
Submitting Jobs
To submit your job script (for example, `job.script`), use the TORQUE `qsub` command. If the command runs successfully, it will return a job ID to standard output, for example:

```
[barney@hpc ~]$ qsub job.script
123456.roland
```
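The job ID that `qsub` prints can be captured in a shell variable for later `qstat` or `qdel` calls. A minimal sketch, using a hard-coded stand-in for the string `qsub` would return (on a real cluster you would write `jobfull=$(qsub job.script)`):

```shell
# Stand-in for the output of: jobfull=$(qsub job.script)
jobfull="123456.roland"

# Strip the server suffix to get the numeric job ID.
jobid=${jobfull%%.*}
echo "$jobid"    # prints 123456

# On a real cluster, the ID could then be reused, for example:
#   qstat "$jobid"
#   qdel  "$jobid"
```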
If your job requires attribute values greater than the defaults but less than the maximum allowed, you can specify them with the `-l` (lowercase L, for "limit") option, either in your job script (as explained in the previous section) or on the `qsub` command line. For example, the following command submits `job.script`, using the `-l walltime` option to indicate the job needs more than the default 30 minutes of wall-clock time:
```
[barney@hpc ~]$ qsub -l walltime=10:00:00 job.script
123457.roland
```
Note
Command-line options will override TORQUE directives in your job script.
To include multiple options on the command line, use either one `-l` flag with several comma-separated options or multiple `-l` flags, each separated by a space. For example, the following two commands are equivalent:

```
[barney@hpc ~]$ qsub -l ncpus=16,mem=1024mb job.script
[barney@hpc ~]$ qsub -l ncpus=16 -l mem=1024mb job.script
```
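The equivalence holds because a comma-separated `-l` value is simply several resource requests packed into one string; splitting it on commas recovers the same `name=value` pairs you would pass with separate flags. A small illustration:

```shell
# A comma-separated resource list is just several name=value pairs
# in one string; splitting on commas recovers the individual requests.
resources="ncpus=16,mem=1024mb"
echo "$resources" | tr ',' '\n'
# prints:
#   ncpus=16
#   mem=1024mb
```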
Useful `qsub` options include:

`qsub` option | Description
---|---
`-q queue_name` | Specifies a user-selectable queue (`queue_name`)
`-r` | Makes the job re-runnable
`-a date_time` | Executes the job only after a specific date and time (`date_time`)
`-V` | Exports environment variables in your current environment to the job
`-I` | Makes the job run interactively (usually for testing purposes)
For more, see the `qsub` manual page (enter `man qsub`).
Monitoring Jobs
To monitor the status of a queued or running job, use the `qstat` command.

Useful `qstat` options include:
`qstat` option | Description
---|---
`-u user_list` | Displays jobs for users listed in `user_list`
`-a` | Displays all jobs
`-r` | Displays running jobs
`-f` | Displays the full listing of jobs (returns excessive detail)
`-n` | Displays nodes allocated to jobs
For example, to see all the jobs running in the `long` queue, enter:

```
[barney@hpc ~]$ qstat -r long | less
```
For more, see the `qstat` manual page (enter `man qstat`).
Alternatively, use the Moab `showq` command to monitor jobs. To list the queued jobs in dispatch order, enter:

```
[barney@hpc ~]$ showq -i
```

For more, see Common Moab scheduler commands and the `showq` manual page (enter `man showq`).
Deleting Jobs
To delete queued or running jobs, use the `qdel` command:

- To delete a specific job (`jobid`), enter:

  ```
  [barney@hpc ~]$ qdel jobid
  ```

- To delete all jobs, enter:

  ```
  [barney@hpc ~]$ qdel all
  ```
Occasionally, a node becomes unresponsive and won't respond to the TORQUE server's requests to delete a job. If that occurs, add the `-W` (uppercase W) option:

```
[barney@hpc ~]$ qdel -W jobid
```
Email the High-Performance Computing group for help if that doesn't work.
For more, see the `qdel` manual page (enter `man qdel`).
Acknowledgements
This document is heavily borrowed from the Indiana University Knowledge Base article Use TORQUE to submit and manage jobs on high-performance computing systems.