Submitting TensorFlow Jobs With Torque

Currently, the best way to use TensorFlow on the HPC is to build your own Python environment with conda, which lets you install a recent version of TensorFlow that works with the GPU nodes.

Setting up a Conda Environment

Follow the instructions at Setting Up A Conda Environment to set up a conda environment for TensorFlow.

TensorFlow Shell Script

To set up a submission script to run the job, you can use the following as a starting point (making sure to change netid@kennesaw.edu in the script to your correct email address):

run_tf2.pbs
#!/usr/bin/bash
#PBS -l nodes=1:ppn=28:gpus=1
#PBS -l walltime=100:00:00
#PBS -m abe
#PBS -M netid@kennesaw.edu
#PBS -q gpuq

JOBID=$( echo ${PBS_JOBID} | cut -f1 -d. )

# Load the modules you need
module load Anaconda3/2021.05
eval "$(conda shell.bash hook)"
conda activate myenv

# Change to the directory the job was submitted from
cd "${PBS_O_WORKDIR}"

# Run your code (FILE is passed in with qsub -v):
python3 "${FILE}"

Note: Change netid@kennesaw.edu on the #PBS -M line to your KSU email address, or you won't receive emails when the job starts and stops.
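
The script above computes JOBID but does not use it afterwards. The following sketch shows what that line produces and one possible use for it. The sample job ID and the log file name are assumptions made for illustration; they are not part of the original script.

```shell
# Hypothetical job ID in the usual Torque form <number>.<server>;
# inside a real job, Torque sets PBS_JOBID for you.
PBS_JOBID="123456.hpc-head"

# Same extraction as in run_tf2.pbs: keep only the field before the first dot.
JOBID=$( echo "${PBS_JOBID}" | cut -f1 -d. )

# One possible use (an assumption): a per-job log file name.
LOGFILE="tf2-${JOBID}.log"
echo "${LOGFILE}"   # prints tf2-123456.log
```

In the submission script, the run line could then redirect output, for example python3 "${FILE}" > "${LOGFILE}" 2>&1, so that each job writes to its own file.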

Usage

You can schedule your job in the queue by running the following (let's assume you named your TensorFlow script my-tf2-code.py, and the submission script is named run_tf2.pbs):

[barney@hpc ~]$ qsub -vFILE=${PWD}/my-tf2-code.py ${PWD}/run_tf2.pbs
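
Because run_tf2.pbs depends on FILE being passed with -v, a job submitted without it would start python3 with no script to run. A minimal sketch of a guard that could be placed before the python3 line is shown here. The function name check_input_file is hypothetical and is not part of the original script.

```shell
# Hypothetical helper: succeeds only if its argument names a readable file.
check_input_file() {
    if [ -z "${1:-}" ]; then
        echo "FILE is not set; submit with qsub -vFILE=/path/to/script.py run_tf2.pbs" >&2
        return 1
    fi
    if [ ! -r "$1" ]; then
        echo "Cannot read script: $1" >&2
        return 1
    fi
    return 0
}

# In the submission script this might be used as:
#   check_input_file "${FILE}" || exit 1
```

Failing early this way means the reason shows up in the job's error output right away.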