Using Jupyter Notebooks On The HPC

Overview

The Jupyter Notebook is a web application for creating and sharing documents that contain code, visualizations, and text. It can be used for data science, statistical modeling, machine learning, and more.

Initial Configuration

Before starting this procedure, look at the page on creating generic conda environments.

To begin, you'll need to load the most recent Anaconda3 module to get all of the python tools:

[barney@hpc ~]$ module load Anaconda3/2023.07

Next, create a directory for Jupyter to use for its runtime files:

[barney@hpc ~]$ mkdir -p ~/.runtime
[barney@hpc ~]$ export XDG_RUNTIME_DIR="$HOME/.runtime"
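
The two commands above can also be written as a short script with a sanity check at the end. This is a minimal sketch, not part of the official procedure: the chmod step is an assumption (a common precaution for per-user runtime directories), and $HOME is written out because a tilde inside double quotes is not expanded by the shell.

```shell
# Create the runtime directory and point XDG_RUNTIME_DIR at it.
runtime_dir="${HOME}/.runtime"
mkdir -p "${runtime_dir}"
chmod 700 "${runtime_dir}"   # assumption: restrict access to your own user
export XDG_RUNTIME_DIR="${runtime_dir}"

# Confirm the variable holds an absolute path to an existing directory.
if [ -d "${XDG_RUNTIME_DIR}" ] && [ "${XDG_RUNTIME_DIR#/}" != "${XDG_RUNTIME_DIR}" ]; then
  echo "XDG_RUNTIME_DIR is set to ${XDG_RUNTIME_DIR}"
else
  echo "XDG_RUNTIME_DIR does not point to an existing absolute path"
fi
```

The export only lasts for the current login session, so it needs to be repeated (or added to your shell startup file) in later sessions.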

Then, create a new conda environment. In this case, we'll call it myJupyter:

[barney@hpc ~]$ conda create --name myJupyter jupyter
[barney@hpc ~]$ conda activate myJupyter

If you need to add additional python packages, install them now. For instance,

[barney@hpc ~]$ conda install -n myJupyter humanize

will install the humanize package, which is useful for converting large numbers (like large file sizes) to more human-friendly values.

Using ksu-jupyter-notebook

To start a remote Jupyter Notebook on one of the compute nodes, run the following command:

[barney@hpc ~]$ ksu-jupyter-notebook

This will start a Jupyter Notebook job on 1 node, with 1 processor for 2 hours. If more resources are required for the job, use the standard qsub options to request them, such as:

-l is used to request resources such as nodes, ppn, walltime, mem, gpus, etc.
-N is used to set a custom name for the job (by default, jobs started with ksu-jupyter-notebook are called JupyterNotebook).
-j is used to change how the output and error files are joined together.
-q is used to select a different queue to run the job in, for instance -q gpuq will run the job in the GPU queue.

For example, to run a 4-hour Jupyter Notebook job with access to 1 node, 16 cores, and 1 GPU, we could run the following command:

[barney@hpc ~]$ ksu-jupyter-notebook -l nodes=1:ppn=16:gpus=1,walltime=4:00:00 -q gpuq
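
Several of the options listed above can be combined in one command. The sketch below assembles such a command from variables and prints it for review before anything is submitted. The job name and resource values are hypothetical and chosen only for illustration, and the mem=16gb syntax is assumed to follow the usual qsub resource format.

```shell
# Hypothetical values for illustration; adjust them to your own job.
job_name="myAnalysis"
resources="nodes=1:ppn=4,mem=16gb,walltime=8:00:00"

# Assemble the command as text so it can be reviewed before submitting.
launch_cmd="ksu-jupyter-notebook -N ${job_name} -l ${resources}"
echo "${launch_cmd}"
```

When the printed command looks right, copy it to the prompt and run it.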

By default, the ksu-jupyter-notebook command tries to use a conda environment named myJupyter. To use a different conda environment, add the --condaenv <CONDA_ENVIRONMENT> option when starting your job.

Note

<CONDA_ENVIRONMENT> should be the name of the conda environment you want to use. For instance, to use the conda environment we created above, you would use the following command:

[barney@hpc ~]$ ksu-jupyter-notebook --condaenv myJupyter
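
Before starting a job, it can help to confirm that the environment name you plan to pass actually exists. The following is a minimal sketch that assumes the Anaconda3 module is already loaded so that the conda command is available; the variable names are illustrative only.

```shell
# Name of the conda environment you intend to pass to --condaenv.
env_name="myJupyter"

# "conda env list" prints one environment per line, with the name in the first column.
if command -v conda >/dev/null 2>&1 \
   && conda env list | awk '{print $1}' | grep -qx "${env_name}"; then
  env_status="found"
else
  env_status="missing"
fi
echo "Conda environment ${env_name}: ${env_status}"
```

A result of "missing" can also mean that conda itself is not on your PATH, in which case load the Anaconda3 module and try again.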

You can also specify the Anaconda environment module that you want to use by adding the --anaconda <ANACONDA_MODULE> option when starting your job.

Note

<ANACONDA_MODULE> should match the full name of the Anaconda environment module to be used (e.g. Anaconda3/2023.07).

The full list of available Anaconda environment modules can be seen by running:

[barney@hpc ~]$ module avail

Once you run the command, it will output something similar to the following:

[Figure: example output from running the ksu-jupyter-notebook command]

To connect to your Jupyter Notebook, you need to establish port-forwarding through your current SSH session:

Warning

Most third-party applications, such as PuTTY, do not recognize the SSH escape character ~. If you are using Windows, consider using PowerShell with OpenSSH (default on Windows 10) or MobaXterm, as other terminal applications have different methods of establishing port-forwarding that don't match the instructions below.

Type in the SSH escape sequence: ~+Shift+C (press ~ and then hold Shift and press C). This should open an SSH console to modify your current session.
A prompt displaying ssh> will appear on a new line when the key sequence is successfully entered.

Tip

For the SSH session to recognize the escape character, it MUST be the first character typed on a new line. If you see the ~ character appear when you start to type, delete it, hit Enter to start a new line, and try again.

Important

Newer versions of OpenSSH have disabled the escape sequence by default. You'll know if this is the case for your machine if you type in the escape sequence and instead of the ssh> prompt you get a message that says commandline disabled.

To fix this issue, add the -o EnableEscapeCommandline=yes option to the ssh command you use to connect to the HPC. For example:

[barney@mylaptop ~]$ ssh -o EnableEscapeCommandline=yes NetID@hpc.kennesaw.edu

If you want to make the change permanent (so you don't have to remember to type the extra option every time), you can set the same option in the ssh config file on your computer. The file should be ${HOME}/.ssh/config on Mac and Linux systems, and $HOME\.ssh\config on Windows. Edit that file with your favorite editor and add a line that looks like this:

EnableEscapeCommandline yes
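
As a sketch of one way to make this change from a terminal on a Mac or Linux machine, the snippet below appends a host-specific block to the config file only if the option is not already present. Placing the option under a Host line (here hpc.kennesaw.edu, taken from the ssh command above) is a choice made for this example so that it applies only to connections to the HPC. The permissions set here follow a common convention and are an assumption, not a requirement stated in this guide.

```shell
config_file="${HOME}/.ssh/config"

# Make sure the .ssh directory exists with owner-only access.
mkdir -p "${HOME}/.ssh"
chmod 700 "${HOME}/.ssh"

# Append the block only if the option does not already appear in the file.
if ! grep -qs 'EnableEscapeCommandline' "${config_file}"; then
  cat >> "${config_file}" <<'EOF'

Host hpc.kennesaw.edu
    EnableEscapeCommandline yes
EOF
fi
chmod 600 "${config_file}"
```

Keep in mind that a line added below an existing Host entry in the config file applies only to that host, which is why the example writes its own Host line.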

Caution

If you encounter an error saying that the port is already in use and forwarding failed, you are already forwarding that port to your local machine. To fix this issue, cancel the existing port-forwarding by opening the SSH console again (~+Shift+C) and entering -KL<PORT>, where <PORT> is the port number you wish to clear.

Copy and paste the purple text that begins with -L into the SSH prompt and hit Enter to start port forwarding.

The port and compute node combination are unique to your job, so make sure that you use the values provided by the script.
The prompt will display "Forwarding port." if it is successful. To return to your normal shell prompt, hit Enter once more.
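
If the escape sequence is not available in your terminal, the same forwarding can usually be set up from a second terminal on your local machine. The sketch below assumes that the -L text printed by the script has the standard ssh form local_port:host:remote_port with the same port on both sides. The port and node name used here are hypothetical placeholders, not values from a real job.

```shell
# Hypothetical values; replace them with the port and node printed for your job.
port="8888"
node="node01"

# -L forwards the local port to the compute node; -N opens no remote shell.
forward_cmd="ssh -N -L ${port}:${node}:${port} NetID@hpc.kennesaw.edu"
echo "${forward_cmd}"
```

Run the printed command on your local machine and leave it running for as long as you need the notebook; press Ctrl+C in that terminal to stop forwarding.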

Now, open a web browser on your local machine and copy and paste the green URL and token into the address bar.

The URL and token are unique to your job, so make sure that you copy the correct link from the green text.

Once connected to your Jupyter Notebook, you can start a new kernel or open an existing notebook.

Congratulations! You are now running a Jupyter Notebook on the cluster.

Finishing Up

Once you are finished, press the Quit button at the top of the Notebook to quit the running kernel (if it's available), then close your browser and log off of the system. If you would like to clean up further, you can delete the job with the following command:

[barney@hpc ~]$ qdel <JOBID>

where <JOBID> is the job ID of the Jupyter Notebook job.
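
If you no longer have the job ID at hand, the scheduler can list your jobs. This is a minimal sketch that assumes the standard qstat command is available alongside qsub and qdel on the login node; the variable name is illustrative only.

```shell
# Determine the current user name, falling back to id if USER is unset.
current_user="${USER:-$(id -un)}"

# List this user's jobs; the job ID is shown in the first column.
if command -v qstat >/dev/null 2>&1; then
  qstat -u "${current_user}"
else
  echo "qstat not found; run this on the HPC login node"
fi
```

Look for the job named JupyterNotebook (or the custom name you set with -N) and pass its ID to qdel.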

Acknowledgments

Some of this information was adapted from the Georgia Institute of Technology's Partnership for an Advanced Computing Environment documentation for Running Jupyter Notebooks Interactively.