Using Jupyter Notebooks On The HPC
Overview
The Jupyter Notebook is a web application for creating and sharing documents that contain code, visualizations, and text. It can be used for data science, statistical modeling, machine learning, and more.
Initial Configuration
Before starting this procedure, look at the page on creating generic conda environments.
To begin, you'll need to load the most recent Anaconda3 module to get all of the python tools:
[barney@hpc ~]$ module load Anaconda3/2023.07
Now create a directory for Jupyter to use for some of its internal files:
[barney@hpc ~]$ mkdir -p ~/.runtime
[barney@hpc ~]$ export XDG_RUNTIME_DIR="$HOME/.runtime"
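The shell does not expand ~ inside double quotes, so the sketch below spells the path with $HOME. It is a minimal illustration of the two commands above, with one added step (chmod 700) based on the XDG convention that a runtime directory should be private to its owner; whether Jupyter on this cluster requires that is an assumption.

```shell
# Sketch: create the runtime directory and point Jupyter at it.
# "$HOME" is used because "~" is not expanded inside double quotes.
runtime_dir="$HOME/.runtime"
mkdir -p "$runtime_dir"
chmod 700 "$runtime_dir"   # XDG runtime directories are expected to be private to the user
export XDG_RUNTIME_DIR="$runtime_dir"

# Confirm the directory exists where the variable points.
if [ -d "$XDG_RUNTIME_DIR" ]; then
    echo "runtime directory ready: $XDG_RUNTIME_DIR"
fi
```

The export only lasts for the current login session. If you want it set every time, one option is to add the export line to your shell startup file (for example ~/.bashrc, assuming your login shell is bash).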
Then, create a new conda environment. In this case, we'll call it myJupyter:
[barney@hpc ~]$ conda create --name myJupyter jupyter
[barney@hpc ~]$ conda activate myJupyter
If you need to add additional python packages, install them now. For instance,
[barney@hpc ~]$ conda install -n myJupyter humanize
will install the humanize package, which is useful for converting large numbers (like large file sizes) to more human-friendly values.
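To confirm that the environment and the extra package are in place before submitting a job, something like the following sketch may help. The helper name conda_env_exists is hypothetical (it is not part of conda); it only parses the output of conda env list, and the script skips the check when conda is not on the PATH.

```shell
# Return success if a conda environment with the given name is listed.
# (conda_env_exists is a hypothetical helper, not a conda command.)
conda_env_exists() {
    conda env list | awk -v name="$1" '$1 == name { found = 1 } END { exit !found }'
}

# Only query conda when it is available, i.e. after loading the Anaconda3 module.
if command -v conda >/dev/null 2>&1; then
    if conda_env_exists myJupyter; then
        echo "myJupyter exists"
        # Show the installed humanize package (if any) in that environment.
        conda list -n myJupyter humanize
    else
        echo "myJupyter not found; create it first"
    fi
else
    echo "conda not found; load the Anaconda3 module first"
fi
```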
Using ksu-jupyter-notebook
To start a remote Jupyter Notebook on one of the compute nodes, run the following command:
[barney@hpc ~]$ ksu-jupyter-notebook
This will start a Jupyter Notebook job on 1 node, with 1 processor, for 2 hours. If more resources are required for the job, use the standard qsub options to request them, such as:
- -l is used to request resources such as nodes, ppn, walltime, mem, gpus, etc.
- -N is used to set a custom name for the job (by default, jobs started with ksu-jupyter-notebook are called JupyterNotebook).
- -j is used to change how the output and error files are joined together.
- -q is used to select a different queue to run the job in; for instance, -q gpuq will run the job in the GPU queue.
For example, to run a 4-hour Jupyter Notebook job with access to 1 node, 16 cores, and 1 GPU, we could run the following command:
[barney@hpc ~]$ ksu-jupyter-notebook -l nodes=1:ppn=16:gpus=1,walltime=4:00:00 -q gpuq
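If you request different resources often, it can help to assemble the -l string from named values so each part is easy to change. The sketch below only prints the resulting command for review; it does not submit anything. The job name myNotebook is a hypothetical value for -N, and combining -N with the other options in one command is an assumption based on the script accepting standard qsub options.

```shell
# Values matching the example above; adjust to what your job needs.
nodes=1
ppn=16
gpus=1
walltime="4:00:00"
queue="gpuq"
job_name="myNotebook"   # hypothetical custom job name for -N

# Resource list in the nodes=...:ppn=...:gpus=...,walltime=... form used above.
resources="nodes=${nodes}:ppn=${ppn}:gpus=${gpus},walltime=${walltime}"

# Print the command for review instead of submitting it.
echo "ksu-jupyter-notebook -l ${resources} -N ${job_name} -q ${queue}"
```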
By default, the ksu-jupyter-notebook command tries to use a conda environment named myJupyter. To specify a different conda environment, add the --condaenv <CONDA_ENVIRONMENT> option when starting your job.
Note
<CONDA_ENVIRONMENT> should be the name of the conda environment you want to use. For instance, to use the conda environment we created above, you would use the following command:
[barney@hpc ~]$ ksu-jupyter-notebook --condaenv myJupyter
You can also specify the Anaconda environment module that
you want to use by adding the --anaconda <ANACONDA_MODULE>
option when starting your job.
Note
<ANACONDA_MODULE> should match the full name of the Anaconda environment module to be used (e.g. Anaconda3/2023.07).
The full list of available Anaconda environment modules can be seen by running:
[barney@hpc ~]$ module avail
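The full module list can be long. A rough way to narrow it to the Anaconda modules is sketched below. The function name list_anaconda_modules is hypothetical, and the sketch assumes that module avail writes its listing to stderr, as many module systems do, so stderr is merged into stdout before filtering.

```shell
# List only the Anaconda modules.
# (list_anaconda_modules is a hypothetical helper name.)
list_anaconda_modules() {
    # Merge stderr into stdout because module listings often go to stderr.
    module avail 2>&1 | grep -i 'anaconda'
}

if command -v module >/dev/null 2>&1; then
    list_anaconda_modules || echo "no Anaconda modules found"
else
    echo "module command not available in this shell"
fi
```

Any name printed this way, such as Anaconda3/2023.07, can then be passed to the --anaconda option.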
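Once you run the command, it will print connection details for your job, including a port-forwarding option that begins with -L (shown in purple) and a URL with a token (shown in green).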
To connect to your Jupyter Notebook, you need to establish port-forwarding through your current SSH session:
Warning
Most third-party applications, such as PuTTY, do not recognize the SSH escape character ~. If you are using Windows, consider using PowerShell with OpenSSH (default on Windows 10) or MobaXterm, as other terminal applications have different methods of establishing port-forwarding that don't match the instructions below.
- Type in the SSH escape sequence: ~+Shift+C (press ~ and then hold Shift and press C). This should open an SSH console to modify your current session.
- A prompt displaying ssh> will appear on a new line when the key sequence is successfully entered.
Tip
For the SSH session to recognize the escape character, it MUST be the first character typed on a new line. If you see the ~ character appear when you start to type, delete it, hit Enter to start a new line, and try again.
Important
Newer versions of OpenSSH have disabled the escape sequence by default. You'll know this is the case for your machine if you type in the escape sequence and, instead of the ssh> prompt, you get a message that says commandline disabled.
To fix this issue, you need to add the -o EnableEscapeCommandline=yes option to the ssh command you use to connect to the HPC. Something like:
[barney@mylaptop ~]$ ssh -o EnableEscapeCommandline=yes NetID@hpc.kennesaw.edu
If you want to make the change permanent (so you don't have to remember to type the extra option every time), you can set the same option in the ssh config file on your computer. (It should be ${HOME}/.ssh/config on Mac and Linux systems, and $HOME\.ssh\config on Windows.) Just edit that file with your favorite editor and add a line that looks like this:
EnableEscapeCommandline yes
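Before editing, you may want to check whether your config file already contains this option. The sketch below is one way to do that on Mac or Linux; has_escape_option is a hypothetical helper name, and the check only looks for a line starting with the keyword, without interpreting its value.

```shell
# Return success if the given ssh config file has an EnableEscapeCommandline line.
# (has_escape_option is a hypothetical helper name.)
has_escape_option() {
    # ssh config keywords are case-insensitive, hence grep -i.
    [ -f "$1" ] && grep -qi '^[[:space:]]*EnableEscapeCommandline' "$1"
}

config_file="$HOME/.ssh/config"
if has_escape_option "$config_file"; then
    echo "$config_file already has an EnableEscapeCommandline line"
else
    echo "$config_file has no EnableEscapeCommandline line yet"
fi
```

Keep in mind that a line placed before any Host entry applies to every connection, while a line placed under a Host entry applies only to that host.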
Caution
If you encounter an error that the port is already in use and forwarding failed, you are already forwarding that port to your local machine. To fix this issue, cancel the existing port-forwarding by opening a new SSH console (~+Shift+C) and then entering -KL<PORT>, where <PORT> is the port number you wish to clear.
Copy and paste the purple text that begins with -L into the SSH prompt and hit Enter to start port forwarding.
- The port and compute node combination are unique to your job, so make sure that you use the values provided by the script.
- The prompt will display "Forwarding port." if it is successful. To return to your normal shell prompt, hit Enter once more.
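For orientation, the sketch below shows the general shape of what gets typed at the ssh> prompt, using the standard OpenSSH forwarding syntax. The port 8888 and node name node001 are hypothetical placeholders, and using the same port number on both ends is an assumption; the line printed by the script for your job is the one to use.

```shell
# Hypothetical values for illustration only; always use the ones the script prints.
port=8888
node="node001"

# Standard OpenSSH forwarding syntax: -L local_port:remote_host:remote_port
forward_spec="-L ${port}:${node}:${port}"

# Matching cancel request, as typed at the ssh> prompt if the port is already in use.
cancel_spec="-KL${port}"

echo "start:  ${forward_spec}"
echo "cancel: ${cancel_spec}"
```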
Now, open a web browser on your local machine and copy and paste the green URL and token into the address bar.
- The URL and token are unique to your job, so make sure that you copy the correct link from the green text.
Once connected to your Jupyter Notebook, you can start a new kernel or open an existing notebook.
Congratulations! You are now running a Jupyter Notebook on the cluster.
Finishing Up
Once you are finished, press the Quit button at the top of the Notebook to quit the running kernel (if it's available), then close your browser and log off of the system. If you would like to clean up further, you can delete the job with the following command:
[barney@hpc ~]$ qdel <JOBID>
Where <JOBID> is the job ID of the Jupyter Notebook job.
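If you no longer have the job ID on screen, listing your own jobs is one way to find it. The sketch below assumes the cluster provides qstat alongside qsub and qdel; show_my_jobs is a hypothetical helper name.

```shell
# Show your own jobs so the Jupyter job's ID can be read from the listing.
# (show_my_jobs is a hypothetical helper name.)
show_my_jobs() {
    # Fall back to "id -un" if USER is not set in this shell.
    qstat -u "${USER:-$(id -un)}"
}

if command -v qstat >/dev/null 2>&1; then
    show_my_jobs
else
    echo "qstat not found; run this on the HPC login node"
fi
```

Unless you set a custom name with -N, the job should appear under the default name JupyterNotebook.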
Acknowledgments
Some of this information was adapted from the Georgia Institute of Technology's Partnership for an Advanced Computing Environment documentation for Running Jupyter Notebooks Interactively.