Huckleberry User Guide

Huckleberry is a high-performance computing system targeted at deep learning applications. Huckleberry consists of two login nodes and fourteen IBM "Minsky" S822LC compute nodes. Each of the compute nodes is equipped with:

  • Two IBM Power8 CPUs (3.26 GHz) with 256 GB of memory
  • Four NVIDIA P100 GPUs with 16 GB of memory each
  • NVLink interfaces connecting CPU and GPU memory spaces
  • Mellanox EDR InfiniBand (100 Gb/s) interconnect
  • CentOS 7 OS

Understanding non-uniform memory access (NUMA) patterns is important to get the full benefit of the S822LC compute nodes on Huckleberry. The memory bandwidth associated with data movement within each compute node is summarized in Figure 1. Note that each Power8 CPU is coupled to two P100 GPUs through NVLink, which supports bi-directional data transfer rates of 80 GB/s. The theoretical maximum memory bandwidth for each Power8 CPU is 115 GB/s. The theoretical maximum memory bandwidth for each NVIDIA P100 GPU is 720 GB/s.

Figure 1. Theoretical memory bandwidth for data transfers within the IBM S822LC Compute node (image source: NVIDIA).
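
One way to see how the CPUs and GPUs of a compute node are wired together is to inspect the topology directly. The commands below are standard tools rather than anything specific to this guide; run them from a shell on a compute node (output will vary):

nvidia-smi topo -m    # connection matrix showing NVLink links and CPU affinity for each GPU
numactl --hardware    # NUMA nodes and the memory attached to each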

To access Huckleberry, users should log in with:

ssh huckleberry1.arc.vt.edu

Slurm has been installed on Huckleberry and supports the scheduling of both batch and interactive jobs.

Basic Job Submission and Monitoring

The current configuration is fairly basic, but it allows users to run jobs either through the batch scheduler or interactively. The following is a basic "hello world" job submission script requesting 500 GB of memory and all four Pascal P100 GPUs on a compute node.

NOTE: asking for -N 1 without specifying the number of cores per node will default to only 1 core (equivalent to -n 1). If you would like to get the full node exclusively, ask for all the cores on the node using the -n flag, or use the --exclusive flag (an exclusive-node variant of the script header is shown after the example below).


#!/bin/bash
#SBATCH -J hello-world
#SBATCH -p normal_q
#SBATCH -N 1  # this will not assign the node exclusively. See the note above for details
#SBATCH -t 10:00
#SBATCH --mem=500G
#SBATCH --gres=gpu:4
#SBATCH --account=(YOUR ALLOCATION ID)
echo "hello world"

To submit a job to the batch queue, Slurm provides the sbatch command (the analog of Torque’s qsub). Assuming that the above is copied into a file “hello.sh,” a job can be submitted to the scheduler using


mcclurej@hulogin1:~/Slurm$ sbatch hello.sh
Submitted batch job 5

To check on the status of jobs, use the squeue command,


mcclurej@hulogin1:~/Slurm$ squeue -u mcclurej
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
5 debug hello-wo mcclurej R INVALID 1 hu001

Output from the job will be written to the file slurm-5.out.

To cancel a job, provide the jobid to the scancel command.
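
For example, to cancel the job submitted above:

scancel 5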

Slurm provides the srun command to launch parallel jobs. Typically this would replace mpirun for an MPI job.
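
As an illustration, a batch script for a hypothetical MPI executable (./my_mpi_program is a placeholder, not a program provided on the system) might launch it with srun, assuming the MPI library is built with Slurm support:

#!/bin/bash
#SBATCH -J mpi-example
#SBATCH -p normal_q
#SBATCH -N 2             # two compute nodes
#SBATCH -n 40            # 20 tasks per node (each node exposes 20 cores; see below)
#SBATCH -t 10:00
#SBATCH --account=(YOUR ALLOCATION ID)
srun ./my_mpi_program    # placeholder executable; srun launches the MPI ranks in place of mpirun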

For more comprehensive information, SchedMD has a handy Slurm command cheat sheet.

Interactive Jobs

To run a job interactively, a two-step process is required. First, request a reservation using salloc (e.g. one compute node for 10 minutes). If you would like exclusive access to the node, specify the number of cores or use the --exclusive flag as noted in the previous section.

salloc -N 1 -n 8 -t 10:00 --gres=gpu:1 --partition=normal_q --account=(YOUR ALLOCATION ID)

To get an interactive shell on the allocated node, provide the --pty /bin/bash flag to srun,

srun --pty /bin/bash

This command will not work unless you have first requested an allocation to reserve nodes for this purpose.
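
Putting the two steps together, an interactive session on a whole node (using the --exclusive flag mentioned above) could look like the following sketch:

salloc -N 1 --exclusive -t 10:00 --gres=gpu:4 --partition=normal_q --account=(YOUR ALLOCATION ID)
srun --pty /bin/bash
# ... work interactively on the compute node ...
exit    # leave the compute node shell; exit again (or scancel the job) to release the allocation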

Requesting Individual GPUs

In many cases jobs will require fewer than the four GPUs available on each Huckleberry compute node. GPUs can be requested as a generic resource (GRES) through Slurm by requesting a specific number of processor cores and GPUs. To request one processor core and one GPU in an interactive session with 8 GB of memory per processor core,

salloc -N1 -n8 -t 10:00 --mem-per-cpu=8G --gres=gpu:1

The example batch submission script shown below requests the equivalent resources for a batch job.


#!/bin/bash
#SBATCH -J gpu-alloc
#SBATCH -p normal_q
#SBATCH -n 8
#SBATCH -t 10:00
#SBATCH --mem-per-cpu=8G
#SBATCH --gres=gpu:1
#SBATCH --account=(YOUR ALLOCATION ID)
echo "Allocated GPU with ID $CUDA_VISIBLE_DEVICES"

Slurm will set the $CUDA_VISIBLE_DEVICES environment variable automatically based on your request. Multiple processor cores and/or GPUs can be requested in the same manner. For example, to request two GPUs and 20 CPU cores,

salloc -n20 -t 10:00 --mem-per-cpu=4G --gres=gpu:2

The two Power8 CPUs in each compute node are viewed by Slurm as a total of 20 processor cores.
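
Once on the allocated compute node (for example after srun --pty /bin/bash), a quick, optional way to confirm what was assigned is:

echo $CUDA_VISIBLE_DEVICES    # GPU IDs assigned by Slurm (two in the example above)
echo $SLURM_CPUS_ON_NODE      # number of CPU cores allocated on this node
nvidia-smi                    # GPUs nvidia-smi can see on the node, with utilization and memory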

Software

Software modules are available on Huckleberry and function in the same manner as on other ARC systems, e.g. the following syntax will load the module for CUDA:


module load cuda
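
The usual module commands also apply; for example, to see which versions are installed or what is currently loaded:

module avail cuda    # list the available versions of the cuda module
module list          # show the modules currently loaded in your session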

Additionally, IBM’s PowerAI deep learning software is installed within the Anaconda3 module. For brief tutorials and usage instructions, please see the PowerAI sections below.

For additional information, please refer to the PowerAI User Guide.

Python

For users who would like to customize their Python environment, we provide online documentation for best practices to manage Python on ARC systems. For more detailed usage, please refer to the PowerAI sections below.

PowerAI Installation & Usage (Updated in April 2019)

All testing (on TF, PyTorch, Keras with the TF backend, and Caffe) has been performed with python/3.6 on Huckleberry GPU nodes. Testing demonstrations and example Python scripts are available in the shared Google Drive folder.

Part 1.  PowerAI Library Usage (PREFERRED)
# step 1: request a GPU node
salloc --partition=normal_q --nodes=1 --ntasks-per-node=10 --gres=gpu:1 bash
# step 2: load all necessary modules
module load gcc cuda Anaconda3 jdk
# step 3: activate the virtual environment
source activate powerai16_ibm
# step 4: test with the simple code examples from the Google Drive folder above
python test_pytorch.py
python test_TF_multiGPUs.py
python test_keras.py
# step 5: to install new packages (beautifulsoup4, for example), run on hulogin1/hulogin2
pip install --user beautifulsoup4
# or: pip install --user --no-deps keras
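
The same steps can also be run non-interactively. The script below is a sketch that assumes the powerai16_ibm environment from step 3 and that test_pytorch.py (from the Google Drive folder above) sits in the submission directory:

#!/bin/bash
#SBATCH -J powerai-test
#SBATCH -p normal_q
#SBATCH -n 10
#SBATCH -t 30:00
#SBATCH --gres=gpu:1
#SBATCH --account=(YOUR ALLOCATION ID)
module load gcc cuda Anaconda3 jdk
source activate powerai16_ibm
python test_pytorch.py
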
Part 2. Installation

First, make sure you are on hulogin1 or hulogin2.

module load gcc cuda Anaconda3 jdk
java -version
conda create -n powerai36 python==3.6 # create a virtual environment
source activate powerai36 # activate virtual environment
conda config --prepend channels https://public.dhe.ibm.com/ibmdl/export/pub/software/server/ibm-ai/conda/
# if things don't work, add the two default channels by running the commands below
conda config --add default_channels https://repo.anaconda.com/pkgs/main
conda config --add default_channels https://repo.anaconda.com/pkgs/r
# install ibm powerai meta-package via conda
conda install powerai
# keep pressing 'enter' and then enter 1 to accept the license
export IBM_POWERAI_LICENSE_ACCEPT=yes
# you will need to update the jupyter package 
conda install jupyter notebook
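
After the installation finishes, a quick sanity check (run on a GPU node with the powerai36 environment active; the exact versions printed will differ) is to import a framework and confirm that it sees a GPU:

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"    # PyTorch version and GPU availability
python -c "import tensorflow as tf; print(tf.__version__)"                       # TensorFlow version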

Please feel free to contact us if you encounter issues or have special requirements for using ML/DL/simulation/visualization packages on Huckleberry.