New ARC Cluster: Huckleberry

ARC released a new cluster named Huckleberry in late 2017. The Huckleberry system, accessed at huckleberry1.arc.vt.edu, was installed with deep learning applications in mind. To this end, it consists of 14 IBM “Minsky” S822LC nodes that use NVIDIA’s proprietary NVLink interconnect. The system supports highly parallel and highly distributed workloads. IBM unveiled its deep learning toolkit, PowerAI, alongside the launch of the Minsky nodes, which link GPUs to Power CPUs with NVLink for high-speed, high-performance computing. PowerAI is available under /opt/DL on Huckleberry.

Each compute node on Huckleberry (i.e., an IBM “Minsky” node) consists of:

  • Two IBM POWER8 CPUs, each with 10 cores, 8 threads per core, and 115 GB/s of memory bandwidth per socket
  • Four NVIDIA P100 GPUs, each advertised at 21 teraFLOPS of 16-bit floating-point performance, offering the high performance and massive parallelism that deep learning applications need
  • NVIDIA’s NVLink technology, which provides high-bandwidth data transfers between CPUs and GPUs; an improvement over PCI-Express
  • Mellanox EDR InfiniBand (100 Gb/s) interconnect used to connect compute nodes
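
To put the per-node figures above in context, the following minimal Python sketch multiplies them out to per-node and cluster-wide totals. It assumes that the 21 teraFLOPS figure applies to each GPU and that all 14 nodes are identical compute nodes; the constant and function names are illustrative only, and the results are theoretical peaks rather than measured performance.

```python
# Figures quoted in the node description above.
NODES = 14                    # assumption: all 14 nodes are compute nodes
SOCKETS_PER_NODE = 2
CORES_PER_SOCKET = 10
THREADS_PER_CORE = 8
MEM_BW_PER_SOCKET_GBS = 115   # GB/s per socket
GPUS_PER_NODE = 4
FP16_TFLOPS_PER_GPU = 21      # assumption: advertised figure is per GPU


def node_totals():
    """Return theoretical peak totals for a single compute node."""
    cores = SOCKETS_PER_NODE * CORES_PER_SOCKET
    return {
        "cores": cores,
        "hardware_threads": cores * THREADS_PER_CORE,
        "memory_bandwidth_gbs": SOCKETS_PER_NODE * MEM_BW_PER_SOCKET_GBS,
        "gpus": GPUS_PER_NODE,
        "fp16_tflops": GPUS_PER_NODE * FP16_TFLOPS_PER_GPU,
    }


def cluster_totals():
    """Scale the per-node totals to the whole cluster."""
    return {name: value * NODES for name, value in node_totals().items()}


if __name__ == "__main__":
    print("per node:", node_totals())
    print("cluster: ", cluster_totals())
```

Under these assumptions, each node exposes 20 cores (160 hardware threads) and 4 GPUs, and the cluster as a whole has 56 GPUs.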

The PowerAI toolkit includes deep learning frameworks such as Caffe and TensorFlow, which are optimized for the Power servers. IBM also provides support for the toolkit.

While the other ARC clusters use the PBS batch system, Huckleberry uses the Slurm batch system; jobs are submitted with the sbatch command.
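
As a rough illustration of what a Slurm submission might look like, the Python sketch below renders the text of a batch script using standard sbatch directives (--job-name, --nodes, --gres, --time). The helper name render_sbatch_script, the job name, and the train.py command are hypothetical; Huckleberry may also require options not shown here (for example a partition or an allocation account), so treat this as a starting point rather than a site-approved template.

```python
def render_sbatch_script(job_name, command, nodes=1, gpus_per_node=1,
                         walltime="01:00:00"):
    """Return the text of a Slurm batch script (hypothetical helper).

    gpus_per_node is capped at 4 because each Huckleberry node
    is described as having four P100 GPUs.
    """
    if not 1 <= gpus_per_node <= 4:
        raise ValueError("gpus_per_node must be between 1 and 4")
    if nodes < 1:
        raise ValueError("nodes must be at least 1")

    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --nodes={nodes}",
        f"#SBATCH --gres=gpu:{gpus_per_node}",   # generic resource request for GPUs
        f"#SBATCH --time={walltime}",            # wall-clock limit, HH:MM:SS
        "",
        command,
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical job: one node, all four GPUs, two hours.
    script = render_sbatch_script("train-demo", "python train.py",
                                  gpus_per_node=4, walltime="02:00:00")
    print(script, end="")
```

Saving the printed text to a file such as job.sh and running sbatch job.sh would submit it to the scheduler.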

Individuals may request a Huckleberry account. Instructors can also set up class accounts for Huckleberry.