Cascades is a 196-node system capable of tackling the full spectrum of computational workloads, from problems requiring hundreds of compute cores to data-intensive problems requiring large amounts of memory and storage. Cascades contains three compute engines designed for distinct workloads.
- General – Distributed, scalable workloads. With two of Intel’s latest-generation 16-core Broadwell processors and 128 GB of memory on each node, this 190-node compute engine is suitable for traditional HPC jobs and large codes using MPI.
- GPU – Data visualization and code acceleration. Each of the four nodes in this compute engine has two Nvidia K80 GPUs, 512 GB of memory, and one 2 TB NVMe PCIe flash card.
- Very Large Memory – Graph analytics and very large datasets. With 3 TB (3,072 GB) of memory, four 18-core processors, six 1.8 TB direct-attached SAS hard drives, a 400 GB SAS SSD, and one 2 TB NVMe PCIe flash card, each of these two servers enables analysis of large, highly connected datasets, in-memory database applications, and faster solution of other large problems.
The table below lists the specifications for Cascades’ compute engines.
|COMPUTE ENGINE||#||HOSTS||CPU||CORES||MEMORY||LOCAL STORAGE||OTHER FEATURES|
|General||190||ca007-ca196||2 x E5-2683v4 2.1 GHz (Broadwell)||32||128 GB, 2400 MHz||1.8 TB 10K RPM SAS; 200 GB SSD|| |
|GPU||4||ca003-ca006||2 x E5-2683v4 2.1 GHz (Broadwell)||32||512 GB, 2400 MHz||3.6 TB (2 x 1.8 TB) 10K RPM SAS (RAID 0); 2 x 400 GB SSD (RAID 1); 2 TB NVMe PCIe||2 x NVIDIA K80 GPU|
|Very Large Memory||2||ca001-ca002||4 x E7-8867v4 2.4 GHz (Broadwell)||72||3 TB, 2400 MHz||3.6 TB (2 x 1.8 TB) 10K RPM SAS (RAID 0); 6 x 400 GB SSD (RAID 1); 2 TB NVMe PCIe|| |
- GPU Notes: Each GPU node presents 4 CUDA devices. Although each K80 is a single physical device in one PCIe slot, it contains 2 separate GPU chips, so the two K80s appear to CUDA code as 4 separate devices. nvidia-smi will show this.
- All nodes have locally mounted SAS drives and SSDs. $TMPDIR points to the SAS drive and /scratch-ssd points to the SSD on each node. On large memory and GPU nodes, which have multiple drives of each type, the storage across the SSDs is combined in /scratch-ssd (RAID 0) and the SAS drives are mirrored (RAID 1) for redundancy.
- A 100 Gbps Intel OPA interconnect provides low-latency communication between compute nodes for MPI traffic.
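The notes above can be checked from inside a job with a short script. The following is a minimal sketch, assuming only the $TMPDIR and /scratch-ssd locations and the nvidia-smi tool named above; the report_dir helper is a hypothetical name introduced for illustration.

```shell
#!/bin/bash
# Sketch: inspect node-local scratch space and GPUs from inside a job.
# report_dir is an illustrative helper, not part of any ARC tooling.

# Print the free space of a directory, or a note if it is not present.
report_dir() {
    local dir="$1"
    if [ -n "$dir" ] && [ -d "$dir" ]; then
        df -h "$dir" | tail -n 1
    else
        echo "not available: ${dir:-<unset>}"
    fi
}

report_dir "${TMPDIR:-}"     # SAS-backed scratch
report_dir /scratch-ssd      # SSD-backed scratch

# On a GPU node, list the devices visible to the driver
# (per the GPU note, 4 devices would be expected).
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi -L || echo "nvidia-smi could not query the driver"
else
    echo "nvidia-smi not found (probably not a GPU node)"
fi
```

On a node without these paths or without GPUs, the script only prints the fallback messages, so it is safe to run anywhere.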
Note: Cascades is governed by an allocation manager, meaning that in order to run most jobs on it, you must be an authorized user of an allocation that has been submitted and approved. The open_q queue is available to jobs that are not charged to an allocation, but it has tight usage limits (see below for details) and so is best used for initial testing in preparing allocation requests. For more on allocations, click here.
Cascades has four queues:
- normal_q for production (research) runs.
- largemem_q for production (research) runs on the large memory nodes.
- dev_q for short testing, debugging, and interactive sessions. dev_q provides slightly elevated job priority to facilitate code development and job testing prior to production runs.
- open_q provides access for small jobs and evaluating system features. open_q does not require an allocation; it can be used by new users or researchers evaluating system performance for an allocation request.
The settings for the queues are:
Max Nodes: 32 per user, 48 per allocation
|QUEUE||normal_q||largemem_q||dev_q||open_q|
|Max Jobs||6 per user; 12 per allocation||1 per user||1 per user||1 per user|
|Max Cores||1,024 per user; 1,536 per allocation||72 per user||1,024 per user||128 per user|
|Max Memory||4 TB per user; 6 TB per allocation||3 TB per user||4 TB per user; 6 TB per allocation||1 TB per user|
|Max Walltime||72 hr||72 hr||2 hr||4 hr|
|Max Core-Hours||36,884 per user; 55,326 per allocation||5,184 per user||256 per user||256 per user|
- Shared node access: more than one job can run on a node (Note: This is different from other ARC systems)
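A request can be checked against a Max Core-Hours limit before submission. The sketch below assumes that a job's core-hours are its core count multiplied by its walltime in hours. The page does not define the term, but this reading is consistent with the 72-core, 72 hr, 5,184 core-hour entry in the limits. The function names are hypothetical.

```shell
#!/bin/bash
# Sketch: estimate core-hours, ASSUMING core-hours = cores x walltime hours.
core_hours() {
    local cores="$1" hours="$2"
    echo $(( cores * hours ))
}

# Succeeds (exit status 0) if the request fits under the given limit.
fits_limit() {
    local cores="$1" hours="$2" limit="$3"
    [ "$(core_hours "$cores" "$hours")" -le "$limit" ]
}

# Example: 128 cores for 4 hours against a 256 core-hour limit.
if fits_limit 128 4 256; then
    echo "request fits"
else
    echo "request exceeds limit: $(core_hours 128 4) core-hours"
fi
```

Under this assumption, a job that asks for both the maximum cores and the maximum walltime of a queue can still exceed that queue's core-hour limit, so it is worth checking the product and not only the individual limits.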
For a list of software available on Cascades, as well as a comparison of software available on all ARC systems, click here.
Note that a user will have to load the appropriate module(s) in order to use a given software package on the cluster. The module avail and module spider commands can also be used to find software packages available on a given system.
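A typical search-and-load sequence might look like the sketch below. The package name gcc is a placeholder and is not a statement of what is installed on Cascades; find_and_load is a hypothetical helper. The script falls back to a message when the module command is not defined in the current shell.

```shell
#!/bin/bash
# Sketch: search for a package and load it with the module command.
# "gcc" below is a placeholder name, not a confirmed Cascades module.
find_and_load() {
    local pkg="$1"
    if ! command -v module >/dev/null 2>&1; then
        echo "module command not found; run this on a cluster node"
        return 1
    fi
    module avail "$pkg"     # modules matching the name that can be loaded now
    module spider "$pkg"    # search all module trees for the package
    module load "$pkg"      # add it to the current shell environment
    module list             # confirm what is now loaded
}

module_status=0
find_and_load gcc || module_status=$?
echo "find_and_load exit status: $module_status"
```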
Cascades is accessed through a traditional terminal.
The cluster is accessed via ssh to one of the two login nodes below. Log in using your username (usually your Virginia Tech PID) and password. You will need an SSH client to log in; see here for information on how to obtain and use an SSH client.
Access to all compute engines (aside from interactive nodes) is controlled via the job scheduler. See the Job Submission page here. The basic flags are:
#PBS -l walltime=dd:hh:mm:ss
#PBS -l [resource request, see below]
#PBS -q normal_q (or other queue, see Policies)
#PBS -A <yourAllocation> (see Policies)
#PBS -W group_list=cascades
Unlike on BlueRidge, compute nodes on Cascades are not dedicated to a single job. Cascades has more options for requesting resources to ensure the scheduler can optimally place jobs. Resources can be requested by specifying the number of nodes:ppn (like on BlueRidge), but also cores, memory, GPUs, etc. See example resource requests below:
Request 2 nodes with 32 cores each
#PBS -l nodes=2:ppn=32
Request 4 cores (on any number of nodes)
#PBS -l procs=4
Request 12 cores with 20gb memory per core
#PBS -l procs=12,pmem=20gb
Request 2 nodes with 32 cores each and 20gb memory per core (will give two 512gb nodes)
#PBS -l nodes=2:ppn=32,pmem=20gb
Request 2 nodes with 32 cores per node and 1 gpu per node
#PBS -l nodes=2:ppn=32:gpus=1
Request 2 cores with 1 gpu each
#PBS -l procs=2,gpus=1
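Put together, the basic flags and one of the resource requests above might form a job script like the sketch below. This is an illustration, not ARC's template: <yourAllocation> is the placeholder from the flag list, the one-hour walltime is arbitrary, and the final echo stands in for real work. It assumes the scheduler sets the standard PBS variables PBS_O_WORKDIR and PBS_NODEFILE, and it falls back to sensible defaults when run outside the scheduler.

```shell
#!/bin/bash
# Sketch of a job script built from the flags and resource requests above.
# <yourAllocation> is a placeholder; replace it before submitting.
#PBS -l walltime=00:01:00:00
#PBS -l nodes=2:ppn=32
#PBS -q normal_q
#PBS -A <yourAllocation>
#PBS -W group_list=cascades

# PBS_O_WORKDIR is the directory the job was submitted from;
# fall back to the current directory when run outside the scheduler.
cd "${PBS_O_WORKDIR:-$PWD}"

# PBS_NODEFILE is assumed to list one line per allocated core.
if [ -n "${PBS_NODEFILE:-}" ] && [ -f "$PBS_NODEFILE" ]; then
    ncores=$(wc -l < "$PBS_NODEFILE")
else
    ncores=1
fi

echo "Running on $ncores core(s) in $PWD"
```

Because the #PBS lines are shell comments, the script also runs as plain bash, which makes it easy to test the logic on a login node before submitting it (for example with qsub, assuming the usual PBS submission command).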
This shell script provides a template for submission of jobs on Cascades. The comments in the script include notes about how to request resources, load modules, submit MPI jobs, etc.
To utilize this script template, create your own copy and edit as described here.