Storage

Overview

ARC offers several different containers for users’ data:

Name           | Intent                                             | File System                                | Environment Variable | Per User Maximum                              | Data Lifespan | Available On
Home           | Long-term storage of files                         | Qumulo                                     | $HOME                | 640 GB, 1 million files                       | Unlimited     | Login and Compute Nodes
Group          | Long-term storage of shared, group files           | GPFS                                       | $GROUP               | 10 TB, 5 million files per faculty researcher | Unlimited     | Login and Compute Nodes
Work           | Fast I/O, temporary storage                        | Lustre (BlueRidge); GPFS (other clusters)  | $WORK                | 14 TB, 3 million files                        | 120 days      | Login and Compute Nodes
Archive        | Long-term storage for infrequently-accessed files  | CXFS                                       | $ARCHIVE             | -                                             | Unlimited     | Login Nodes
Local Scratch  | I/O to local disk during a job                     | Local disk (hard drives)                   | $TMPDIR              | Size of node hard drive                       | Length of Job | Compute Nodes
Memory (tmpfs) | Very fast I/O                                      | Memory (RAM)                               | $TMPFS               | Size of node memory                           | Length of Job | Compute Nodes

Each is described in the sections that follow.

Home

Home provides long-term storage for system-specific data or files, such as installed programs or compiled executables. Home can be reached via the variable $HOME, so if a user wishes to navigate to their Home directory, they can simply type cd $HOME.

Each user is provided a maximum of 640 GB in their Home directories (across all systems); this is a soft limit. When a user exceeds the soft limit, they are given a grace period, after which they can no longer add any files to their Home directory until they are again below the soft limit. Home directories are also subject to a 690 GB hard limit; users' Home directories are not allowed to exceed this limit. Note that running jobs will fail if they try to write to a Home directory after the soft limit grace period has expired or once the hard limit is reached. To check your usage, see Checking Usage.
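If you are over quota, one quick way to see where the space is going is to summarize the size of each top-level directory in Home with standard tools; this is a generic sketch, not an ARC-specific command:

    # show the size of each top-level directory in your Home, sorted smallest to largest
    du -h --max-depth=1 $HOME | sort -h

Large items can then be deleted or moved to Group, Work, or Archive as appropriate.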

Group

Group provides long-term storage for shared group/research files. It provides a storage place for collaboration and data exchange within the research group. Each Virginia Tech faculty member can request 10 TB of $GROUP storage at no cost; additional storage may be purchased through the investment computing program.

Work

Work provides users with fast but temporary storage for use during simulations or other research computing applications. Work provides up to 14 TB of space per user, where the limit is applied separately to BlueRidge and to the sum of a user's Work directories on all other systems. The Work directory is also subject to a limit of 3 million files per user. However, Work storage is regularly purged of files that have not been modified in 120 days or more, so users should move any data that they wish to keep long-term to their Home, Group, or Archive directories. To maximize I/O (read and write) speeds, Work is housed on a parallel file system: BlueRidge uses a dedicated Lustre file system, while the other clusters share a General Parallel File System (GPFS).
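To see which of your Work files are at risk of being purged, you can list files that have not been modified recently; the 100-day threshold below is just an illustrative value:

    # list files in $WORK not modified in the last 100 days (candidates for the 120-day purge)
    find $WORK -type f -mtime +100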

The 14 TB and 3 million file maximums mentioned above are soft limits. When a user exceeds a soft limit, they are given a grace period, after which they can no longer add any files to their Work directories until they are again below the soft limit. Work directories are also subject to hard limits of 15 TB and 4 million files; users' Work directories are not allowed to exceed these limits. Note that running jobs will fail if they try to write to a Work directory after the soft limit grace period has expired or once a hard limit is reached. To check your usage, see Checking Usage.

To run a job on an ARC system, the user should copy their executable and any other needed files to Work and run the job from that location. (File copies can be done using the cp command.) This will ensure that the job obtains the fastest possible I/O (file read and write) speeds. Once a job is complete, any files that the user wishes to keep should be transferred to Home or Archive.
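As a minimal sketch (the program and input file names here are hypothetical):

    # copy the executable and its input files to Work and work from there;
    # the run itself is normally submitted through the system's scheduler
    cp myprogram input.dat $WORK/
    cd $WORK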

Work for a given system can be reached via the variable $WORK, so if a user wishes to navigate to their Work directory, they can simply type cd $WORK.

Archive

Archive provides users with long-term storage for data that does not need to be accessed frequently, such as important or historical results. Archive is accessible from the login nodes of all ARC systems. Archive is housed on a serial Network File System (NFS). Because Archive is not mounted on compute nodes, running jobs cannot access files on it.

Archive can be reached via the variable $ARCHIVE, so if a user wishes to navigate to their Archive directory, they can simply type cd $ARCHIVE.

Best Practices for Archival Storage

Because the ARCHIVE filesystem is backed by tape (a high-capacity but very high-latency medium), it is very inefficient and disruptive to perform file operations (especially on lots of small files) on the archive filesystem itself. Archival systems are designed to move and replicate very large files; ideally, users should tar all related files into a single large file. The procedures are below:

To place data in $ARCHIVE:

  1. Create a tarball containing the files in your $HOME (or $WORK) directory.
  2. Copy the tarball to the $ARCHIVE filesystem (use rsync so that the transfer can be restarted if it fails).
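For example, assuming a results directory named results_2020 (a hypothetical name):

    # bundle the directory into a single tarball, then copy it to Archive
    tar -czf results_2020.tar.gz results_2020/
    rsync -av results_2020.tar.gz $ARCHIVE/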

To retrieve data from $ARCHIVE:

  1. Copy the tarball back to your $HOME (or $WORK) directory (use rsync so that the transfer can be restarted if it fails).
  2. Untar the file on the login node in your $HOME (or $WORK) directory.
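Continuing the hypothetical example above, retrieval would look like:

    # copy the tarball back from Archive and unpack it on a login node
    rsync -av $ARCHIVE/results_2020.tar.gz $WORK/
    cd $WORK
    tar -xzf results_2020.tar.gz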

Directories can be tarred up in parallel with, for example, GNU parallel (available via the parallel module). The following line will create a tarball for each directory that has not been modified in more than 180 days:

    find . -mindepth 1 -maxdepth 1 -type d -mtime +180 | parallel "[[ -e {}.tar.gz ]] || tar -czf {}.tar.gz {}"

The resulting tarballs can then be moved to Archive and the original directories removed. (Directories can also be removed automatically by passing the --remove-files flag to tar, but this flag should of course be used with caution.)
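For example, with GNU tar (hypothetical directory name; note that the source files are deleted once they have been added to the archive):

    # create the tarball and remove the original files afterwards
    tar -czf results_2020.tar.gz --remove-files results_2020/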


Local Scratch

Running jobs are given a workspace on the local hard drive on each compute node. The path to this space is specified in the $TMPDIR environment variable. This provides another option for users who would prefer to do I/O to local disk (such as for some kinds of big data tasks). Please note that any files in local scratch are removed at the end of a job, so any results or files to be kept after the job ends must be copied to Work or Home.
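Inside a job script, using local scratch might look like the following sketch (the program and file names are hypothetical):

    # stage the executable and input to node-local scratch, run there, then copy results back
    cp $WORK/myprogram $WORK/input.dat $TMPDIR/
    cd $TMPDIR
    ./myprogram input.dat > output.dat
    # local scratch is wiped when the job ends, so save anything worth keeping
    cp output.dat $WORK/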

Memory

Running jobs have access to an in-memory mount on compute nodes via the $TMPFS environment variable. This should provide very fast read and write speeds for jobs doing I/O to files that fit in memory (see the system documentation for the amount of memory per node on each system). Please note that these files are removed at the end of a job, so any results or files to be kept after the job ends must be copied to Work or Home.
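The usage pattern is the same as for local scratch, with $TMPFS as the staging location (hypothetical file name):

    # stage a file that fits comfortably in memory into the in-memory mount
    cp $WORK/lookup_table.dat $TMPFS/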

Checking Usage

As noted above, usage of the Home and Work file systems is limited by a quota system. To check the size of your Home or Work directory, you can use the command quota. In the example output below, the user johndoe has 108 GB in his Home directory and 4 TB in his Work directory. His usage of Home exceeds his soft limit of 100 GB (the ‘LIMIT’ column), but not his hard limit. He will need to remove files from his Home directory to drop below the soft limit.

    [johndoe@brlogin2 ~]$ quota
           USAGE     LIMIT     #FILES
    /home  108.112G  100.000G  335130
    /work  4.062T    14T       46

Example Usage

Typical use of the storage system would look like this:

  1. A user writes code in a programming language such as Fortran or C (perhaps using the vi editor) and saves it. (Note that the user could also copy their file(s) from their local computer to the storage system.)
  2. The user compiles the code in their Home directory. Compiled executables are system-specific and therefore should be kept in Home, so that they do not get mixed up with files compiled on other systems.
  3. The user copies the executable, along with any other files that it needs to run, to Work. This transfer might be done using a command like the following:
        cp programfile $WORK/

    Putting the files on Work ensures that the program, once run, obtains the fastest possible I/O (file read and write) performance.

  4. The user runs the job (typically by submitting it to the system’s scheduler – see the “Submitting Jobs” section on each system page). Output files are generated, also on Work.
  5. The user reviews the output files and moves (using the mv command) any that he or she wishes to keep long-term back to their Home directory.
  6. If the user needs access to output file(s) from another machine, he or she would move those file(s) to Archive.
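Put together, the workflow above might look roughly like the following sketch. The program, file, and job-script names are hypothetical, the compiler invocation is only an example, and the job-submission command depends on the system's scheduler:

    # 1-2. write and compile the code in Home
    cd $HOME
    gfortran -o myprogram myprogram.f90

    # 3. copy the executable and its inputs to Work
    cp myprogram input.dat $WORK/

    # 4. run the job from Work (normally by submitting it to the scheduler)
    cd $WORK

    # 5. move results to keep long-term back to Home
    mv output.dat $HOME/

    # 6. if the results are needed from other machines, place a copy in Archive
    #    (ideally as a tarball, as described under Best Practices for Archival Storage)
    rsync -av $HOME/output.dat $ARCHIVE/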