Virginia Tech
Advanced Research Computing

New SGI Accounts

Three SGI Systems are available as components of VT-ARC:

     inferno.arc.vt.edu
     inferno2.arc.vt.edu
     cauldron.arc.vt.edu

Inferno2 and Cauldron are accessed through a queuing system from the head nodes charon1.arc.vt.edu or charon2.arc.vt.edu; see details below.

Direct logins to an SGI Altix interactive node, inferno.arc.vt.edu, are available. It is to be used for applications requiring interactive sessions (such as GUIs) and for debug purposes. This is a relatively small server: please do not run large jobs on inferno; try to limit debug jobs on inferno to 2 - 4 processors.

An ssh-2 or later client running on your local system is required to log on to these systems. Mac OS X, Linux, and most other UNIX systems have a built-in ssh client; simply enter one of the following commands to log on to charon1, charon2, or inferno:

     ssh  charon1.arc.vt.edu
                or
     ssh  charon2.arc.vt.edu
                or
     ssh  inferno.arc.vt.edu
Notes:
  1. The VT ARC systems require use of an ssh-2 or later client; in some Unix implementations, you will be required to use "ssh2" instead of "ssh" above.

  2. If the id you are using on your local system is different from your id on the VT ARC systems, precede the VT ARC hostname with your VT ARC account name followed by the @ sign. For example, if your account name were my_acct, you could log onto inferno using the following command:
       ssh  my_acct@inferno.arc.vt.edu
    
  3. If you are logging on from a wireless or off-campus location, you should do so using a VPN connection; see http://www.computing.vt.edu/internet_and_web/internet_access/vpn.html

  4. If you are using an MS Windows (2000, XP, 2003, or Vista) desktop, you can download the latest "SSHSecureShell" client (currently SSHSecureShellClient-3.2.9.exe) from http://ftp.ssh.com/pub/ssh/ or the PuTTY ssh client from http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html.

Queuing System

Jobs are submitted to the ARC SGI Altix servers through a job queuing system. Jobs submitted this way do not necessarily run immediately; they wait until CPU resources are available. The queuing system thus keeps the compute servers from being overloaded and improves CPU and memory utilization across running jobs, so that each job, especially a CPU-bound one, can run efficiently once it leaves the queue.

To use the queuing system, log in to one of two SGI Altix head nodes to access files, compile code, and submit jobs to a queue. Direct logins to the SGI Altix compute servers, inferno2 and cauldron, are not allowed. The two head nodes available to you are:

    charon1.arc.vt.edu
    charon2.arc.vt.edu

The head nodes are small systems with 2 CPUs and 4 GB of memory. Therefore, the running of applications directly on the head nodes is prohibited.

Selecting A Queue

There will be two queues available for job submission: the inferno2_q for running jobs on inferno2, and the cauldron_q for running jobs on cauldron. The inferno2_q will be configured to serve smaller jobs, while the cauldron_q will be configured to serve larger jobs. The number of CPUs and amount of memory you need for a job, as well as your total number of jobs, will guide your selection of a queue.

Here are the characteristics of each queue that will help you select a queue to run on:

The inferno2_q:

  • Offers 4 GB of memory per CPU
  • Has a soft limit* of 2 jobs per user
  • Has a hard limit** of 8 jobs per user
  • Has a MAXIMUM limit of 16 CPUs per USER

The cauldron_q:

  • Offers 5 GB of memory per CPU
  • Has a soft limit* of 1 job per user
  • Has a hard limit** of 1 job per user
  • Requires a MINIMUM of 10 CPUs per JOB

* The soft limit is the number of a user's jobs that may run concurrently while any other user who is below that limit has jobs waiting in the queue.

** The hard limit is the maximum number of a user's jobs that can run concurrently.

Note:  We will monitor and tune the job and CPU limits as needed in order to maximize utilization.
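
The queue characteristics above can be turned into a rough selection rule. The sketch below is only an illustration built from the limits listed here (a 16-CPU per-user maximum on inferno2_q and a 10-CPU per-job minimum on cauldron_q). The function name suggest_queue is hypothetical, treating a job of exactly 16 CPUs as acceptable on inferno2_q is an assumption, and the limits themselves may be tuned over time.

```shell
#!/bin/bash
# Hypothetical helper: suggest a queue from the number of CPUs one job needs.
# The limits are copied from the queue descriptions above and may change.
INFERNO2_MAX_CPUS_PER_USER=16   # assumption: a job of exactly 16 CPUs is allowed
CAULDRON_MIN_CPUS_PER_JOB=10

suggest_queue() {
    local ncpus=$1
    # Accept only a positive whole number of CPUs.
    case $ncpus in
        ''|*[!0-9]*) echo "usage: suggest_queue <ncpus>" >&2; return 2 ;;
    esac
    if [ "$ncpus" -lt 1 ]; then
        echo "usage: suggest_queue <ncpus>" >&2
        return 2
    fi
    if [ "$ncpus" -lt "$CAULDRON_MIN_CPUS_PER_JOB" ]; then
        echo "inferno2_q"                  # too small for cauldron_q
    elif [ "$ncpus" -gt "$INFERNO2_MAX_CPUS_PER_USER" ]; then
        echo "cauldron_q"                  # too large for inferno2_q
    else
        echo "inferno2_q or cauldron_q"    # either queue accepts this size
    fi
}
```

For example, suggest_queue 4 prints inferno2_q and suggest_queue 32 prints cauldron_q. For sizes that fit both queues, the memory per CPU and the number of jobs you plan to run concurrently would decide the choice.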

Once a job is submitted to a queue, it waits until the requested CPU resources are available within that queue, and then runs if it is eligible under the job limits listed above. On the cauldron_q, the job limits specify that a user cannot have more than 2 jobs running at once, and a user can have a second job running only if no other user with no running jobs is waiting in the queue. On the inferno2_q, the job limits specify that if user A has two 4-CPU jobs running, a third 4-CPU job can run for user A only if no other user below the soft limit has an eligible job waiting to run. Furthermore, an inferno2_q user cannot have a fourth 4-CPU job running concurrently with the previous three because that would exceed the per-user CPU limit.

The total amount of time requested for a job also affects its eligibility to run, as shorter jobs tend to get priority over longer jobs. You will learn to tune your requested job times to be more competitive as you gain experience with your code or application.

Submitting Jobs to the Queuing System

The queuing system is Torque/Moab, so it will feel familiar if you have used either of those before.

Jobs are submitted by passing a job launch script to the command qsub. You can find example submission scripts in the /apps/doc directory.
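
For orientation, a job launch script might look like the sketch below, which writes a minimal script to ./JobScript.sh. It is not one of the site's examples: the job name, the walltime, and the commented-out program name are hypothetical, and the resource request line (ncpus=4) is an assumption about how CPUs are requested on these systems. The scripts in /apps/doc are the authoritative reference.

```shell
#!/bin/bash
# Write a minimal, illustrative Torque job script to ./JobScript.sh.
cat > JobScript.sh <<'EOF'
#!/bin/bash
# Job name (hypothetical).
#PBS -N example_job
# Queue to submit to: inferno2_q or cauldron_q.
#PBS -q inferno2_q
# Requested run time as hh:mm:ss (hypothetical value).
#PBS -l walltime=01:00:00
# ASSUMPTION: CPU request syntax; check the /apps/doc examples for the site form.
#PBS -l ncpus=4

# Move to the directory the job was submitted from.
cd "${PBS_O_WORKDIR:-$PWD}" || exit 1
echo "Job ${PBS_JOBID:-unknown} starting on $(hostname)"

# Replace the next line with your own program (hypothetical name).
# ./my_program
EOF
chmod +x JobScript.sh
```

Comments are kept on their own lines rather than after the #PBS directives, since trailing text on a directive line may be read as part of the option.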

To submit your job to the queuing system use the command qsub:

     qsub ./JobScript.sh

This will return your job name, which has the form xxxxx.queue.tcf-int.vt.edu. The number before .queue.tcf-int.vt.edu is your job_number.
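
If you want to capture the job_number in a script, one possible approach is to strip everything from the first dot onward. The helper name job_number_from_id below is hypothetical, and the sketch assumes the returned string has exactly the form shown above.

```shell
#!/bin/bash
# Hypothetical helper: print the part of a job name before the first dot.
job_number_from_id() {
    printf '%s\n' "${1%%.*}"
}

# Typical use on a head node (not run here):
#   job_id=$(qsub ./JobScript.sh)
#   job_number=$(job_number_from_id "$job_id")
```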

If you need to remove your job from the queue, use qdel:

    qdel <job_number>

To see status information about your job, you can use:

    showstart <job_number>   expected start and finish times
    qstat -f <job_number>    general information about the job

When your job has finished running, any output to stdout or stderr will be placed in the files .o<job_number> and .e<job_number>. These two files will be in the directory from which you submitted the job.
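
To locate these files for a given job, a small helper along the following lines may be convenient. It is a sketch, with the hypothetical name job_output_files, and it assumes the files either are named exactly as above or carry a prefix (such as the job's name) before the .o or .e suffix.

```shell
#!/bin/bash
# Hypothetical helper: list the stdout/stderr files for a job number
# in the current directory, with or without a name prefix.
job_output_files() {
    local n=$1 f
    for f in *.o"$n" *.e"$n" .o"$n" .e"$n"; do
        # Unmatched patterns stay literal, so keep only names that exist.
        [ -e "$f" ] && printf '%s\n' "$f"
    done
    return 0
}
```

Run it from the directory in which you ran qsub, for example job_output_files 12345.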

To find information about your queued or running jobs, you can use the command showq. This will show all of the running jobs on System X, cauldron, and inferno2. To view only cauldron jobs, use showq -p CAULDRON; to view only inferno2 jobs, use showq -p INFERNO2. For detailed information on your job, use qstat -f <job_number> or checkjob -v <job_number>.

If you have a job sitting in the queue that you think should be able to run, use the command checkjob -v <job_number> to see the reason the job is not running, as shown at the bottom of the output.
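
Because the reason appears at the bottom of the checkjob output, it can be handy to print only the last few lines. The wrapper below is a sketch: the name why_waiting and the default of 15 lines are arbitrary choices, not part of Torque/Moab.

```shell
#!/bin/bash
# Hypothetical wrapper: show only the tail of "checkjob -v" output,
# where the reason a job is not running is reported.
why_waiting() {
    local job_number=$1 lines=${2:-15}
    checkjob -v "$job_number" | tail -n "$lines"
}
```

For example, why_waiting 12345 shows the last 15 lines for job 12345, and why_waiting 12345 30 shows the last 30.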

For more information about the queuing system, see:

           http://www.arc.vt.edu/arc/sgi/queuing.php

There are instructional videos as well as presentation slides available at that page.

If you have questions, comments, or concerns, please let us know at arc@vt.edu.

Thank you.


VT-ARC is a Unit within the Office of the Vice President of Information Technology
© 2007-2008 Virginia Polytechnic Institute and State University
Page Last Updated:  July 24th, 2008