Using MPI on System X
Introduction
This document describes the "mechanics" of using an MPI program
on Virginia Tech's System X. It assumes that an MPI program has
already been written.
The implementation of MPI used on System X is known as MPICH.
System X is currently using version 1.2.5 of MPICH.
MPICH was developed at Argonne National Laboratory, which
maintains the MPICH web site at
http://www-unix.mcs.anl.gov/mpi/mpich1/.
For a listing of available compilers, see System X Compilers.
Refer to MPI Overview for a very brief overview of MPI.
This document walks you through a typical series of steps:
take an MPI program from your "home" machine,
run it on System X, and bring the results back home.
At the end of this document are instructions on how to actually
carry out these steps, using sample files available on the web.
Some Assumptions
For this simple introduction, we'll make a number of assumptions.
(One file, simple name): We'll assume you have a program,
already written, which uses MPI,
that the program consists of a single file, written in C,
and that this file is called program.c.
(No input/Only standard output): We'll also assume for now
that the program needs no input, and that the output of the program
is entirely directed to the standard
output device. In other words, the executing program does not
read from or write to any auxiliary files.
(Source code on home machine):
We'll assume this source code file is sitting in the source_code
subdirectory on your home machine home_mac.
Our goal then is to transfer the file to System X, compile it,
run it, and retrieve the output.
Transferring Source Code to System X
System X comprises 1100 nodes, each containing two processors.
Most of these nodes are compute nodes, which are used
exclusively for computation. But a few nodes, known as
compile nodes, are set aside for interactive use, allowing users
to create file directories, store files, compile them, submit jobs,
and so on.
The System X compile nodes we are interested in have the following
host names:
- sysx1.arc.vt.edu
- sysx2.arc.vt.edu
- sysx3.arc.vt.edu
To compile your program, you will need to transfer the source code
of your MPI program to one of these nodes. This can be done with
the secure FTP program sftp. Here is a typical session,
which suggests how you might transfer the file. We are assuming
here that you already set up a subdirectory on System X called
work_directory.
home_mac: sftp sysx1.arc.vt.edu
sysx1: Password for user: xxxxx
sysx1: cd work_directory
sysx1: lcd source_code
sysx1: put program.c
sysx1: ls
sysx1: program.c
sysx1: quit
home_mac:
Note that the commands cd, pwd and ls are carried
out on the remote machine (sysx1 in this case) while the corresponding
commands lcd, lpwd and lls will be carried out
on the local machine (home_mac in this example). The put
command moves files from the local to the remote machine, while the
get command brings files from the remote machine to the local one.
If multiple files are to be transferred, the mget and mput
commands can be used.
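If you transfer files often, the same commands can be placed in a batch file and passed to sftp with its -b option. The following is only a sketch: it assumes that non-interactive (key-based) login to System X is already set up, which this document does not cover, and the function name make_upload_batch and the batch file name are made up for illustration.

```shell
# Print sftp batch commands that upload one file.
# $1 = remote directory, $2 = local directory, $3 = file name
make_upload_batch() {
    printf 'cd %s\nlcd %s\nput %s\nls\nquit\n' "$1" "$2" "$3"
}

# Build a batch file for the example in this document.
# The batch file location is an arbitrary choice.
batch_file="${TMPDIR:-/tmp}/sysx_upload.batch"
make_upload_batch work_directory source_code program.c > "$batch_file"
cat "$batch_file"

# The transfer itself is left as a comment because it needs a real login:
#   sftp -b "$batch_file" sysx1.arc.vt.edu
```

The batch file contains exactly the commands typed in the interactive session, one per line, so it can be checked by eye before it is used.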
Compiling the Source Code into an Executable
Once the source code file has been transferred to one of the compile
nodes, you can log in to the compile node and compile your file.
Since there is a single file server shared by all the compile nodes,
you can log in to any one of the compile nodes you like, and you
will see the same set of files.
To log in interactively, we use the Secure Shell program, ssh.
home_mac: ssh sysx2.arc.vt.edu
sysx2: Password for user: xxxxx
sysx2: cd work_directory
sysx2: mpicc program.c
sysx2: mv a.out program
Note that you must use the mpicc compiler to compile a C program
that invokes the MPI library. If the compile command fails because
the mpicc command cannot be found, you may need to invoke it with
the full path name:
/nfs/compilers/mpich-1.2.5/bin/mpicc program.c
If the compilation fails, you will need to revise your program.
You can either edit the program on your home machine and
transfer it again, or make the changes directly on the System X copy.
In our example, we assume the compilation was successful. We allowed
the compiler to assign the default name of a.out to the executable
program it created, and then we renamed it to program. We're now
ready to submit the program for execution, so we stay logged in.
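As an alternative to renaming a.out, the compiler's -o option can name the executable directly. The sketch below also falls back to the full path given in this document when mpicc is not on the search path. The function name find_mpicc is made up for illustration, and the compile line is only printed, so the sketch can be tried on any machine.

```shell
# Pick the MPI C compiler: the one found on PATH if there is one,
# otherwise the full path quoted in this document.
find_mpicc() {
    if command -v mpicc >/dev/null 2>&1; then
        command -v mpicc
    else
        echo /nfs/compilers/mpich-1.2.5/bin/mpicc
    fi
}

MPICC=$(find_mpicc)

# Print the compile command; -o names the executable "program".
echo "Compile with: $MPICC -o program program.c"
```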
Submitting a Script to Run the Executable
Once the executable program has been created, you need a shell
script to run the program in parallel. This script specifies the number
of processors to be used, the time limit, and so on. An example of
such a shell script, with explanatory comments, is available in the
System X file system as "/nfs/docs/qsub-example.sh" or you can refer to
this qsub-example.sh.
Here is a simplified shell script for our example, which we will call
program.sh.
#!/bin/bash
#
#PBS -lwalltime=00:00:30
#PBS -lnodes=2:ppn=2
#PBS -W group_list=???
#PBS -q production_q
#PBS -A $$$
#
NUM_NODES=`/bin/cat $PBS_NODEFILE | /usr/bin/wc -l | /usr/bin/sed "s/ //g"`
cd $PBS_O_WORKDIR
export PATH=/nfs/software/bin:$PATH
jmdrun -printhostname -np $NUM_NODES -hostfile $PBS_NODEFILE \
./program &> program_output.txt
exit;
Replace the "???" field in this file by your group information.
To get your group, log into one of the System X compile nodes
and type
groups
Ignore the "staff" group in the output; use the other group
that is listed as the value of the "???" field in the shell script.
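Picking the group out of the groups output can be sketched as a small shell function. The function name pick_group and the sample group name "mygroup" are invented for illustration; on System X you would pass in the real output of groups.

```shell
# Print the first group name that is not "staff".
pick_group() {
    for g in "$@"; do
        if [ "$g" != "staff" ]; then
            echo "$g"
            return 0
        fi
    done
    return 1   # nothing but "staff" was listed
}

# Invented sample list; on a compile node use: pick_group $(groups)
pick_group staff mygroup
```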
Also replace the "$$$" field by your "hat", that is, the account
to which your computer work is to be billed. The hat value was
assigned when your project was approved and set up for System X.
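To see what the NUM_NODES line of the script computes, the sketch below builds a stand-in for the file that $PBS_NODEFILE points to and counts its lines. It assumes the usual PBS behavior of listing each node once per requested processor, so a request of "nodes=2:ppn=2" gives four lines. The host names are invented, and tr is used here in place of the script's sed only to strip padding from the wc output.

```shell
# Stand-in for the node file PBS would create for "nodes=2:ppn=2".
# Assumption: each node name appears once per requested processor.
# The host names are made up for illustration.
PBS_NODEFILE="${TMPDIR:-/tmp}/fake_nodefile.$$"
printf '%s\n' node001 node001 node002 node002 > "$PBS_NODEFILE"

# Same counting idea as the job script: one line per processor.
NUM_NODES=$(wc -l < "$PBS_NODEFILE" | tr -d ' ')
echo "processes to start: $NUM_NODES"

# Remove the stand-in file again.
rm -f "$PBS_NODEFILE"
```

Despite its name, NUM_NODES therefore holds the number of processes passed to the -np option, not the number of physical nodes.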
We'll assume that the shell script program.sh is stored
in work_directory, the subdirectory which contains
program.c and the executable program. To run the job,
we must "submit" the shell script to the queuing system. To do this,
we must move to the subdirectory containing the job script and the
executable (which we're assuming is subdirectory work_directory),
and issue the command:
qsub program.sh
The qsub command asks the queueing system to schedule your job
to run. The immediate response from the queueing system is a message
that assigns a job number. The job number can be used to
check on the progress of your job, and it will also be used as part
of the name of the log files created when your job is done.
For example, the response to your qsub command might be
40316.queue.tcf-int.vt.edu
in which case your job number is 40316.
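If you want to keep the job number in a shell variable, it is the part of the response before the first period. A minimal sketch, using the sample response from this document rather than a real qsub call:

```shell
# Sample response copied from the example in this document; in a real
# session it could be captured with: response=$(qsub program.sh)
response="40316.queue.tcf-int.vt.edu"

# Strip everything from the first "." onward.
job_number=${response%%.*}
echo "job number: $job_number"
```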
Although our example job is small (only 30 seconds on 4 processors)
and should run quickly, it is always possible to check on the status
of all the jobs you have in the queue, by issuing the command
showq | grep YOUR_NAME
which might show you:
JOBID USERNAME STATE PROCS REMAINING STARTTIME
------- -------- ------ ------ --------- -----------
40316 your_name Idle 4 00:00:30 Mon Oct 15 14:06:00
You can also use the convenient command
qstat -u YOUR_NAME
whose output format is a little different:
Req'd Req'd Elap
Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time
--------------- -------- ----- ------- ------ --- --- ------ ----- - ----
40316.queue.tcf YOUR_NAM prodt program -- 2 1 -- 00:00 Q --
This command gives you information about the number of nodes requested, the amount of
time and memory requested, and so on. The "S" (for "status") field lists a
value of "Q", which means the job has been queued but has not started to run.
(Note that the output under each heading is truncated if it is long.)
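The status letter can also be pulled out of a qstat line with awk. This sketch assumes the 11-column layout shown in the sample output, in which the "S" field is the tenth column; the function name job_state is made up for illustration.

```shell
# Print the status column ("S") of one "qstat -u" output line.
# Assumes the 11-column layout shown in this document.
job_state() {
    echo "$1" | awk '{ print $10 }'
}

# Sample line copied from the example output in this document.
line="40316.queue.tcf YOUR_NAM prodt program -- 2 1 -- 00:00 Q --"
state=$(job_state "$line")
echo "job state: $state"
```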
Retrieving the Output File
In the shell script, we specified that when the executable runs,
its output goes to the file:
program_output.txt
When you see this file created in your directory, you know the program has
begun to execute; however, you can't assume the program is done
yet. The program is done executing (and all the commands in
the shell script are completed) when you see the standard output
and standard error log files appear in the directory.
For our example,
these files would have the names
program.sh.o40316
program.sh.e40316
because they are the standard output ("o") and standard error ("e") logs
associated with the run of program.sh, which had been assigned
the job number 40316. If you redirected the output of your
executable program to a file (we did), then typically these log files
won't contain anything of interest. However, if your job ran out of
time, or had a run-time error, for instance, this information would
be stored in the standard error log file.
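Checking for completion can be sketched as a test for both log files. The function name job_done is made up for illustration; the file names follow the pattern described above (script name, then ".o" or ".e", then the job number).

```shell
# Succeed only when both log files for the job exist in the
# current directory.
# $1 = script name, $2 = job number
job_done() {
    [ -e "$1.o$2" ] && [ -e "$1.e$2" ]
}

if job_done program.sh 40316; then
    echo "job 40316 has finished"
else
    echo "job 40316 is still queued or running"
fi
```

Run this from the directory in which the job was submitted, since that is where the log files appear.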
Assuming the job executed satisfactorily, you can examine the results
or pull them back to your home machine using the sftp program:
home_mac: sftp sysx3.arc.vt.edu
sysx3: Password for user: xxxxx
sysx3: cd work_directory
sysx3: lcd source_code
sysx3: get program_output.txt
sysx3: lls
sysx3: program.c program_output.txt
sysx3: quit
home_mac:
Using Sample Files for Experimentation
Sample files are available, so that you can try out the procedures
for file transfer, compilation, job submission, and output file
recovery.
- Copy the appropriate source code file (choose your favorite
language) to your home machine.
- Copy the shell script program.sh
to your home machine.
- Edit program.sh by inserting the necessary group
and account fields at the beginning of the file.
- Transfer the source code file and the shell script to
your directory on System X.
- Compile the source code using the appropriate compiler,
and rename the executable to program.
- Submit the shell script program.sh.
- Retrieve the output file program_output.txt to your home
machine and compare it to the sample results.