If resources are available on Cheetah, you can run parallel jobs
interactively using the Parallel
Operating Environment (POE). To run a program in parallel, you
specify the number of processors and/or nodes, the communication
library, and a particular "pool" of nodes. POE then uses LoadLeveler
to acquire a set of nodes in the specified pool. If nodes are not
available, the command fails.
LoadLeveler uses the class
"interactive" for all interactive jobs. Use the following
command for information on this class, including wall-clock run-time limits.
$ llclass -l interactive
If an executable is compiled for parallel execution, it will run under
POE as a single program with multiple processes. If the executable is
sequential, POE will start multiple copies of it across the acquired
nodes.
You can specify values for number of processors, communication
library, pool, etc. using environment variables or command-line
arguments to the "poe" command. Command-line arguments
override environment variables. The following table
summarizes the important options.
| "poe" option | Environment variable | Description |
| -procs n | MP_PROCS=n | The number ("n") of parallel processes. |
| -rmpool 1 | MP_RMPOOL=1 | The resource-manager pool that LoadLeveler will use to allocate nodes. The compute nodes of the ORNL SP are in pool "1". |
| -euilib xx | MP_EUILIB=xx | Communication library. Valid values for "xx" are "ip" for Internet Protocol and "us" for User Space. The recommended value is "us", though "ip" is the default. |
| none | MP_SHARED_MEMORY=yes | Use shared memory for MPI communication within a node. Requires compilation with the thread-safe MPI library (i.e. using "mpxlf_r", "mpcc_r", etc.). |
You do not need to specify the number of nodes, and doing so is not
recommended; LoadLeveler will map the processes of the job
appropriately onto nodes available for running interactive jobs.
The following example runs "a.out" on 4 processors using US over
the SP switch.
$ poe a.out -rmpool 1 -procs 4 -euilib us
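The same settings can also be supplied through the environment variables
listed in the table above. The following is a minimal sketch, assuming a
Bourne-style shell (sh, ksh, or bash). The check for "poe" is not part
of POE itself; it is included only so that the snippet is harmless on a
machine where POE is not installed.

```shell
# Environment-variable equivalent of:
#   poe a.out -rmpool 1 -procs 4 -euilib us
export MP_RMPOOL=1     # resource-manager pool for the compute nodes
export MP_PROCS=4      # number of parallel processes
export MP_EUILIB=us    # User Space communication over the SP switch

# Optional: shared memory within a node. Only valid if a.out was built
# with the thread-safe MPI library (mpxlf_r, mpcc_r, etc.).
# export MP_SHARED_MEMORY=yes

if command -v poe >/dev/null 2>&1; then
    # Command-line arguments override environment variables, so
    # "poe a.out -procs 8" here would run 8 processes rather than 4.
    poe a.out
else
    echo "poe not found; on Cheetah this would run: poe a.out" >&2
fi
```

Setting the variables once is convenient when the same configuration is
reused for several runs in one session; command-line arguments remain
available for one-off overrides.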
For more information on "poe" options, see "man
poe". Online documentation for IBM's Parallel Environment (PE),
including POE, is available at the following URL. As of this writing,
Cheetah has version 3 release 1 of PE.
http://www-1.ibm.com/servers/eserver/pseries/library/sp_books/pe.html
Etnus TotalView is a debugger for sequential, parallel, and threaded
programs, and it has a powerful graphical interface. On Cheetah, it
works with MPI, OpenMP, and hybrid MPI-OpenMP applications. It also has
a command-line interface.
Starting TotalView on a parallel job is nontrivial because of
current limitations of "rsh" under DCE on Cheetah. To simplify
the procedure, use the "tv" script. It takes the same options
as "poe", but runs "poe" within TotalView. Simply
replace "poe" with "tv" on the command line.
The following example starts TotalView on 16
processors across 2 compute nodes using US over the SP switch.
$ tv a.out -rmpool 1 -procs 16 -nodes 2 -euilib us
(To debug sequential jobs or core files, use TotalView directly
instead of "tv".)
TotalView starts daemons on remote nodes where your parallel job runs,
and those daemons need to have DCE credentials. The "tv"
script forwards your credentials to the remote nodes.
Credentials can be forwarded only if they are forwardable, however,
and they are not by default. Therefore, before running "tv", you must
issue "kinit -f" and enter your password.
$ kinit -f
Enter Password:
Your credentials will be forwardable for the remainder of the
session.
Since TotalView is an X-Window
application, you must have the "DISPLAY" environment variable
set to point to your local display. You may need to issue the
following command on your machine to
allow the Cheetah login node to display there:
local$ xhost +cheetah.ccs.ornl.gov
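The two prerequisites described above (forwardable credentials and a
valid "DISPLAY") can be checked in one step before launching "tv". The
following is a minimal sketch, assuming a Bourne-style shell on the
Cheetah login node. The function name "check_display" is hypothetical
and is not part of the "tv" script; the launch line is left commented
out because it requires DCE and TotalView.

```shell
# check_display: hypothetical helper that verifies DISPLAY is set.
# It does not test whether the X server actually accepts connections.
check_display() {
    if [ -z "${DISPLAY:-}" ]; then
        echo "DISPLAY is not set; TotalView cannot open its windows." >&2
        return 1
    fi
    echo "DISPLAY is set to $DISPLAY"
}

# On Cheetah: verify the display, obtain forwardable credentials,
# then start TotalView on the parallel job.
# check_display && kinit -f && tv a.out -rmpool 1 -procs 16 -nodes 2 -euilib us
```

Chaining the commands with "&&" stops the sequence at the first
failure, so "tv" is not started without a display or credentials.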
After you run the "tv" command, two windows should appear. In the
larger window, you will see the assembly code for "poe". Type
"G" (capital G) in this window to cause all processes to
"Go". TotalView will run for a few seconds and then ask whether you
would like to stop your processes before entering "MAIN". Answer "yes"
to stop your program at the beginning, so that you can add breakpoints,
etc. before running.
For more information on "tv", see "man tv".
For more information on using TotalView, see "man totalview" or
type "?" within a TotalView window. The TotalView
User's Guide is available on Cheetah in the
following location.
/usr/local/com/toolworks/totalview/doc/pdf/user_guide.pdf
For information on the TotalView command-line interface, see
the TotalView Command Line Interface Guide, available on Cheetah
in the following location.
/usr/local/com/toolworks/totalview/doc/pdf/cli_guide.pdf
Documentation for the TotalView graphical interface and command-line interface
is also available directly from Etnus at the
following URL.
http://www.etnus.com/Support/docs/index.html