Through its "Free Slots" entries, the "llclass" command
gives some information about the status of the system and your
chances of running jobs immediately. As mentioned above, however,
this information can be misleading. For more accurate information about the
load on the system, use the "llstatus" command.
$ llstatus
Name Schedd InQ Act Startd Run LdAvg Idle Arch OpSys
cheetah01.ccs.ornl.gov Avail 0 0 Busy 32 34.40 9999 RS6000 AIX51
cheetah02.ccs.ornl.gov Avail 1 1 Run 8 20.77 9999 RS6000 AIX51
cheetah03.ccs.ornl.gov Avail 204 21 Idle 0 2.75 1 RS6000 AIX51
cheetah04.ccs.ornl.gov Avail 0 0 Busy 32 34.02 9999 RS6000 AIX51
...
cheetah27.ccs.ornl.gov Avail 0 0 Busy 32 32.10 9999 RS6000 AIX51
cheetah41.ccs.ornl.gov Avail 0 0 Run 0 0.00 9999 RS6000 AIX51
cheetah42.ccs.ornl.gov Avail 0 0 Run 0 0.00 9999 RS6000 AIX51
...
cheetah48.ccs.ornl.gov Avail 0 0 Run 0 0.00 9999 RS6000 AIX51
RS6000/AIX51 35 machines 208 jobs 566 running
Total Machines 35 machines 208 jobs 566 running
The Central Manager is defined on cheetah15.ccs.ornl.gov
The BACKFILL scheduler is in use
All machines on the machine_list are present.
The "Schedd" column indicates whether the node is able to schedule
LoadLeveler jobs; "Avail" means it can. "InQ" gives the
number of current jobs submitted from (not running on) the given node,
and "Act" gives the number of those jobs that are actually running
(on other nodes). "Startd" indicates whether any jobs are running
on the given node, and "Run" indicates the number of job instances
that are running.
For most Cheetah nodes, the "Run" number will be equal to or less
than the number of processors in that node. The same job can have more
than one instance running on a given node; for example, a 32-processor
node may have 32 MPI processes from the same job.
"LdAvg" is the Berkeley one-minute load average, and "Idle"
is the time in seconds since the last keyboard or mouse activity on the
node. For Cheetah nodes, "Idle" is often "9999".
The lines at the bottom of the output indicate that 35 machines
are currently under the control of LoadLeveler. On these machines,
208 jobs are queued (the sum of the "InQ" column), and 566 job instances,
or slots, are running (the sum of the "Run" column). Because one slot can
represent a single-threaded or multithreaded process, slots are equivalent
to neither processors nor nodes.
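The summary arithmetic can be reproduced from the per-node lines. The following is a minimal sketch in POSIX shell (it also runs under "ksh"); "llstatus_totals" is a hypothetical helper name, not a LoadLeveler command, and it assumes, as the sample suggests, that the summary's job count is the sum of the "InQ" column and its running count is the sum of the "Run" column.

```shell
# llstatus_totals (hypothetical helper, not part of LoadLeveler):
# reads default "llstatus" output on stdin and sums the InQ and Run
# columns over the per-node lines.
llstatus_totals() {
    awk '
        # Per-node lines have 10 fields and a numeric InQ column ($3);
        # this skips the header and the summary lines at the bottom.
        NF == 10 && $3 ~ /^[0-9]+$/ { inq += $3; run += $6; n++ }
        END { printf "%d machines %d jobs %d running\n", n, inq, run }
    '
}
```

Usage: "llstatus | llstatus_totals". If the assumption holds, the result should match the "Total Machines" line.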
A more effective way to determine what resources Cheetah has, and
which of those resources are available, is the "-R" option. This
causes "llstatus" to display "consumable resources": usable
processors, memory, and, on some nodes, local scratch space.
$ llstatus -R
Machine Consumable Resource(Available, Total)
------------------------------ -------------------------------------------------
cheetah04c.ccs.ornl.gov ConsumableCpus(32,32) ConsumableMemory(32.000 gb,32.000 gb) ConsumableScratch(160,160)
cheetah06c.ccs.ornl.gov ConsumableCpus(0,32) ConsumableMemory(0.000 mb,32.000 gb) ConsumableScratch(160,160)
cheetah07c.ccs.ornl.gov ConsumableCpus(0,32) ConsumableMemory(0.000 mb,32.000 gb) ConsumableScratch(160,160)
cheetah12c.ccs.ornl.gov ConsumableCpus(32,32) ConsumableMemory(32.000 gb,32.000 gb) ConsumableScratch(160,160)
cheetah13c.ccs.ornl.gov ConsumableCpus(32,32) ConsumableMemory(32.000 gb,32.000 gb) ConsumableScratch(160,160)
cheetah14c.ccs.ornl.gov ConsumableCpus(32,32) ConsumableMemory(32.000 gb,32.000 gb) ConsumableScratch(160,160)
cheetah15c.ccs.ornl.gov ConsumableCpus(16,32) ConsumableMemory(112.000 gb,128.000 gb)
cheetah16c.ccs.ornl.gov ConsumableCpus(32,32) ConsumableMemory(128.000 gb,128.000 gb)
cheetah20c.ccs.ornl.gov ConsumableCpus(24,32) ConsumableMemory(24.000 gb,32.000 gb)
cheetah21c.ccs.ornl.gov ConsumableCpus(24,32) ConsumableMemory(24.000 gb,32.000 gb)
cheetah26c.ccs.ornl.gov ConsumableCpus(24,32) ConsumableMemory(24.000 gb,32.000 gb)
cheetah27c.ccs.ornl.gov ConsumableCpus(24,32) ConsumableMemory(24.000 gb,32.000 gb)
cheetah41c.ccs.ornl.gov ConsumableCpus(3,3) ConsumableMemory(3.000 gb,3.000 gb)
cheetah42c.ccs.ornl.gov ConsumableCpus(3,3) ConsumableMemory(3.000 gb,3.000 gb)
cheetah43c.ccs.ornl.gov ConsumableCpus(3,3) ConsumableMemory(3.000 gb,3.000 gb)
cheetah44c.ccs.ornl.gov ConsumableCpus(3,3) ConsumableMemory(3.000 gb,3.000 gb)
cheetah45c.ccs.ornl.gov ConsumableCpus(3,3) ConsumableMemory(3.000 gb,3.000 gb)
cheetah46c.ccs.ornl.gov
cheetah47c.ccs.ornl.gov ConsumableCpus(3,3) ConsumableMemory(3.000 gb,3.000 gb)
cheetah48c.ccs.ornl.gov
LoadL_startd daemons of machines with "#" appended to their names are down.
This command displays the number of processors and the amount of
memory on each node, each shown as (available, total). For example,
node 06 has 32 processors and 32 GB of memory configured for LoadL jobs.
All 32 processors and all 32 GB of memory are in use, but all 160 GB of
local scratch space is still available. Some of the nodes have 160 GB
of local scratch space ($NODE_JOBDIR); others do not.
All of the p690 nodes are configured as 32-way SMP nodes. Some p655
nodes, numbered 41-48, are configured into LoadLeveler for interactive
use; these are also I/O server nodes.
Notice that nodes with "#"
appended to their names are not available to LoadLeveler.
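Because each entry is printed as (available, total), "llstatus -R" output can be filtered for nodes with enough free processors. A minimal sketch; "free_cpu_nodes" is a hypothetical helper name, and it assumes the "ConsumableCpus(available,total)" layout shown above.

```shell
# free_cpu_nodes (hypothetical helper): reads "llstatus -R" output on
# stdin and prints each machine whose available ConsumableCpus is at
# least $1 (default 1), as "name available/total".
free_cpu_nodes() {
    awk -v want="${1:-1}" '
        /ConsumableCpus\(/ {
            # Isolate the "available,total" text inside ConsumableCpus(...).
            s = $0
            sub(/.*ConsumableCpus\(/, "", s)
            sub(/\).*/, "", s)
            split(s, c, ",")
            if (c[1] + 0 >= want + 0)
                printf "%s %d/%d\n", $1, c[1], c[2]
        }
    '
}
```

For example, "llstatus -R | free_cpu_nodes 8" lists nodes with at least 8 free processors. Lines with blank entries, like nodes 46 and 48 above, are skipped.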
Some of the columns of default "llstatus" output are not particularly
useful, and "llstatus" can display useful information that is not
shown by default. To remedy this, you can configure the output of
"llstatus" on the command line with the "-f" option. Here is an
example configuration.
$ llstatus -f %n %mt %r %l %v %scs %sts
Name MaxT Run LdAvg FreeVMemory Schedd Startd
cheetah01.ccs.ornl.gov 32 32 32.05 66048868 Avail Busy
cheetah02.ccs.ornl.gov 32 8 16.92 66052976 Avail Run
cheetah03.ccs.ornl.gov 32 0 2.78 65774208 Avail Idle
cheetah04.ccs.ornl.gov 32 32 33.13 66034208 Avail Busy
...
cheetah27.ccs.ornl.gov 32 8 19.23 66053208 Avail Run
cheetah41.ccs.ornl.gov 3 0 8.29 8904108 Avail Idle
cheetah42.ccs.ornl.gov 3 0 8.54 8902292 Avail Idle
...
cheetah45.ccs.ornl.gov 3 0 41.56 4706172 Avail Idle
cheetah47.ccs.ornl.gov 3 0 41.56 4706172 Avail Idle
...
This example prunes out some of the default information and adds
"MaxT" and "FreeVMemory". "MaxT" gives the maximum number of job
instances (regardless of class) that may run on the given host at a time,
and "FreeVMemory" gives the available swap space, in kilobytes. See
"man llstatus" for more information on configuring output. You may want
to create an "alias" for the "llstatus" configuration you prefer.
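In "ksh" an alias works, as noted in the comment of the sketch that follows. A shell function is an alternative that behaves the same in "ksh", "sh", and "bash" and passes through extra arguments, such as host names, to "llstatus". The name "lls" is only an example.

```shell
# ksh alias equivalent (for ~/.profile or your ENV file):
#   alias lls='llstatus -f %n %mt %r %l %v %scs %sts'

# lls (example name): the same output configuration as a function.
# Any extra arguments are appended to the llstatus command line.
lls() {
    llstatus -f %n %mt %r %l %v %scs %sts "$@"
}
```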
To run a batch job under LoadLeveler, you first need to write a job command
file. LoadLeveler command files have two components: LoadLeveler
keyword statements and shell commands.
The LoadLeveler keyword statements are preceded by "#@", making
them appear as comments to a shell. The shell commands follow the
"#@ queue" keyword statement and represent the executable content of the
batch job.
A nice feature LoadLeveler provides is the ability to define job
prolog and epilog scripts. If you have steps that should take place at
the beginning or end of all your jobs, you can define a prolog and/or
epilog script and code these steps once for all your jobs.
Before starting your job, LoadLeveler looks for an environment
variable named $MY_CCS_LOADL_PROLOG. If it contains the path
of an executable file, that file is run before starting your job
script. If $MY_CCS_LOADL_PROLOG is not defined and
$HOME/llprolog exists and is executable, it will be run.
Similarly, after your job completes, if $MY_CCS_LOADL_EPILOG
is defined and contains the path of an executable file, that file will
be run. If $MY_CCS_LOADL_EPILOG is not defined, but
$HOME/llepilog exists and is executable, it will be run.
Generally, defining and exporting these environment variables in your
.profile (or setting them with setenv in your .login if you use a csh
variant) is sufficient to make them visible to LoadLeveler.
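The prolog lookup order can be written out as a small shell function. This is a sketch that only mirrors the rules described above: "resolve_prolog" is a hypothetical name, it is not the code LoadLeveler runs, and choosing nothing when "$MY_CCS_LOADL_PROLOG" is defined but not executable is an assumption. The epilog lookup would be the same with "MY_CCS_LOADL_EPILOG" and "$HOME/llepilog".

```shell
# In ~/.profile the variables would be exported like this
# (the script paths are examples only):
#   export MY_CCS_LOADL_PROLOG=$HOME/bin/my_prolog
#   export MY_CCS_LOADL_EPILOG=$HOME/bin/my_epilog

# resolve_prolog (hypothetical): prints the prolog that the rules above
# would select, or nothing if none would run. Not LoadLeveler's own code.
resolve_prolog() {
    if [ -n "${MY_CCS_LOADL_PROLOG+set}" ]; then
        # Defined: use it only if it names an executable file.
        # Choosing nothing when it is not executable is an assumption.
        if [ -f "$MY_CCS_LOADL_PROLOG" ] && [ -x "$MY_CCS_LOADL_PROLOG" ]; then
            printf '%s\n' "$MY_CCS_LOADL_PROLOG"
        fi
    elif [ -f "$HOME/llprolog" ] && [ -x "$HOME/llprolog" ]; then
        # Not defined: fall back to $HOME/llprolog.
        printf '%s\n' "$HOME/llprolog"
    fi
}
```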
Below you will find examples of various command files, specifying different
parallel paradigms and resource requirements.
What not to do
When porting job command files from other systems, such as Eagle or
Seaborg, there are a few LoadLeveler statements you should not use on
Cheetah.
#@ node_usage = not_shared
This statement requests that each node associated with your job is
not shared with any other jobs.
Cheetah has large, powerful nodes. If your job does not use all the
resources on a node, you need to make the remaining resources
available to other users. If your job does need all the resources on a
node, there are more appropriate ways to request those resources than
with "not_shared". See the examples below for details.
#@ network.MPI = csss,not_shared,US
The "not_shared" in this statement requests that the SP
Switch2 interconnect on each node be reserved for exclusive use by the
requesting job. If your job does not use all the resources on a
node, you need to make the interconnect available to other jobs. If
your job does use all the resources on a node, requesting
"not_shared" is unnecessary since no other jobs will be
allowed on the node anyway. Please use "shared".
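When porting command files, a quick "grep" finds both of these statements. A minimal sketch with the hypothetical helper name "check_not_shared"; it prints any "#@" line that mentions "not_shared", with its line number, and returns 1 if it finds one.

```shell
# check_not_shared (hypothetical helper): flags "not_shared" in the
# "#@" keyword lines of a command file. Returns 1 if any are found.
check_not_shared() {
    if grep -n '^[[:space:]]*#@.*not_shared' "$1"; then
        echo "warning: $1 requests not_shared" >&2
        return 1
    fi
    return 0
}
```

For example, "check_not_shared job.ll" before running "llsubmit job.ll" (the file name is only an example).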
No ending newline ("csh" only)
Make sure that your command file ends with a newline. If it does not, LoadLeveler
will not execute the last command in your file. You can use the following
command to check your file.
tail command_file
You need to add a newline (using "return" or "enter" in an editor) if the
next command prompt appears on the same line as the last line of
the file. Here is an example of such a case.
cheetah48% tail csh.ll
#@ error = $(host).$(jobid).err
#@ wall_clock_limit = 30:00
#@ tasks_per_node = 32
#@ node = 1
#@ queue
pwd
echo $LOADL_PROCESSOR_LIST
setenv MP_SHARED_MEMORY yes
poe a.outcheetah48%
If a newline is not added to this file, the command "poe a.out"
will not be executed when the job runs!
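The check and the fix can also be automated. This sketch (hypothetical helper "ensure_newline") relies on "tail -c 1" printing the file's last byte: command substitution strips a trailing newline, so the result is empty exactly when the file already ends with one (or is empty). Running it twice is harmless.

```shell
# ensure_newline (hypothetical helper): appends a newline to a file
# whose last byte is not already a newline.
ensure_newline() {
    if [ -n "$(tail -c 1 "$1")" ]; then
        printf '\n' >> "$1"
    fi
}
```

For example, "ensure_newline csh.ll" fixes the file shown above.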
Multiple "resources" lines
As described below, you can use the "resources" keyword to
define how many processors per task and how much memory per task your
job needs.
Always make all your resource requests on one line, because multiple
"resources" lines are not additive. Each "resources" line
logically overwrites all previous ones. For example, the following
lines result in the default number of processors per task, 1,
not 8, and no scratch space reserved!
#@ resources = ConsumableCpus(8)
#@ resources = ConsumableScratch(4)
#@ resources = ConsumableMemory(1 gb)
Use a single line instead.
#@ resources = ConsumableCpus(8) ConsumableMemory(1 gb) ConsumableScratch(4)
NOTE: Requested consumable resources are per task (that is, per process), not per job.
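To catch the multiple-"resources" mistake before submitting, you can count the keyword lines. A minimal sketch with the hypothetical helper "count_resources_lines"; a count above 1 probably means that some requests are being silently replaced.

```shell
# count_resources_lines (hypothetical helper): prints how many
# "#@ resources =" keyword lines a command file contains.
count_resources_lines() {
    grep -c '^[[:space:]]*#@[[:space:]]*resources[[:space:]]*=' "$1"
}
```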
Here is an example command file for a parallel MPI job.
#@ shell = /bin/ksh
#@ job_type = parallel
#@ network.MPI = csss,shared,US
#@ output = $(host).$(jobid).out
#@ error = $(host).$(jobid).err
#@ wall_clock_limit = 30:00
#@ tasks_per_node = 32
#@ node = 2
#@ queue
pwd
echo $LOADL_PROCESSOR_LIST
export MP_SHARED_MEMORY=yes
poe a.out
Here is a description of each line. The script has no line specifying
"class", so the default class, "batch", will be used.
#@ shell = /bin/ksh
Use the Korn shell, "ksh", to interpret the command file.
By default, LoadLeveler interprets the command file using your login shell.
The sample script is written in "ksh" syntax, so the explicit
request of "ksh" allows it to work regardless of your login shell.
If you prefer to use C-shell syntax, make the following changes to the
sample command file.
| Korn shell | C shell |
| #@ shell = /bin/ksh | #@ shell = /bin/csh |
| export MP_SHARED_MEMORY=yes | setenv MP_SHARED_MEMORY yes |
#@ job_type = parallel
Use multiple nodes for parallel commands. This keyword is required
for parallel jobs. The keywords "tasks_per_node", "node",
etc. won't work without it.
#@ network.MPI = csss,shared,US
For MPI communication, use the SP Switch2 with the User Space
protocol. This line requests that parallel MPI programs use the
fastest form of internode communication available on the SP: User
Space (US) protocol over the SP Switch2, using both switch interfaces
on each node (device "csss").
A separate "network" keyword, "network.LAPI", is allowed for IBM's
Low-Level Application Programming Interface.
#@ output = $(host).$(jobid).out
Send standard output to the file "$(host).$(jobid).out".
"$(host)" is a LoadLeveler variable that represents the host where
the job was submitted. It is not necessarily related to where the job runs.
"$(jobid)" is the numeric ID assigned to the job. Each "$(jobid)"
is unique among jobs submitted from a particular host, but not
necessarily across LoadLeveler; two jobs submitted from two different
hosts can have the same value for "$(jobid)". The combination
"$(host).$(jobid)" is unique, however. For example, two such jobs
would both write "131.out" if only "$(jobid)" were used, but
"cheetah01.131.out" and "cheetah17.131.out" with "$(host).$(jobid)".
Another useful variable is "$(executable)", which represents the
name of the LoadLeveler command file.
Unless you specify a full path, the output file is stored in the directory
from which you submitted the job. If you don't specify the "output"
keyword, the standard output is not saved.
#@ error = $(host).$(jobid).err
Send standard error output to the file "$(host).$(jobid).err".
See the information above for the "output" keyword. You can send
standard output and standard error to the same file.
#@ wall_clock_limit = 30:00
Limit the job to 30 minutes of real time. If you do not specify
a "wall_clock_limit", your job will get the default limit of two
hours, regardless of class. For jobs longer than two hours, you
must specify a longer limit. For shorter jobs, specifying a shorter time
limit may allow the scheduler to fit your job in earlier.
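Limits such as "30:00" are written as [[hours:]minutes:]seconds. This sketch (hypothetical helper "limit_to_seconds") converts one to seconds so you can compare it with the two-hour (7200-second) default. It keeps only the part before any comma and does not handle keywords such as "unlimited".

```shell
# limit_to_seconds (hypothetical helper): "30:00" -> 1800,
# "2:00:00" -> 7200, "45" -> 45. Anything after a comma is dropped.
limit_to_seconds() {
    printf '%s\n' "${1%%,*}" | awk -F: '{
        s = 0
        # Each colon-separated field is worth 60 times the next one.
        for (i = 1; i <= NF; i++) s = s * 60 + $i
        print s
    }'
}
```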
#@ tasks_per_node = 32
Use 32 tasks per node for parallel jobs. A task is equivalent
to a process, and a single task may have multiple threads. This line specifies
that 32 tasks, 32 MPI processes in this case, should be started on each
node. Note that all of Cheetah's batch (p690) nodes are 32-way.
#@ node = 2
Allocate 2 nodes for parallel commands. Yes, the keyword is "node",
not "nodes".
#@ queue
Queue the job! This keyword is critical. Without it, no job is
created. Each "queue" keyword uses the environment specified by
the keywords listed before it, so make sure to put it after the other relevant
keywords.
The remaining lines of the file specify the shell commands to be executed
by the batch job. All sequential commands, such as the first three commands
in this example, run on only the first node allocated to the job. Parallel
commands start multiple processes spread across all allocated
nodes.
pwd
Display the name of the current working directory. The job starts
in the directory where the job was submitted. This behavior is different
from some other batch systems, which always start jobs in the user's home
directory.
echo $LOADL_PROCESSOR_LIST
Display the nodes allocated to this job. LoadLeveler automatically
sets the value of the environment variable "LOADL_PROCESSOR_LIST"
to a list of the nodes allocated for the given job. Printing this list
in each job can help diagnose system problems. If you have more than 128
tasks, however, do not print this variable. LoadLeveler has trouble
printing this for more than 128 tasks; it may cause your job to fail.
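Instead of printing the raw list, a job script can summarize it as a task count per node. A minimal sketch, assuming "LOADL_PROCESSOR_LIST" is space-separated with one entry per task; "summarize_processor_list" is a hypothetical name, and the 128-task cutoff only avoids printing a long list. It does not address whatever causes the LoadLeveler problem.

```shell
# summarize_processor_list (hypothetical): prints "count host" lines
# (tasks per node) from LOADL_PROCESSOR_LIST, skipping large jobs.
summarize_processor_list() {
    # Word-split the list: one positional parameter per task.
    set -- $LOADL_PROCESSOR_LIST
    if [ $# -eq 0 ]; then
        echo "LOADL_PROCESSOR_LIST is empty"
    elif [ $# -gt 128 ]; then
        echo "$# tasks; not printing LOADL_PROCESSOR_LIST"
    else
        printf '%s\n' "$@" | sort | uniq -c
    fi
}
```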
export MP_SHARED_MEMORY=yes
Use shared memory for MPI. IBM's MPI can implement communication
within a node using shared memory. This implementation greatly improves
the bandwidth and latency of on-node communication without affecting
communication between nodes.
Shared memory is used by default, so you don't need to set this in your
batch script, but be aware that it uses extra memory.
If you wish to turn it off, use "export MP_SHARED_MEMORY=no" in "ksh"
or "setenv MP_SHARED_MEMORY no" in "csh".
To take advantage of this shared-memory optimization, an MPI code must
be compiled with the thread-safe version of the MPI library, i.e.,
using "mpxlf_r" or "mpcc_r".
poe a.out
Run 64 copies of "a.out" across 2 nodes. If "a.out"
is not a parallel program, this command will run 64 identical copies on
2 different nodes. If "a.out" is parallel (compiled with "mpxlf",
"mpcc", etc.), it will run as a single 64-process application
across 2 nodes. Specifying "poe" is optional for programs compiled
to be parallel.
Note that POE options specified through LoadLeveler keyword commands
("node", "tasks_per_node", "network", etc.)
override options on the "poe" command line.
Here is an example command file for a threaded OpenMP job.
#@ shell = /bin/ksh
#@ job_type = serial
#@ output = $(host).$(jobid).out
#@ error = $(host).$(jobid).err
#@ wall_clock_limit = 30:00
#@ resources = ConsumableCpus(8)
#@ queue
pwd
echo $LOADL_PROCESSOR_LIST
export OMP_NUM_THREADS=8
a.out
Here is a description of each line that differs from the MPI example. See above for details on the other statements.
#@ job_type = serial
Use a single node for the job. Each statement in the script
should use only one process, though each process may have multiple threads.
#@ resources = ConsumableCpus(8)
Reserve 8 processors. This statement is critical for
OpenMP jobs! Though the job is "serial" in terms of
processes, it uses multiple threads per process. This statement
reserves 8 processors, but it does not set the number of OpenMP
threads!
export OMP_NUM_THREADS=8
Use 8 OpenMP threads per process. This number is typically
the same as the number of "ConsumableCpus" set above. For
"csh", use the following instead.
setenv OMP_NUM_THREADS 8
a.out
Run a single copy of "a.out" using 8 threads on 8
processors. Note that the two 8s, "ConsumableCpus(8)" and
"OMP_NUM_THREADS=8", are set separately.
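Because the two values are set separately, they can drift apart when you edit a command file. A minimal sketch of a pre-submission check, with the hypothetical helper name "check_omp"; it takes the last "ConsumableCpus(N)" on a "#@" resources line (since a later "resources" line replaces an earlier one) and the last "OMP_NUM_THREADS" setting in either "ksh" or "csh" form.

```shell
# check_omp (hypothetical helper): compares ConsumableCpus with
# OMP_NUM_THREADS in a command file; returns 1 if missing or unequal.
check_omp() {
    cpus=$(sed -n 's/^#@.*resources.*ConsumableCpus(\([0-9]*\)).*/\1/p' "$1" | tail -n 1)
    # Matches "export OMP_NUM_THREADS=8" (ksh) or "setenv OMP_NUM_THREADS 8" (csh).
    threads=$(sed -n -e 's/.*OMP_NUM_THREADS=\([0-9]*\).*/\1/p' \
        -e 's/.*setenv[[:space:]]*OMP_NUM_THREADS[[:space:]]*\([0-9]*\).*/\1/p' "$1" | tail -n 1)
    if [ -z "$cpus" ] || [ -z "$threads" ]; then
        echo "missing: ConsumableCpus=${cpus:-none} OMP_NUM_THREADS=${threads:-none}"
        return 1
    elif [ "$cpus" -ne "$threads" ]; then
        echo "mismatch: ConsumableCpus=$cpus OMP_NUM_THREADS=$threads"
        return 1
    fi
    echo "ok: $cpus"
}
```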
Here is an example command file for a hybrid MPI-OpenMP job. Each MPI
process uses multiple OpenMP threads.
#@ shell = /bin/ksh
#@ job_type = parallel
#@ network.MPI = csss,shared,US
#@ output = $(host).$(jobid).out
#@ error = $(host).$(jobid).err
#@ wall_clock_limit = 30:00
#@ tasks_per_node = 4
#@ node = 2
#@ resources = ConsumableCpus(8)
#@ queue
pwd
echo $LOADL_PROCESSOR_LIST
export MP_SHARED_MEMORY=yes
export OMP_NUM_THREADS=8
poe a.out
Here is a description of each line that differs from the MPI example. See above for details on the other statements.
#@ resources = ConsumableCpus(8)
Reserve 8 processors for each MPI task. This statement is critical for
OpenMP jobs! It reserves 8 processors per MPI task, but it does not
set the number of OpenMP threads per task!
export OMP_NUM_THREADS=8
Use 8 OpenMP threads per MPI task. This number is typically
the same as the number of "ConsumableCpus" set above. For
"csh", use the following instead.
setenv OMP_NUM_THREADS 8
poe a.out
Run 8 copies of "a.out" across two nodes, where each
copy uses 8 threads on 8 processors. This job uses a total of 64
processors. Because "ConsumableCpus" and
"OMP_NUM_THREADS" are set the same, each thread will have a full
processor to use.
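The processor count is the product of three settings: node x tasks_per_node x ConsumableCpus, here 2 x 4 x 8 = 64 (for the pure MPI example, 2 x 32 x 1 = 64). A minimal sketch that computes this from a command file; "job_cpus" is a hypothetical name. It assumes "node" is a single number rather than a range and treats a job with no "node" or "tasks_per_node" (such as a serial job) as one task.

```shell
# job_cpus (hypothetical helper): prints the tasks and processors a
# command file requests. A resources line without ConsumableCpus
# resets the count to the default of 1.
job_cpus() {
    awk -F= '
        /^[ \t]*#@[ \t]*node[ \t]*=/           { v = $2; gsub(/[ \t]/, "", v); node = v }
        /^[ \t]*#@[ \t]*tasks_per_node[ \t]*=/ { v = $2; gsub(/[ \t]/, "", v); tpn = v }
        /^[ \t]*#@[ \t]*resources[ \t]*=/ {
            cpus = 1
            # "ConsumableCpus(" is 15 characters; the digits follow it.
            if (match($0, /ConsumableCpus\([0-9]+\)/))
                cpus = substr($0, RSTART + 15, RLENGTH - 16)
        }
        END {
            if (node == "") node = 1
            if (tpn == "") tpn = 1
            if (cpus == "") cpus = 1
            printf "tasks=%d processors=%d\n", node * tpn, node * tpn * cpus
        }
    ' "$1"
}
```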
If you do not specify a memory requirement, each process gets the
default, which may be as little as 256MB per process.
Most Cheetah nodes have roughly 1GB per processor, but a few have
more. You can see what memory resources are available using
"llstatus -R", as described above.
Use the "ConsumableMemory" resource to specify memory
requirements, as in the following example, which
requests 2GB per task.
#@ resources = ConsumableMemory(2 gb)
You can specify memory in other units, including MB ("mb")
and kB ("kb"). This is a per-task resource; it is not the
total amount your job will use.
Make sure to include all resource requests on a single
"resources" line. The following example requests 32
processors and 64GB per task, such as for a large OpenMP job.
#@ resources = ConsumableCpus(32) ConsumableMemory(64 gb)
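Because "ConsumableMemory" is per task, the memory a job needs on each node is tasks_per_node times the per-task request, and that total must fit within the node's memory as reported by "llstatus -R". For example, 32 tasks at 2 GB each need 64 GB, more than a 32 GB node has. A minimal sketch with the hypothetical helper "node_mem_check"; it assumes 1 gb = 1024 mb and 1 mb = 1024 kb, and that values are written with a space before the unit.

```shell
# node_mem_check (hypothetical helper): $1 = tasks per node,
# $2 = per-task ConsumableMemory (e.g. "2 gb"), $3 = node total
# (e.g. "32.000 gb"). Prints the per-node need in mb and whether it fits.
node_mem_check() {
    awk -v t="$1" -v need="$2" -v have="$3" '
        # Convert "N unit" to megabytes; a missing unit counts as mb.
        function mb(s,   a, n) {
            n = split(s, a, " ")
            if (n >= 2 && tolower(a[2]) == "gb") return a[1] * 1024
            if (n >= 2 && tolower(a[2]) == "kb") return a[1] / 1024
            return a[1]
        }
        BEGIN {
            total = t * mb(need)
            printf "%d mb needed of %d mb: %s\n", total, mb(have),
                (total <= mb(have) ? "fits" : "does not fit")
        }
    '
}
```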
If you do not specify a scratch space requirement, each process
gets the default, which is none.
Some of the 32-way nodes have 160 GB of local scratch space.
There are no LPARs.
Use "llstatus -R" to check the current configuration.
Use the "ConsumableScratch" resource to specify scratch space
requirements, as in the following example, which requests 10 GB of
disk space for a single-task job. As with all consumable
resources, this is a per-task request.
#@ resources = ConsumableScratch(10)
You cannot specify units; the value is always in gigabytes.
Make sure to include all resource requests on a single
"resources" line. The following example requests 32
processors and 64GB per task, such as for a large OpenMP job,
and 10 GB of local scratch space.
#@ resources = ConsumableCpus(32) ConsumableMemory(64 gb) ConsumableScratch(10)
Use "llq" to check the status of submitted jobs.
$ llq
Id Owner Submitted ST PRI Class Running On
---------------------- ---------- ----------- -- --- ------------ -----------
cheetah48.12813.0 ernie 11/27 04:20 R 50 batch cheetah04
cheetah48.12816.0 ernie 11/27 04:50 R 50 batch cheetah06
cheetah48.12820.0 ernie 11/27 08:10 R 50 batch cheetah27
cheetah48.12814.0 grover 11/27 04:29 R 50 batch cheetah26
cheetah48.12815.0 grover 11/27 04:30 I 50 batch
cheetah01.218.0 zoe 11/27 08:41 I 1 batch
cheetah48.12846.0 bert 11/27 09:40 I 50 batch
cheetah48.12848.0 elmo 11/27 09:42 I 50 batch
cheetah48.12850.0 bert 11/27 09:46 I 50 batch
cheetah48.12851.0 bert 11/27 09:50 I 50 batch
cheetah48.12852.0 bert 11/27 09:52 I 50 batch
cheetah48.12853.0 bert 11/27 09:54 I 50 batch
cheetah48.12854.0 bert 11/27 09:56 I 50 batch
cheetah48.12856.0 bert 11/27 09:58 I 50 batch
cheetah48.12860.0 bert 11/27 10:03 I 50 batch
cheetah48.12861.0 herry 11/27 10:08 I 50 batch
cheetah48.12862.0 oscar 11/27 10:08 I 50 batch
cheetah48.12863.0 cookie 11/27 10:22 I 50 batch
cheetah48.12865.0 kermit 11/27 11:07 I 50 batch
19 job steps in queue, 15 waiting, 0 pending, 4 running, 0 held
The first column is the name of each job step, the second column is the
owner of the job, and the third column is the time when the job was first
submitted to LoadLeveler. The "ST" column gives the status of
each job. Here are some common status values.
| R | Running |
| ST | STarting |
| I | Idle, waiting for resources |
| H | Held by the user |
| S | Held by the System |
| RP | Remove Pending, being removed |
The "PRI" column gives the user priority of the job, though this
priority is not currently used in making scheduling decisions. The "Class"
column gives the class specified in the job command file ("batch"
is the default). The final column, "Running On", gives the first
node assigned to each running job. Only this first node appears, even for
parallel jobs running on multiple nodes.
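To tally the "ST" column, for example to check the counts on the last line of "llq" output, a short "awk" filter is enough. A minimal sketch (hypothetical helper "llq_status_counts") that assumes the default "llq" layout, where a job line's fields are step ID, owner, date, time, and ST.

```shell
# llq_status_counts (hypothetical helper): reads default "llq" output
# on stdin and prints "status count" pairs, one per line.
llq_status_counts() {
    # Job lines start with a step ID such as cheetah48.12813.0.
    awk '$1 ~ /\.[0-9]+\.[0-9]+$/ { n[$5]++ }
         END { for (s in n) print s, n[s] }' | sort
}
```

Usage: "llq | llq_status_counts".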
Some of the columns of default "llq" output are not particularly
useful, and "llq" is capable of displaying useful information
that is not shown by default. To remedy this, you can configure the output
generated by "llq" on the command line. Here is an example configuration.
$ llq -f %o %id %nh %st %dd %dq
Owner Step Id NM ST Disp. Date Queue Date Running On
----------- ---------------------- ---- -- ----------- ----------- --------------
kermit cheetah48.12865.0 0 I 11/27 11:07
cookie cheetah48.12863.0 0 I 11/27 10:22
oscar cheetah48.12862.0 0 I 11/27 10:08
herry cheetah48.12861.0 0 I 11/27 10:08
bert cheetah48.12860.0 0 I 11/27 10:03
bert cheetah48.12856.0 0 I 11/27 09:58
bert cheetah48.12854.0 0 I 11/27 09:56
bert cheetah48.12853.0 0 I 11/27 09:54
bert cheetah48.12852.0 0 I 11/27 09:52
bert cheetah48.12851.0 0 I 11/27 09:50
bert cheetah48.12850.0 0 I 11/27 09:46
elmo cheetah48.12848.0 0 I 11/27 09:42
bert cheetah48.12846.0 0 I 11/27 09:40
ernie cheetah48.12866.0 0 I 11/27 11:20
grover cheetah48.12815.0 0 I 11/27 04:30 11/27 04:30
ernie cheetah48.12816.0 8 R 11/27 04:50 11/27 04:50 cheetah04
zoe cheetah01.218.0 0 I 11/27 08:41 11/27 08:41
ernie cheetah48.12813.0 8 R 11/27 04:20 11/27 04:20 cheetah06
ernie cheetah48.12820.0 16 R 11/27 08:10 11/27 08:10 cheetah27
grover cheetah48.12814.0 32 R 11/27 04:29 11/27 04:29 cheetah26
In addition to the owner, job step ID, and status, this format gives "NM",
the number of nodes used by the job, "Disp. Date", the time the
job was started, and "Queue Date", the time the job was queued.
See "man llq" for more information on configuring output. You
may want to create an alias for the "llq" configuration
you prefer.
As an alternative to "llq", we provide the local utility "llqn",
which lists a different set of job characteristics. To list all the characteristics
available from "llqn", use the "-a" option.
$ llqn -a
Job Id Owner Class SysPrio S Date Node
----------------------------- -------- ------------ -------- - ---------------- ----
cheetah48.ccs.ornl.gov.12815.0 grover batch -4837086 R Nov 28 04:30 32
cheetah48.ccs.ornl.gov.12813.0 ernie batch -4740231 R Nov 28 04:20 8
cheetah48.ccs.ornl.gov.12820.0 ernie batch -4718494 R Nov 28 08:10 16
cheetah48.ccs.ornl.gov.12814.0 grover batch -4718498 R Nov 27 12:29 32
cheetah48.ccs.ornl.gov.12816.0 ernie batch -4842014 I Nov 28 04:50 8
cheetah01.ccs.ornl.gov.218.0 zoe batch -4842039 I Nov 28 08:41 32
cheetah48.ccs.ornl.gov.12865.0 kermit batch -4747045 I Nov 27 11:07 144
cheetah48.ccs.ornl.gov.12863.0 cookie batch -4749527 I Nov 27 10:22 80
cheetah48.ccs.ornl.gov.12862.0 oscar batch -4836911 I Nov 27 10:08 16
...
Unlike "PRI" with "llq", "SysPrio" is an accurate
representation of the scheduling priority; the job with the largest (least
negative) priority is scheduled next. Jobs with lower priority can skip
ahead if they can fit in holes in the scheduled job mix. This is called
backfilling.
"Date" means different things for running and waiting ("I")
jobs. For waiting jobs, "Date" is the queue time. For running
jobs, "Date" is the latest time the job will finish, based on
the start time and the wall-clock limit.
See "man llqn" for more details.
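To see the order in which waiting jobs will be considered, ignoring backfilling, you can sort "llqn -a" output by SysPrio. A minimal sketch (hypothetical helper "llqn_waiting_order") that assumes the column layout shown above, with SysPrio in field 4 and the status in field 5.

```shell
# llqn_waiting_order (hypothetical helper): reads "llqn -a" output on
# stdin and lists waiting ("I") jobs from highest (least negative)
# SysPrio to lowest, as "SysPrio JobId Owner".
llqn_waiting_order() {
    awk '$5 == "I" { print $4, $1, $2 }' | sort -k1,1nr
}
```

Usage: "llqn -a | llqn_waiting_order".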
Why isn't my job running?
You can verify that your job is not running by checking the "ST"
column of "llq" output. You can then use "llq -s" with
the job step ID to find out why it isn't running. The output created
by "llq -s" is long, so you may want to pick out the useful lines
using "sed". The following example demonstrates how to display
lines of "llq -s" output between the line "SUMMARY" and
the line "ANALYSIS".
$ llq -s cheetah48.12865.0
...
(pages of information)
...
$ llq -s cheetah48.12865.0 | sed -n '/SUMMARY/,/ANALYSIS/p'
SUMMARY
This LoadLeveler cluster does not have sufficient resources at the present time
to run this job step.
ANALYSIS
The LoadLeveler cluster may not have sufficient resources for a variety
of reasons. Nodes may be busy with other jobs, for example. Unfortunately,
LoadLeveler cannot distinguish between a temporary reduction of resources
and permanent system limitations. Therefore, if a job requests more nodes
than the system has, the job will wait, and "llq -s" will return
the message above, despite the fact that the job will never be able
to run.
In addition to "I", waiting jobs may appear with the
"H" (hold) status. This status often means something has
gone wrong. Here are some common reasons that jobs are held.
- NFS is down on one or more nodes. This can be verified using the
  following command.
  dsh 'ls -d ~/public'
- You have exceeded your NFS quota. This can be checked with the
  following command.
  lsquota
- One or more of the LoadLeveler options is set to a resource that
  does not exist.
If you have a job in the "H" state and cannot determine how
it got there, feel free to contact "consult@ccs.ornl.gov".
What nodes is my job using?
You can use "llq -l" to display detailed information about LoadLeveler
jobs, including a list of the nodes allocated for each job. You can use
"grep" to isolate this node list, as in the following example.
$ llq -l cheetah48.12813.0 | grep "gov::"
Allocated Hosts : cheetah04.ccs.ornl.gov::en3(-1,MPI,IP,0M),en3(-1,MPI,IP,0M),,en3(-1,MPI,IP,0M),
+ en3(-1,MPI,IP,0M),en3(-1,MPI,IP,0M),en3(-1,MPI,IP,0M),en3(-1,MPI,IP,0M),en3(-1,MPI,IP,0M)
Notice that this example has 8 "en3" entries, indicating
that 8 MPI processes are running on node 04.
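Counting entries by eye gets tedious for large jobs. A minimal sketch (hypothetical helper "count_adapter_entries") that counts "en3(" entries, or entries for another adapter name passed as the first argument, in whatever text it reads. It counts every occurrence in its input, so feed it only the "Allocated Hosts" output for one job.

```shell
# count_adapter_entries (hypothetical helper): counts adapter entries
# such as "en3(" on stdin; each entry corresponds to one task.
# Usage: llq -l cheetah48.12813.0 | grep "gov::" | count_adapter_entries
count_adapter_entries() {
    adapter=${1:-en3}
    # $(( )) strips any leading spaces that some wc versions print.
    echo $(( $(grep -o "${adapter}(" | wc -l) ))
}
```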