XL Fortran
The Fortran compiler on Cheetah is IBM's XL Fortran. It supports the Fortran
95 standard, POSIX threads, and OpenMP.
Although there is only one actual Fortran compiler, you can invoke the
compiler using a variety of different commands. Each command has different
default options. Here are descriptions of some of these commands.
| xlf or f77 |
Assumes Fortran-77-style "fixed format" source code, but still accepts
Fortran-95 syntax. Links with Fortran libraries that are not thread
safe. By default, all local variables are static, as if they were declared
with "SAVE". Use "-qnosave" to change this behavior. |
| xlf95 |
Assumes "free format" source code. Links with Fortran libraries that
are not thread safe. By default, all local variables are automatic,
like in C and C++. |
| mpxlf |
Like "xlf", but automatically links with MPI and LAPI libraries. |
| mpxlf95 |
Like "xlf95", but automatically links with MPI and LAPI libraries. |
| ..._r |
Add "_r" to the name of each compiler to compile threaded
codes and automatically link with the thread-safe Fortran libraries and
POSIX thread libraries (e.g. "xlf95_r", "mpxlf_r").
OpenMP, Pthread, and 64-bit MPI programs must use the re-entrant "_r" compiler commands. |
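The naming scheme in the table above is regular enough to sketch as a small shell function. The function name "xlf_command" and its three arguments are hypothetical, introduced only for illustration; only the resulting command names come from the table.

```shell
#!/bin/sh
# Hypothetical helper (not part of XL Fortran): compose a compiler
# command name from the naming rules in the table above.
# Usage: xlf_command <fixed|free> <mpi|nompi> <threaded|serial>
xlf_command() {
    case "$1" in
        fixed) name="xlf" ;;      # Fortran-77-style fixed format
        free)  name="xlf95" ;;    # free format
        *) echo "unknown source format: $1" >&2; return 1 ;;
    esac
    if [ "$2" = "mpi" ]; then
        name="mp$name"            # also links MPI and LAPI libraries
    fi
    if [ "$3" = "threaded" ]; then
        name="${name}_r"          # thread-safe libraries and Pthreads
    fi
    echo "$name"
}

xlf_command fixed nompi serial     # prints xlf
xlf_command free mpi threaded      # prints mpxlf95_r
```

Remember that OpenMP, Pthread, and 64-bit MPI programs always need the "threaded" variant.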
Various options control the way memory is used by Fortran programs. These
options can be critical for large-memory applications and applications
ported from other systems. Here is a description of some options for memory
management.
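One of the options described below, "-bmaxdata", takes a byte count written in hexadecimal, where each 256-MB segment adds 0x10000000 bytes. The following is a minimal sketch of that arithmetic; the function name "maxdata_flag" is hypothetical and exists only for this example.

```shell
#!/bin/sh
# Hypothetical helper: convert a segment count (1 to 8) into the
# "-bmaxdata" linker option. One segment is 256 MB.
# Usage: maxdata_flag <segments>
maxdata_flag() {
    if [ "$1" -lt 1 ] || [ "$1" -gt 8 ]; then
        echo "segments must be between 1 and 8" >&2
        return 1
    fi
    bytes=$(( $1 * 256 * 1024 * 1024 ))
    # The format string starts with %s so printf never sees a leading "-".
    printf '%s0x%X\n' "-bmaxdata:" "$bytes"
}

maxdata_flag 2    # prints -bmaxdata:0x20000000
maxdata_flag 8    # prints -bmaxdata:0x80000000
```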
| -q64 |
Creates an executable with a 64-bit address space. All object files
making up the executable must be compiled with "-q64". The
Parallel Environment is available in 32- and 64-bit form. Note that
64-bit MPI executables must be compiled with the thread-safe compiler,
"mpxlf_r".
|
| -bmaxdata |
By default, 32-bit executables can access only one segment, or 256
MB, of memory. By linking with "-bmaxdata", you can raise this
limit to as many as eight segments, or 2 GB. Specifying "0x80000000" allows
the full 2-GB range.
-bmaxdata:0x80000000
This option does not specify the amount of memory the executable actually
uses, only the maximum amount it could use.
The default for 64-bit executables is over 100,000 TB of heap/data and
over 10,000 TB of stack. These can be increased to the hard limits with
use of "-bmaxdata" and "-bmaxstack", but the address
model does not change.
|
| -qnosave |
By default, the "xlf" command creates all variables as if
they were declared "SAVE". This is useful for FORTRAN-77 codes.
This default is inappropriate, however, for fixed-format codes that use
some modern features, like recursion or thread parallelism. Use the "-qnosave"
option to override the default. The "xlf95" commands have an implicit
"-qnosave". |
Several issues affect portability between Cheetah and other machines,
and some apply even when porting between Eagle and Cheetah. The
following is a brief description of some of these issues.
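One of the issues below, the namelist format, is controlled through the "XLFRTEOPTS" environment variable and not through a compiler flag. If that variable already holds other run-time options, a plain "export" would overwrite them. The sketch below appends a setting instead. It assumes that multiple run-time options are separated by colons, and the function name "add_rte_option" is hypothetical. It is written for "ksh"-style shells; "csh" users would need "setenv" instead.

```shell
#!/bin/sh
# Hypothetical helper: add one setting to XLFRTEOPTS while keeping any
# settings that are already there.
# Assumption: run-time options in XLFRTEOPTS are colon-separated.
add_rte_option() {
    case ":${XLFRTEOPTS:-}:" in
        *":$1:"*) ;;                              # already present
        ::) XLFRTEOPTS="$1" ;;                    # variable unset or empty
        *)  XLFRTEOPTS="${XLFRTEOPTS}:$1" ;;      # append to existing list
    esac
    export XLFRTEOPTS
}

add_rte_option "namelist=old"
echo "XLFRTEOPTS=$XLFRTEOPTS"
```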
| -q64 |
Creates an executable with a 64-bit address space. All object
files making up the executable must be compiled with "-q64". The
Parallel Environment is available in 32- and 64-bit form, but 64-bit
MPI executables must be compiled with the thread-safe (re-entrant,
"_r") compilers. Note that "-q64" does not change the default size of
"INTEGER" or "REAL" variables.
64-bit object and executable files created on Eagle will not be compatible
with Cheetah until Eagle is running AIX 5.1 (it is now at AIX 4.3).
32-bit files are compatible. |
| namelist=old |
The Fortran compiler defaults to Fortran-90 namelist format.
If you use namelists in the Fortran-77 format, you will need to issue
"export XLFRTEOPTS='namelist=old'" if using "ksh"
or "setenv XLFRTEOPTS 'namelist=old'" if using "csh".
(This is a run-time environment setting, not a compiler option.) |
| -qrealsize=8 |
The default size of variables declared "REAL" is 4 bytes.
On some other architectures, like Cray Research systems, the default size
of "REAL" is 8 bytes. You can use the "-qrealsize" option
to change the XL-Fortran default from 4 to 8, which can simplify porting
from Cray-like systems. It is important to note that this option also changes
the size of "DOUBLE PRECISION" from 8 bytes to 16 bytes. |
| -qautodbl |
Some codes rely on the ability to promote "REAL" and "REAL*4" to 8 bytes
without changing the size of "DOUBLE PRECISION", so that all three are
8 bytes. Unlike "-qrealsize=8", "-qautodbl=dbl4" does this.
"DOUBLE PRECISION" and "REAL*8" can be promoted to 16 bytes without
affecting the size of "REAL" by using "-qautodbl=dbl8". Finally, "REAL"
can be promoted to 8 bytes and "DOUBLE PRECISION" to 16 bytes with
"-qautodbl=dbl". |
| -qnosave |
By default, the "xlf" command creates all variables as if
they were declared "SAVE". This is useful for FORTRAN-77 codes.
This default is inappropriate, however, for fixed-format codes that use
some modern features, like recursion or thread parallelism. Use the "-qnosave"
option to override the default. The "xlf95" commands have an implicit
"-qnosave". |
The following options provide a high level of optimization that is also
safe.
xlf_r -g -O3 -qmaxmem=-1 -qstrict ...
The "-g" tells the compiler to include information in the executable
to allow effective debugging. It doesn't inhibit optimization at all, so
we advise that you always include it.
The "-qmaxmem=-1" allows the compiler to use more memory for
space-intensive optimizations. (It has nothing to do with the amount of
memory used by the executable.)
By removing the "-qstrict", you can allow for higher optimization,
but the order of arithmetic operations may be changed. This can lead to
mathematically equivalent but numerically different results.
For potentially higher performance, you may want to experiment with
higher levels of optimization. The following options provide "high-order
transformations", which help optimize loops. These transformations are
particularly important for optimizing Fortran-95 array statements.
xlf_r -g -O4 -qnoipa -qmaxmem=-1 -qstrict ...
Again, you can leave off the "-qstrict" if you want to allow the
order of arithmetic operations to change. The "-O4" option includes
inter-procedural analysis (IPA), and we recommend turning it off using
"-qnoipa". For typical computational codes, we have found that
IPA increases compile time dramatically without significantly increasing
performance.
If you want to experiment with IPA, you could try the following.
xlf_r -g -O5 -qmaxmem=-1 ...
The only difference between "-O4" and "-O5" is the level
of IPA; "-O5" uses the highest (and most time consuming) level.
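The three recipes above differ only in a few options, which a build script can capture in one place. The following is a sketch; the function name "xlf_opt_flags" and the recipe names "safe", "hot", and "ipa" are hypothetical labels chosen for this example. The option strings themselves are the ones shown above.

```shell
#!/bin/sh
# Hypothetical helper: print the option set for one of the three
# optimization recipes described above.
# Usage: xlf_opt_flags <safe|hot|ipa> [nostrict]
xlf_opt_flags() {
    case "$1" in
        safe) flags="-g -O3 -qmaxmem=-1" ;;
        hot)  flags="-g -O4 -qnoipa -qmaxmem=-1" ;;
        ipa)  flags="-g -O5 -qmaxmem=-1" ;;
        *) echo "unknown recipe: $1" >&2; return 1 ;;
    esac
    # Keep -qstrict unless the caller gives it up. The -O5 recipe above
    # is shown without it, so it is not added there.
    if [ "$1" != "ipa" ] && [ "${2:-}" != "nostrict" ]; then
        flags="$flags -qstrict"
    fi
    echo "$flags"
}

echo "xlf_r $(xlf_opt_flags safe) ..."
echo "xlf_r $(xlf_opt_flags hot nostrict) ..."
```

Passing "nostrict" allows the compiler to reorder arithmetic, so compare results against a "-qstrict" build before relying on it.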
The XL Fortran compiler supports both explicit and automatic shared-memory
parallelization. For explicit parallelization, the compiler supports OpenMP.
Use the "-qsmp" option to turn on shared-memory parallelization.
The compiler will automatically parallelize "DO" loops and array
statements when it can prove that such parallelization is safe. You must
use the thread-safe compiler commands ("..._r") to use "-qsmp".
The "-qreport" option causes the compiler to produce a loop-transformation
report that includes information about automatic parallelization.
xlf_r -qsmp -qreport ...
The automatic parallelization performed by the compiler is of limited utility,
however. Performance may increase little or may even decrease. Another
option is explicit parallelization using OpenMP directives. By default,
the "-qsmp" option translates OpenMP directives and performs
automatic parallelization. To turn off automatic parallelization, use the
"-qsmp=noauto" option. For OpenMP programs compiled using "xlf_r",
you probably want to add the "-qnosave" option so that parallel calls
to the same procedure each get their own local variables instead of
sharing static ones.
xlf_r -qreport -qsmp=noauto -qnosave ...
In addition to OpenMP, XL Fortran directly supports POSIX Threads (Pthreads)
through a Fortran API created by IBM. For more information on using Pthreads
with XL Fortran, see the IBM Redbooks "POWER4 Processor Introduction and
Tuning Guide" or "Scientific Applications in RS/6000 SP Environments".
The "..._r" compiler
commands automatically link with the Pthreads library.
The XL Fortran compiler supports hybrid parallelization with MPI and OpenMP
or Pthreads.
To compile an MPI+OpenMP or MPI+Pthreads code, use
mpxlf_r -qsmp ...
Everything stated in the shared-memory parallelization section still holds.
The "mpxlf_r" compiler automatically links
to the thread-safe MPI or LAPI libraries.
See the LoadLeveler page for assistance on how to run a hybrid code.
The following compiler options are useful for debugging executables.
| -g |
Includes debugger information in the object files. Allows a debugger
to associate machine code with source code. Works with all levels of
optimization! Note that the connection between source code and highly
optimized machine code may not be accurate. |
| -C |
Compiles with run-time array-bounds checking. |
| -qextchk |
Checks for mismatched procedure interfaces and common blocks. This
option cannot be used with MPI because MPI relies on weak type checking and
mismatched procedure interfaces. |
| -qflttrap |
Compiles the program to detect floating-point exceptions at run time.
The following form of this option causes the program to abort on floating-point
overflow or division by zero.
-qflttrap=overflow:zerodivide:enable
|
| -qsigtrap |
Installs the xl__trce trap handler. This can be used to get
a traceback when an exception condition is encountered. The default action
without this is to produce a core file. Alternative exception handlers
can be used, either provided in XL Fortran or supplied by the user.
Included exception handlers are
-qsigtrap=xl__ieee Produces a traceback and an explanation of the signal
and continues execution by supplying the default IEEE result for the failed
computation.
-qsigtrap=xl__trce (default) Produces a traceback and stops the
program.
-qsigtrap=xl__trcedump Produces a traceback and a core file and
stops the program.
-qsigtrap=xl__sigdump Provides a traceback that starts from the
point at which it is called and provides information about the signal. This
can only be called from inside a user-written signal handler.
-qsigtrap=xl__trbk Provides a traceback that starts from the
point at which it is called. It does not stop the program.
|
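The debugging options above can be combined, with the one restriction that "-qextchk" must be left out of MPI builds. The following is a sketch of that rule; the function name "xlf_debug_flags" is hypothetical, and the choice of the "xl__trcedump" handler is only one of the possibilities listed above.

```shell
#!/bin/sh
# Hypothetical helper: assemble debugging options from the table above.
# Usage: xlf_debug_flags <mpi|nompi>
xlf_debug_flags() {
    # Bounds checking, floating-point traps, and a traceback plus core
    # file when a trap fires.
    flags="-g -C -qflttrap=overflow:zerodivide:enable -qsigtrap=xl__trcedump"
    # -qextchk cannot be used with MPI, so add it only for non-MPI code.
    if [ "$1" != "mpi" ]; then
        flags="$flags -qextchk"
    fi
    echo "$flags"
}

echo "xlf_r $(xlf_debug_flags nompi) ..."
echo "mpxlf_r $(xlf_debug_flags mpi) ..."
```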
The following options are useful for creating performance profiles of
executables.
| -p |
Compiles the executable to produce limited performance-profile information.
When run, the executable writes performance data to the file "mon.out".
Use "prof" to analyze these data. |
| -pg |
Compiles the executable to produce extensive performance-profile information.
When run, the executable writes performance data to the file "gmon.out".
Use "gprof" or the GUI-based tool "xprofiler" to analyze
these data. |
For more information on XL-Fortran compiler options, see "man xlf".
XL Fortran manuals are available online from IBM.
http://www-3.ibm.com/software/ad/fortran/xlfortran/library/
See "XL Fortran V8.1.1 for AIX" at the above URL to access the
user's guide and language reference.
For more information on XL-Fortran performance optimization and parallelization,
see the following IBM Redbook, available online.
POWER4 Processor Introduction and Tuning Guide
http://www.redbooks.ibm.com/abstracts/sg247041.html