IBM C and C++ Compilers
The IBM C and C++ compilers are available on Cheetah. They are the
"batch" compilers supplied with IBM's Visual Age C++, and they support
K&R C, ANSI C, and ANSI C++. OpenMP is supported for C compilation,
but not yet for C++.
Although there is only one actual compiler, you can invoke it
using a variety of different commands, each with different default
options. Here are descriptions of some of these commands.
| xlc | ANSI C with limited extensions. |
| cc | Extended C, including K&R and ANSI. |
| xlC | ANSI C++. |
| mpcc | Like "xlc", but automatically links with MPI and LAPI libraries. |
| mpCC | Like "xlC", but automatically links with MPI and LAPI libraries. |
| ..._r | Add "_r" to the name of each compiler to compile threaded codes and automatically link with the POSIX thread libraries (e.g. "xlc_r", "mpCC_r"). OpenMP programs must use the "_r" compiler commands. |
A few options control the way memory is used by C and C++ programs. These
options can be critical for large-memory applications. Here is a description
of some options for memory management.
| -q64 | Creates an executable with a 64-bit address space. All object files making up the executable must be compiled with "-q64". The Parallel Environment is currently only available in 32-bit form, so 64-bit MPI executables must be compiled with the thread-safe compiler mpcc_r. 64-bit object and executable files created on Eagle will not be compatible with Cheetah, and vice versa, while Eagle is running AIX 4.3. When Eagle is upgraded to AIX 5.1, 64-bit objects will be compatible between Cheetah and Eagle. |
| -bmaxdata | By default, 32-bit executables can access only one segment, or 256 MB, of memory. By linking with "-bmaxdata", you can increase this range up to eight segments, or 2 GB. Specifying "0x80000000" (that is, "-bmaxdata:0x80000000") allows the full 2 GB range. This option does not specify the amount of memory the executable actually uses, only the maximum amount it could use. |
The following options provide a high level of optimization for C and C++
that is also safe.
cc -g -O3 -qmaxmem=-1 -qstrict ...
The "-g" tells the compiler to include information in the executable
to allow effective debugging. It doesn't inhibit optimization at all, so
we advise that you always include it.
The "-qmaxmem=-1" allows the compiler to use more memory for
space-intensive optimizations. (It has nothing to do with the amount of
memory used by the executable.)
By removing the "-qstrict", you can allow for higher optimization,
but the order of arithmetic operations may be changed. This can lead to
mathematically equivalent but numerically different results.
For potentially higher performance for C++ codes, you can turn on inlining
using "-Q".
xlC -g -O3 -qmaxmem=-1 -qstrict -Q ...
For potentially higher performance for C codes, you may want to experiment
with higher levels of optimization. The following options provide "high-order
transformations", which help optimize loops. These options are not available
for C++.
cc -g -O4 -qnoipa -qmaxmem=-1 -qstrict ...
Again, you can leave off the "-qstrict" if you want to allow the
order of arithmetic operations to change. The "-O4" option includes
inter-procedural analysis (IPA), and we recommend turning it off using
"-qnoipa". For typical computational codes, we have found that
IPA increases compile time dramatically without significantly increasing
performance.
If you want to experiment with IPA, you could try the following. Again,
these options are not available for C++.
cc -g -O5 -qmaxmem=-1 ...
The only difference between "-O4" and "-O5" is the level
of IPA; "-O5" uses the highest (and most time consuming) level.
The IBM C compiler supports directive-based shared-memory parallelization.
It supports OpenMP and platform-specific IBM directives.
Use the "-qsmp" option to turn on shared-memory parallelization.
You must use the thread-safe compiler commands ("..._r") to use
"-qsmp".
The compiler will automatically parallelize "for" loops
when it can determine that such parallelization is safe.
The automatic parallelization performed by the compiler is of limited utility,
however. Performance may increase little or may even decrease.
Another option is explicit parallelization using OpenMP directives.
By default, the "-qsmp" option translates OpenMP directives
and performs automatic parallelization. To turn off automatic parallelization,
use the "-qsmp=noauto" option. In addition, POSIX threads
(Pthreads) are supported. For more information on using Pthreads,
see the IBM Redbooks "POWER4 Processor Introduction and Tuning Guide"
or "Scientific Applications in RS/6000 SP Environments".
The "..._r" compiler commands automatically link with the Pthreads
library.
The C compiler supports hybrid parallelization with MPI and
OpenMP or Pthreads. To compile an MPI+OpenMP or MPI+Pthreads code,
use
mpcc_r -qsmp ...
Everything stated in the shared-memory parallelization section applies,
except that the "mpcc_r" compiler must now be used; it automatically
links to the thread-safe MPI and LAPI libraries.
See the LoadLeveler page for assistance on how
to run a hybrid code.
The following compiler options are useful for debugging executables.
| -g | Includes debugger information in the object files. Allows a debugger to associate machine code with source code. Works with all levels of optimization! Note that the connection between source code and highly optimized machine code may not be accurate. |
| -qextchk | Checks for mismatched function interfaces and external types. |
| -qinfo=all | Produces additional warnings, similar to those generated by Lint. |
| -qcheck=all | Compiles the program to check for "null" pointers, object bounds, and division by zero at run time. |
| -qflttrap | Compiles the program to detect floating-point exceptions at run time. The following form of this option causes the program to abort on floating-point overflow or division by zero: -qflttrap=overflow:zerodivide:enable |
The following options are useful for creating performance profiles of
executables.
| -p | Compiles the executable to produce limited performance-profile information. When run, the executable writes performance data to the file "mon.out". Use "prof" to analyze these data. |
| -pg | Compiles the executable to produce extensive performance-profile information. When run, the executable writes performance data to the file "gmon.out". Use "gprof" or the GUI-based tool "xprofiler" to analyze these data. |
For more information on compiler options, see "man cc". Full documentation
on Visual Age C++, the product that includes the IBM C and C++ compilers,
is available in PDF form. Useful manuals include the
Standard C++ Library Reference, the
VisualAge C++ for AIX - Compiler Reference,
and the
C/C++ Language Reference.
For more information on performance optimization and parallelization,
see the following IBM Redbooks, available online.