


Running The Parallel Debugger pdbx

To run the parallel debugger (pdbx) the following steps should be carried out:

The last two of these steps are described in more detail in the next section.

Loading the parallel program

From the command line type:

 pdbx ./prog.exe -llfile ./llscriptfile  -procs N
where:

prog.exe
is the name of the parallel program

llscriptfile
is the name of the file containing the LoadLeveler keyword statements

-procs N
specifies the total number of instances of MPI tasks. N should match the cpus-keyword in your LoadLeveler script file.

After initialisation the pdbx prompt should be displayed.
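
Because N has to agree with the cpus keyword, one option is to read it from the LoadLeveler file instead of typing it twice. The following is a minimal bash sketch under stated assumptions: pdbx_command is a hypothetical helper (not part of pdbx), and the keyword is assumed to appear on a line of the form "#@ cpus = N". It only prints the command so that it can be inspected before use.

```shell
# Hypothetical helper: print the pdbx command line for a program and a
# LoadLeveler file, taking N from a line of the assumed form "#@ cpus = N".
pdbx_command() {
    local prog="$1" llfile="$2" procs
    # Extract the first integer assigned to the cpus keyword.
    procs=$(sed -n 's/^#@[[:space:]]*cpus[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$llfile" | head -n 1)
    if [ -z "$procs" ]; then
        echo "no cpus keyword found in $llfile" >&2
        return 1
    fi
    echo "pdbx $prog -llfile $llfile -procs $procs"
}

# Example (prints the command; run the printed line to start the debugger):
#   pdbx_command ./prog.exe ./llscriptfile
```

Printing rather than executing keeps the sketch safe to try on a machine where pdbx is not installed.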

Tracing the program instances in the debugger

For more details on the use of the pdbx debugger refer to the IBM Parallel Environment for AIX - Operation and Use, Volume 2.


Totalview

Totalview is a powerful, sophisticated debugger which enables the debugging, analysis and tuning of serial and parallel programs. Currently, Totalview can be run interactively on up to 32 processors on HPCx.

Before running Totalview you must go through some setup stages which are detailed at:

To start Totalview, type:


Totalview has built-in documentation, and is more fully documented at:


VAMPIR

VAMPIR (Visualisation and Analysis of MPI Resources) is a commercial post-mortem trace visualisation tool from Intel GmbH, Software & Solutions Group, the former Pallas HPC group. It uses the profiling extensions to MPI and permits analysis of the message events where data is transmitted between processors during execution of a parallel program. Event ordering, message lengths and times can all be analysed. The tool comes in two components: VampirTrace and Vampir. VampirTrace is a library which, when linked into and called from a parallel program, produces an event tracefile. The Vampir tool interprets the event tracefiles and represents the data in graphical form for the user.

In order to run the Vampir/VampirTrace tools, you will need to set your PAL_ROOT, PAL_LICENSEFILE and VT_ROOT environment variables. For example, for a bash shell:

  export PAL_ROOT=/usr/local/packages/vampir
  export VT_ROOT=/usr/local/packages/vampir
  export PAL_LICENSEFILE=/usr/local/packages/vampir/etc/license.dat
N.B. These environment variables will also need to be set in any LoadLeveler batch script you use to create a Vampir tracefile. The present license works on up to 512 processors.

Tracing your Parallel Application with VampirTrace

Basic Usage: Using the basic functionality of VampirTrace for MPI is straightforward: relink your MPI application with the appropriate VampirTrace -lVT library, add the environment variables outlined above to your batch script and execute the application as usual.

There are different versions of this library for 32-bit addressing and 64-bit addressing. You have to specify the appropriate library path to pick up the correct version. You also need to link the -lld library.

For 32-bit addressing:

  mpxlf90_r -o hello hello.f -L${PAL_ROOT}/lib -lVT -lld
  mpxlf90_r -q32 -o hello hello.f -L${PAL_ROOT}/lib -lVT -lld
For 64-bit addressing:
  mpxlf90_r -q64 -o hello hello.f -L${PAL_ROOT}/lib64 -lVT -lld
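
The choice between the lib and lib64 directories can be tied to the addressing mode so that the two never get out of step. The following bash sketch assumes PAL_ROOT is already set as described above; vt_link_command is a hypothetical helper name, and it only prints the link line rather than invoking the compiler.

```shell
# Hypothetical helper: print the mpxlf90_r link line for 32- or 64-bit
# addressing, choosing the matching VampirTrace library directory.
vt_link_command() {
    local bits="$1" exe="$2" src="$3" libdir
    case "$bits" in
        32) libdir="${PAL_ROOT}/lib" ;;
        64) libdir="${PAL_ROOT}/lib64" ;;
        *)  echo "usage: vt_link_command 32|64 EXE SOURCE" >&2; return 1 ;;
    esac
    echo "mpxlf90_r -q${bits} -o ${exe} ${src} -L${libdir} -lVT -lld"
}

# Example:
#   vt_link_command 64 hello hello.f
```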

Instrumented Tracing: By using keywords, section-specific information can be built into the trace using subroutine calls. Trace calls can be limited to time-critical program sections (in particular this can be useful to limit the size of the tracefile produced). In Fortran code, this involves adding calls to VTBEGIN and VTEND around the section of interest in the source. The equivalent in C is VT_begin and VT_end. You will also need to include a header file ( for Fortran and VT.h for C). The include path is


Viewing Your Tracefiles with Vampir

The tracefiles produced by VampirTrace are viewed using Vampir. Simply run the Vampir executable to start the graphical viewer, and load your tracefiles through the menu system.

The vampir executable lives at:

It can be useful to add this path to your PATH variable.


Users of the service can access the documentation on the service machine

Further information about Vampir/VampirTrace can be found at:


Paraver

Paraver is a performance visualisation and analysis tool that can be used to analyse MPI, OpenMP and mixed-mode programs on HPCx. Hardware counter profiling is also included. A guide to setting up your jobs for Paraver profiling on HPCx can be found at:

gprof and xprofiler

gprof is a simple text-based utility that provides procedural-level profiling of serial and parallel codes. This helps users to identify how much time is being spent in subroutines and functions. xprofiler generates a graphical display of the performance, and provides application profiling at the source statement level.

Both gprof and xprofiler are very simple to use:

gprof and xprofiler facilitate analysis of CPU usage only. They cannot provide other types of profiling information, such as CPU idle, I/O or communication.

The latest version of xprofiler (from IBM research) is installed in

and offers additional features such as histograms when using profiling at the source line level.

NB. To include LAPACK in your profile, link using -L/usr/local/lib -llapack_profile.

Information about gprof and xprofiler can be found in the IBM Parallel Environment for AIX: Operation and Use, Volume 2 at:

Hardware Performance Monitor (HPM) Toolkit

The HPM toolkit consists of three components:

The hpmcount utility

Serial program usage:

hpmcount -o filename ./program

where program is the name of the executable and filename is the name of the file in which the output will be stored.

Parallel program usage:
Simply specify hpmcount in the LoadLeveler script file as follows:

poe hpmcount -o filename ./program

Pitfall: The following versions are wrong and will lead to hpmcount investigating the performance of poe instead of your program:

hpmcount -o filename poe ./program
hpmcount -o filename ./program

For MPI and OpenMP programs respectively.

In the parallel case, filename is the prefix for a sequence of files (one for each processor or task) named filename_taskid.processid. Each of these files contains summary performance information for each processor.
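
After a parallel run it can be worth confirming that every task wrote its file. The following bash sketch relies only on the filename_taskid.processid naming described above; check_hpm_outputs is a hypothetical helper name, not part of the HPM toolkit.

```shell
# Hypothetical helper: count the files named PREFIX_taskid.processid and
# compare the count with the expected number of tasks.
check_hpm_outputs() {
    local prefix="$1" expected="$2" count=0 f
    for f in "${prefix}"_*.*; do
        # An unmatched glob stays literal, so test that the file exists.
        if [ -e "$f" ]; then
            count=$((count + 1))
        fi
    done
    echo "found $count of $expected hpmcount output files"
    [ "$count" -eq "$expected" ]
}

# Example, after "poe hpmcount -o filename ./program" on 32 tasks:
#   check_hpm_outputs filename 32
```

The function returns a non-zero status when the count differs, so it can be used as a guard at the end of a batch script.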

The libhpm library and the hpmviz utility

In order to be able to use the hpmviz utility, the program must first be instrumented with calls to the appropriate libhpm routines. The full documentation should be studied, but briefly, for a free format Fortran 90 MPI source file mpiprog.f90:

For OpenMP programs, use the thread-safe version of the library -lhpm_r and routines f_hpmtstart and f_hpmtstop.

For C and C++ programs, drop the f_ prefix.

A full guide on how to use the HPM toolkit on HPCx is available as technical report HPCxTR0307:

The original IBM documentation will be found at:

MPI Trace Tools

In order to gain detailed information about MPI communication times, three trace-wrapper libraries are provided:

To use mpitrace:

The number of output files can be reduced by setting the environment variable TRACE_SOME to "yes" or "1".

mpihpm provides the same information as mpitrace plus Power-4 hpm counter data.

mpiprof provides an elapsed-time profile of MPI routines including some call-graph information in order to identify communication time on a per-subroutine basis. To use mpiprof, the code must be compiled with -qtable=full or -g as an additional compiler option.

More information about the mpitrace, mpihpm and mpiprof libraries can be found at:


KOJAK

KOJAK is a set of tools designed to analyse the performance of and find bottlenecks in parallel applications. Information can be gained about the communications and hardware events which occur when the application is run. For full information, please see the documentation and examples in the /usr/local/packages/kojak/kojak directory and the web page

To use KOJAK on HPCx, a few directives must first be added to the source code. At least
!POMP$ INST INIT
!POMP$ INST BEGIN(name)
for Fortran, or
#pragma pomp inst init
#pragma pomp inst begin(name)
for C are needed as the first executable statements of the main routine and
!POMP$ INST END(name) (Fortran)
#pragma pomp inst end(name) (C)
are needed as the last executable statements. name can be chosen as the name of the main routine.

Additionally, the user can choose to instrument any additional subroutines or segments of code using the 'begin' and 'end' statements above to gain more clarity (about how events relate to the program structure) at the analysis stage.

Then, if working in 32-bit, add the location of the relevant executables to your path with

export PATH=$PATH:/usr/local/packages/kojak/kojak/bin/32
If working in 64-bit replace the 32 above with 64.
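
If the export line is placed in a shell start-up file, the directory can end up on PATH several times. The following bash sketch adds the KOJAK directory only when it is not already present; add_kojak_path is a hypothetical helper name, and the directory layout is taken from the path given above.

```shell
# Hypothetical helper: append the 32- or 64-bit KOJAK bin directory to PATH
# unless it is already there.
add_kojak_path() {
    local dir="/usr/local/packages/kojak/kojak/bin/$1"
    case "$1" in
        32|64) ;;
        *) echo "usage: add_kojak_path 32|64" >&2; return 1 ;;
    esac
    case ":$PATH:" in
        *":$dir:"*) ;;                   # already on PATH, nothing to do
        *) export PATH="$PATH:$dir" ;;
    esac
}

# Example:
#   add_kojak_path 64
```

Note that the sketch does not remove the other variant, so switching between 32-bit and 64-bit in one session leaves both directories on PATH.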

The application should be built with the name of the compiler preceded by kinst-pomp

kinst-pomp mpif90 -qsuffix=f=f90 source_code.f90 -o executable.x

The executable can then be run in the normal way resulting in the trace file a.elg being created in the working directory, which can then be automatically analysed with the command

kanal a.elg
This will analyse the trace file and display the results in the cube browser, which is designed to make it easy to determine performance properties of the application via hierarchical (performance, call and system) trees. A colour coding system helps to identify hot-spots. The manual can be downloaded from


ParaView

ParaView is a parallel visualisation application capable of performing many different types of visualisation and supporting many different data types as input. ParaView runs in a client/server environment and has a Python scripting interface for batch jobs.

Launching the Client

Users can download a client binary to run on their local workstation. If a binary is not available for your particular platform, an X-windows client is available on HPCx. Run it using:


Prepare the client to accept a server connection by selecting "File : Connect" and entering as the host and "Client / Server (reverse connection)" as the server type. The default port is 11111 and can be left unchanged.

Launching the Server

The ParaView server can be run in parallel with up to 32 processors on the interactive queue. Run it using:

 -l <loadleveller_script>

The full range of options can be accessed by passing -h to the script. Users wishing more control over the launching of the server should read the script and run the relevant executables directly.

An appropriate wall clock limit should be specified in the LoadLeveler script: if the limit is reached, the server processes are terminated immediately and any visualisation data that has not been saved to disk is lost.

Users who are running the client behind a firewall are responsible for ensuring that the connection from the server is accepted. Alternatively, solutions such as VPN or port forwarding over SSH could be used.
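
As an illustration of the SSH option, the following bash sketch prints an ssh command that uses remote port forwarding (-R), so that connections to the chosen port on the remote login host are passed back to the same port on the local workstation, where the ParaView client is listening. Everything specific here is an assumption: paraview_tunnel_command is a hypothetical helper, user@login.example.org is a placeholder, and whether the server processes can actually reach the forwarded port depends on where they run and on the SSH server configuration (for example its GatewayPorts setting).

```shell
# Hypothetical helper: print an ssh command that forwards PORT on the remote
# host back to PORT on the local workstation (default 11111).
paraview_tunnel_command() {
    local login="$1" port="${2:-11111}"
    # -N: do not run a remote command, only forward the port.
    echo "ssh -N -R ${port}:localhost:${port} ${login}"
}

# Example with a placeholder account and host name:
#   paraview_tunnel_command user@login.example.org
```

With such a tunnel in place the server would be directed to connect to the remote end of the tunnel rather than to the workstation itself.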

For further information on server configuration visit

Python scripting

ParaView provides a Python scripting interface as an alternative to using the GUI. All ParaView functionality is accessed through the 'servermanager' module, and help can be found by typing help(servermanager) in a Python shell. Python scripts can be edited, run and tested in the interactive shell available in the GUI under "Tools : Python Shell" or by using the pvpython interpreter. Scripts can be executed without user interaction using pvbatch, either in serial:

 --use-offscreen-rendering <path-to-python-script>

or in parallel (via a LoadLeveler script):

poe /usr/local/packages/paraview/3.3.0/parallel/bin/pvbatch --use-offscreen-rendering <path-to-python-script>

Note that when using pvbatch, ParaView 3.3.0 or later must be used. Also, when running in parallel, the following error message will be displayed: "vtkXOpenGLRenderWindow (1131a7750): bad X server connection. DISPLAY=". This is normal and will be removed in a future release.

For further instructions on Python scripting for ParaView visit


TAU

TAU (Tuning and Analysis Utilities) is a portable profiling and tracing toolkit. It can profile parallel programs written in FORTRAN, C++ and C which use MPI.

It is installed in /usr/local/packages/tau and is available to all users. There are 32 and 64 bit versions installed under tau-2.17/ibm and tau-2.17/ibm64 respectively.

To use TAU simply add /usr/local/packages/tau/tau-2.17/ibm[64]/bin to your $PATH.

Below is a brief introduction to TAU on HPCx. For full instructions see

MPI Profiling

The simplest way to get an application profile using TAU is to use the tau_poe wrapper when you run your MPI executable. Simply replace poe with tau_poe in your LoadLeveler script. When your application exits, one profile file for each process will be written in the current directory.

To view the profile, run paraprof <path_to_profile_files>. ParaProf is TAU's profile viewing tool. It shows how much time each process has spent in different parts of the code. In this case it records only the time spent in MPI calls; time spent in user code is grouped together as 'TAU Application'.


To get more information on your code, you can add instrumentation calls to TAU to profile particular functions. TAU also provides a set of wrapper compiler scripts that will automatically instrument all your code using PDT (Program Database Toolkit). These wrapper compilers should be used in place of your FORTRAN, C, or C++ compilers in your own makefile. You must then export the TAU_MAKEFILE variable and recompile your code to instrument and link with TAU.


export TAU_MAKEFILE=/usr/local/packages/tau/tau-2.17/ibm[64]/lib/Makefile.tau-mpi-pdt
make clean; make; make install

Now run your application and you will again get one profile file for each process; however, the files will now contain data on all functions, not just MPI calls.

It is possible to generate even more detail by generating trace files instead of profiles. To do this set TAU_MAKEFILE to .../Makefile.tau-mpi-pdt-mpitrace or .../Makefile.tau-mpi-pdt-trace to generate traces of MPI function calls or all function calls respectively. WARNING: full trace files can be very large (1GB/min/process)!
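
The makefile variants and the size warning above can be combined into a small planning aid. The following bash sketch uses two hypothetical helpers: tau_makefile builds the TAU_MAKEFILE path, assuming the trace makefiles sit in the same lib directory as Makefile.tau-mpi-pdt, and tau_trace_gb applies the quoted figure of 1 GB per minute per process as a rough estimate of full-trace volume.

```shell
# Hypothetical helper: print the TAU_MAKEFILE path for 32- or 64-bit builds.
# Assumes the trace makefiles live alongside Makefile.tau-mpi-pdt.
tau_makefile() {
    local bits="$1" mode="${2:-profile}" arch suffix
    case "$bits" in
        32) arch=ibm ;;
        64) arch=ibm64 ;;
        *)  echo "usage: tau_makefile 32|64 [profile|mpitrace|trace]" >&2; return 1 ;;
    esac
    case "$mode" in
        profile)  suffix="" ;;            # profiles of all functions
        mpitrace) suffix="-mpitrace" ;;   # traces of MPI function calls
        trace)    suffix="-trace" ;;      # traces of all function calls
        *)        echo "unknown mode: $mode" >&2; return 1 ;;
    esac
    echo "/usr/local/packages/tau/tau-2.17/${arch}/lib/Makefile.tau-mpi-pdt${suffix}"
}

# Hypothetical helper: rough full-trace volume in GB for MINUTES of run time
# on PROCESSES processes, at 1 GB per minute per process.
tau_trace_gb() {
    echo $(( $1 * $2 ))
}

# Example:
#   export TAU_MAKEFILE=$(tau_makefile 64 trace)
#   tau_trace_gb 10 32
```

For instance, a 10 minute run on 32 processes comes to an estimated 320 GB of full trace data, which is worth checking against the available disk space before submitting the job.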

TAU traces can be merged and converted for viewing in other tools e.g. VAMPIR, using the script and tau_convert tools. See the TAU user guide for more details.

Subversion (SVN)

The subversion executables can be found in /usr/local/packages/svn/subversion-1.5.6/bin.

In order to use Subversion on HPCx you need to have the following settings in your .subversion file:

NAMD & VMD - Interactive Molecular Dynamics

NAMD is a parallel MD (Molecular Dynamics) code specialised for high performance simulation of large biomolecular systems. VMD (Visual Molecular Dynamics) is a molecular visualisation program for displaying, animating and analysing large biomolecular systems. IMD (Interactive MD) refers to using VMD and NAMD together by connecting VMD to NAMD, providing a method to run the MD simulation interactively.

NAMD can be run on HPCx and viewed and controlled from VMD running on your local machine, using a program called Proxycontrol for the connection, as shown below.

There is a video demonstration here IMDdemo

Andrew Turner