To run the parallel debugger (pdbx) the following steps should be carried out:
Compile a parallel program using the appropriate mpxx_r compiler shell script (e.g. mpcc_r, mpxlf90_r) and specify the -g option.
Create a LoadLeveler job script for interactive use
Set up any environment variables
Load the correct number of instances of the program into the parallel debugger on all LPARs allocated
Trace the program from within the debugger
The last two of these steps are described in more detail in the next section.
From the command line type:
pdbx ./prog.exe -llfile ./llscriptfile -procs N
where:
prog.exe
is the name of the executable to be debugged
llscriptfile
is the name of the file containing the LoadLeveler keyword statements
-procs N
specifies the total number of MPI tasks. N should match the cpus keyword in your LoadLeveler script file.
After initialisation the pdbx(all) prompt should be displayed. Type
tasks long
at the pdbx(all) prompt to confirm that all the instances are ready to be traced.
For example if 2 instances of the program were loaded the output should be something like:
0:Debug ready l1f35 172.31.6.137 0
1:Debug ready l1f35 172.31.6.137 0
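The readiness check can also be scripted if the output of tasks long has been saved to a file or captured as text. The following is a minimal sketch in Python, assuming every ready task is reported on a line of the form "taskid:Debug ready ..." as in the example above. The function names are hypothetical, and other task states that pdbx may print are simply ignored.

```python
def ready_task_ids(tasks_output):
    """Collect the IDs of tasks reported as 'Debug ready'.

    Assumes lines shaped like '0:Debug ready l1f35 172.31.6.137 0',
    as in the example output; lines in any other shape are skipped.
    """
    ready = set()
    for line in tasks_output.splitlines():
        task_id, sep, rest = line.strip().partition(":")
        if sep and task_id.isdigit() and rest.startswith("Debug ready"):
            ready.add(int(task_id))
    return ready


def all_ready(tasks_output, procs):
    """True if tasks 0 .. procs-1 are all reported as ready."""
    return ready_task_ids(tasks_output) == set(range(procs))
```

Here procs would be the same number N that was passed to pdbx with -procs.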
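Some useful commands at the pdbx(all) prompt are:
stop at linenum
sets a breakpoint, where linenum denotes the line number in the source code.
cont
continues execution.
quit
exits the debugger.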
For more details on the use of the pdbx debugger refer to the IBM Parallel Environment for AIX - Operation and Use, Volume 2.
Totalview is a powerful, sophisticated debugger which enables the debugging, analysis and tuning of serial and parallel programs. Currently, Totalview may be run interactively using up to 32 processors on HPCx.
Before running Totalview you must go through some setup stages which are detailed at:
http://www.hpcx.ac.uk/support/FAQ/totalview/
To start Totalview, type:
/usr/local/packages/totalview/tv6
Totalview has built-in documentation, and is more fully documented at:
http://www.etnus.com/Support/docs/index.html
VAMPIR (Visualisation and Analysis of MPI Resources) is a commercial post-mortem trace visualisation tool from Intel GmbH, Software & Solutions Group, the former Pallas HPC group. It uses the profiling extensions to MPI and permits analysis of the message events where data is transmitted between processors during execution of a parallel program. Event ordering, message lengths and times can all be analysed. The tool comes in two components - VampirTrace and Vampir. VampirTrace is a library which, when linked into and called from a parallel program, produces an event tracefile. The Vampir tool interprets the event tracefiles and represents the data in a graphical form for the user.
In order to run the Vampir/VampirTrace tools, you will need to set your PAL_ROOT, PAL_LICENSEFILE and VT_ROOT environment variables. For example, for a bash shell:
export PAL_ROOT=/usr/local/packages/vampir
export VT_ROOT=/usr/local/packages/vampir
export PAL_LICENSEFILE=/usr/local/packages/vampir/etc/license.dat
N.B. These environment variables will need to be set in any LoadLeveler batch scripts you use when you want to create a Vampir tracefile. The present license works on up to 512 processors.
There are different versions of the VampirTrace library for 32-bit addressing and 64-bit addressing. You have to specify the appropriate library path to pick up the correct version. You also need to link with -lld.
For 32-bit addressing:
mpxlf90_r -o hello hello.f -L${PAL_ROOT}/lib -lVT -lld
mpxlf90_r -q32 -o hello hello.f -L${PAL_ROOT}/lib -lVT -lld
For 64-bit addressing:
mpxlf90_r -q64 -o hello hello.f -L${PAL_ROOT}/lib64 -lVT -lld
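The choice between the two library paths can be captured in a small helper, for instance when generating build commands from a script. This is only a sketch in Python that reproduces the -q32 and -q64 command lines shown above. The function name and its parameters are hypothetical and are not part of Vampir or the compiler.

```python
def vt_compile_command(source, output, pal_root, bits=32):
    """Build the mpxlf90_r command line that links VampirTrace.

    bits selects the addressing mode: 32 uses <pal_root>/lib,
    64 uses <pal_root>/lib64, matching the commands in the text.
    """
    if bits not in (32, 64):
        raise ValueError("bits must be 32 or 64")
    lib_dir = "lib" if bits == 32 else "lib64"
    return [
        "mpxlf90_r", f"-q{bits}",
        "-o", output, source,
        f"-L{pal_root}/{lib_dir}", "-lVT", "-lld",
    ]


if __name__ == "__main__":
    # Print the 64-bit variant for pasting into a shell or makefile.
    print(" ".join(vt_compile_command(
        "hello.f", "hello", "/usr/local/packages/vampir", bits=64)))
```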
Instrumented Tracing: By using keywords, section-specific information can be built into the trace using subroutine calls. Trace calls can be limited to time-critical program sections (in particular this can be useful to limit the size of the tracefile produced). In Fortran code, this involves adding calls to VTBEGIN and VTEND around the section of interest in the source. The equivalent in C is VT_begin and VT_end. You will also need to include a header file (VT.inc for Fortran and VT.h for C). The include path is
-I${PAL_ROOT}/include
The vampir executable lives at:
/usr/local/packages/vampir/bin
or
${PAL_ROOT}/bin
It can be useful to add this path to your PATH variable.
Documentation is installed at:
/usr/local/packages/vampir/doc/Vampir-userguide.pdf
/usr/local/packages/vampir/doc/VT.pdf
Further information about Vampir/VampirTrace can be found at:
http://www.pallas.com/e/products/vampir/index.htm
http://www.hpcx.ac.uk/support/FAQ/paraver.html
gprof is a simple text-based utility that provides procedural-level profiling of serial and parallel codes. This helps users to identify how much time is being spent in subroutines and functions. xprofiler generates a graphical display of the performance, and provides application profiling at the source statement level.
Both gprof and xprofiler are very simple to use:
gprof exec_file gmon.out.pid
or
xprofiler
(After starting xprofiler, use the File > Load Files dialogue box to direct it towards the executable file and the gmon.out.pid file.)
gprof and xprofiler facilitate analysis of CPU usage only. They cannot provide other types of profiling information, such as CPU idle, I/O or communication.
The latest version of xprofiler (from IBM research) is installed in
/usr/local/packages/actc/hpct/bin/xprofiler
and offers additional features such as histograms when using profiling at the source line level.
NB. To include LAPACK in your profile, link using -L/usr/local/lib -llapack_profile.
Information about gprof and xprofiler can be found in IBM Parallel Environment for AIX:
Operation and Use, Volume 2 at:
http://www-1.ibm.com/servers/eserver/pseries/library/sp_books/pe.html
The HPM toolkit consists of three components:
The hpmcount utility which may be used in conjunction with existing serial and parallel programs to provide information on execution time (wallclock time), hardware performance counters, hardware metrics and resource utilisation statistics.
The libhpm library and the hpmviz utility which must be used in conjunction with each other. The libhpm library is used to insert instrumenting code into different parts of the user program. At execution time, similar output to that of hpmcount is produced for each instrumented section. The output may be examined using the hpmviz utility.
The libhpm library may be used with serial and parallel (MPI, OMP) code written in C, C++ and Fortran.
Serial program usage:
hpmcount -o filename ./program
where program is the name of the executable and filename is the name of the file in which the output will be stored.
Parallel program usage:
Simply specify hpmcount in the LoadLeveler script file as follows:
poe hpmcount -o filename ./program
Pitfall: The following versions are wrong and will lead to hpmcount
investigating the performance of poe instead of your program:
hpmcount -o filename poe ./program
hpmcount -o filename ./program
For MPI and OpenMP programs respectively.
In the parallel case, filename is the prefix for a sequence of files (one for each processor or task) named filename_taskid.processid. Each of these files contains summary performance information for each processor.
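When many tasks are run it can help to sort the per-task output files by task ID before inspecting them. The following Python sketch assumes that taskid and processid in filename_taskid.processid are both plain decimal numbers. That detail is not stated above, and the function names are hypothetical.

```python
import re


def parse_hpm_filename(prefix, name):
    """Return (task_id, process_id) for 'prefix_taskid.processid', else None."""
    match = re.fullmatch(re.escape(prefix) + r"_(\d+)\.(\d+)", name)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def group_by_task(prefix, names):
    """Map each task ID to the list of matching file names."""
    tasks = {}
    for name in names:
        parsed = parse_hpm_filename(prefix, name)
        if parsed is not None:
            tasks.setdefault(parsed[0], []).append(name)
    return tasks
```

For example, group_by_task("filename", os.listdir(".")) would collect the hpmcount files in the current directory (after import os), ignoring anything that does not match the pattern.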
In order to be able to use the hpmviz utility, the program must first be instrumented with calls to the appropriate libhpm routines. The full documentation should be studied, but briefly, for a free format Fortran 90 MPI source file mpiprog.f90:
Insert the line:
#include "f_hpm.h"
immediately after your variable declarations.
Insert the lines:
call f_hpminit(rank, 'mpiprog')
call f_hpm_terminate(rank)
in the code once only (immediately after MPI initialisation, and immediately before exiting MPI, as shown). mpiprog is the name of the executable.
Instrument the relevant sections of code by bracketing them with the lines:
call f_hpmstart(no, 'name')
call f_hpmstop(no)
where no is a unique integer 0<no<100 and name is a name identifying the block being instrumented.
Compile with:
$ mpxlf90_r -qsuffix=f=f90 -qsuffix=cpp=f90 -o mpiprog mpiprog.f90 -I/usr/pmapi/include -L/usr/pmapi/lib -lhpm -lpmapi
Note the -qsuffix=cpp=f90 flag which processes the header file included earlier.
Run the program in the usual way, after which a sequence of files with the suffix .viz will have been generated which can be examined with hpmviz. Each file will contain summary information about each instrumented section of the code.
In addition, files prefixed perfhpm will have been generated for each processor which contain information similar to that generated by the hpmcount utility.
To run hpmviz:
$ /usr/local/packages/actc/hpmtk/bin/hpmviz &
and load in the required .viz files. The left-hand portion of the window will display the statistics for each block. Clicking a block name on the left-hand side will highlight the corresponding block of code on the right-hand side of the window.
For OpenMP programs, use the thread-safe version of the library -lhpm_r and routines f_hpmtstart and f_hpmtstop.
For C and C++ programs, drop the f_ prefix.
A full guide on how to use the HPM toolkit on HPCx is available as
technical report HPCxTR0307:
http://www.hpcx.ac.uk/research/hpc/technical_reports/HPCxTR0307/index.html
The original IBM documentation will be found at:
http://www.hpcx.ac.uk/support/documentation/IBMdocuments/HPM.html
In order to gain detailed information about MPI communication times, three trace-wrapper libraries are provided:
To use mpitrace:
The number of output files can be reduced by setting the environment variable TRACE_SOME to "yes" or "1".
mpihpm provides the same information as mpitrace plus Power-4 hpm counter data.
mpiprof provides an elapsed-time profile of MPI routines including some call-graph information in order to identify communication time on a per-subroutine basis. To use mpiprof, the code must be compiled with -qtable=full or -g as an additional compiler option.
More information about the mpitrace, mpihpm and mpiprof libraries can be found at:
http://www.hpcx.ac.uk/support/documentation/IBMdocuments/mpitrace
KOJAK is a set of tools designed to analyse the performance of
and find bottlenecks in parallel applications. Information can be
gained about the communications and hardware events which occur
when the application is run. For full information, please see the
documentation and examples in the
/usr/local/packages/kojak/kojak directory and the web
page
http://icl.cs.utk.edu/kojak/.
To use on HPCx, first a few directives must be added to the source
code. At least
!POMP$ INST INIT
!POMP$ INST BEGIN(name)
for Fortran, or
#pragma pomp inst init
#pragma pomp inst begin(name)
for C are needed as the first executable statements of the main
routine and
!POMP$ INST END(name) (Fortran)
#pragma pomp inst end(name) (C)
are needed as the last
executable statements. name can be chosen as the name of the
main routine.
Additionally, the user can choose to instrument any additional subroutines or segments of code using the 'begin' and 'end' statements above to gain more clarity (about how events relate to the program structure) at the analysis stage.
Then, if working in 32-bit, add the location of the relevant executables to your path with
export PATH=$PATH:/usr/local/packages/kojak/kojak/bin/32
If working in 64-bit, replace the 32 above with 64.
The application should be built with the name of the compiler
preceded by kinst-pomp
kinst-pomp mpif90 -qsuffix=f=f90 source_code.f90 -o executable.x
The executable can then be run in the normal way resulting in the
trace file a.elg being created in the working directory, which
can then be automatically analysed with the command
kanal a.elg
This will analyse the trace file and display the results in the
cube browser, which is designed to make it easy to determine
performance properties of the application via hierarchical
(performance, call and system) trees. A colour coding system helps to
identify hot-spots. The manual can be downloaded from
http://icl.cs.utk.edu/kojak/cube.
ParaView is a parallel visualisation application capable of performing many different types of visualisation and supporting many different data types as input. ParaView runs in a client/server environment and has a Python scripting interface for batch jobs.
Users can download a client binary to run on their local workstation from http://www.paraview.org/New/download.html. If a binary is not available for your particular platform, an X-windows client is available on HPCx. Run it using:
/usr/local/packages/paraview/launch_paraview_client
Prepare the client to accept a server connection by selecting "File : Connect" and entering login.hpcx.ac.uk as the host and "Client / Server (reverse connection)" as the server type. The default port is 11111 and can be left unchanged.
The ParaView server can be run in parallel with up to 32 processors on the interactive queue. Run it using:
/usr/local/packages/paraview/launch_paraview_server -l <loadleveller_script>
The full range of options can be accessed by passing -h to the script. Users wishing more control over the launching of the server should read the script and run the relevant executables directly.
An appropriate wall clock limit should be specified in the LoadLeveler script, because if the limit is reached the server processes will be terminated immediately and any visualisation data that has not been saved to disk will be lost.
Users who are running the client behind a firewall are responsible for ensuring that the connection from login.hpcx.ac.uk is accepted. Alternatively, solutions such as VPN or port forwarding over SSH could be used.
For further information on server configuration visit http://paraview.org/Wiki/Setting_up_a_ParaView_Server
ParaView provides a Python scripting interface as an alternative to using the GUI. All ParaView functionality is accessed through the 'servermanager' module, and help can be found by typing help(servermanager) in a Python shell. Python scripts can be edited, run and tested in the interactive shell available in the GUI under "Tools : Python Shell" or by using the pvpython interpreter. Scripts can be executed without user interaction using pvbatch, either in serial:
/usr/local/packages/paraview/3.3.0/serial/bin/pvbatch --use-offscreen-rendering <path-to-python-script>
or in parallel (via a LoadLeveler script):
poe /usr/local/packages/paraview/3.3.0/parallel/bin/pvbatch --use-offscreen-rendering <path-to-python-script>
Note that when using pvbatch, ParaView 3.3.0 or later must be used. Also, when running in parallel, the following error message will be displayed: "vtkXOpenGLRenderWindow (1131a7750): bad X server connection. DISPLAY=". This is normal and will be removed in a future release.
For further instructions on Python scripting for ParaView visit http://paraview.org/Wiki/images/f/f9/Servermanager2.pdf
TAU (Tuning and Analysis Utilities) is a portable profiling and tracing toolkit. It can profile parallel programs written in FORTRAN, C++ and C which use MPI.
It is installed in /usr/local/packages/tau and is available to all users. There are 32 and 64 bit versions installed under tau-2.17/ibm and tau-2.17/ibm64 respectively.
To use TAU simply add /usr/local/packages/tau/tau-2.17/ibm[64]/bin to your $PATH.
Below is a brief introduction to TAU on HPCx. For full instructions see http://www.cs.uoregon.edu/research/tau/tau-usersguide.pdf
The simplest way to get an application profile using TAU is to use the tau_poe wrapper when you run your MPI executable. Simply replace poe with tau_poe in your LoadLeveler script. When your application exits, one profile file for each process will be written in the current directory.
To view the profile run paraprof <path_to_profile_files>. ParaProf is TAU's profile viewing tool. It shows how much time each process has spent in different parts of the code. In this case only the time spent in MPI calls is recorded individually; time spent in user code is grouped together as 'TAU Application'.
To get more information on your code, you can add instrumentation calls to TAU to profile particular functions. TAU also provides a set of wrapper compiler scripts that will automatically instrument all your code using PDT (Program Database Toolkit). The wrapper compilers are called tau_f90.sh, tau_cc.sh, and tau_cxx.sh. They should be used in place of your FORTRAN, C, or C++ compilers in your own makefile. You must then export the TAU_MAKEFILE variable and recompile your code to instrument and link with TAU.
e.g.
export TAU_MAKEFILE=/usr/local/packages/tau/tau-2.17/ibm[64]/lib/Makefile.tau-mpi-pdt
make clean; make; make install
Now run your application and you will again get one profile file for each process, however they will now contain data on all functions, not just MPI calls.
It is possible to generate even more detail by generating trace files instead of profiles. To do this set TAU_MAKEFILE to .../Makefile.tau-mpi-pdt-mpitrace or .../Makefile.tau-mpi-pdt-trace to generate traces of MPI function calls or all function calls respectively. WARNING: full trace files can be very large (1GB/min/process)!
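Before requesting a full trace it is worth estimating how much disk space it may need. The following Python sketch simply multiplies out the rate quoted in the warning above (1GB per minute per process), treating it as a rough guide. The function name is hypothetical and the real size depends on how often the application makes function calls.

```python
def estimated_trace_size_gb(processes, minutes, gb_per_minute_per_process=1.0):
    """Rough trace size in GB: processes x minutes x rate.

    The default rate is the 1GB/min/process figure quoted for full traces.
    """
    if processes < 0 or minutes < 0:
        raise ValueError("processes and minutes must be non-negative")
    return processes * minutes * gb_per_minute_per_process


if __name__ == "__main__":
    # A 10 minute run on 32 processes at the quoted rate.
    print(estimated_trace_size_gb(32, 10), "GB")
```

At the quoted rate a 10 minute run on 32 processes comes to 320 GB, which is a good reason to trace only short runs or to use the MPI-only trace makefile.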
TAU traces can be merged and converted for viewing in other tools e.g. VAMPIR, using the tau_treemerge.pl script and tau_convert tools. See the TAU user guide for more details.
The subversion executables can be found in /usr/local/packages/svn/subversion-1.5.6/bin.
In order to use Subversion on HPCx you need to have the following settings in your .subversion file:
NAMD can be run on HPCx and viewed and controlled from VMD running on your local machine, using a program called Proxycontrol for the connection, as shown below.
There is a video demonstration here IMDdemo
Enabling IMD in NAMD is straightforward, and a quick tutorial for IMD can be found online at http://www.ks.uiuc.edu/Research/vmd/imd/tutorial/.
Add these lines to the NAMD configuration file to enable IMD:
| IMDon | yes | ;# enable IMD |
| IMDport | 2030 | ;# NAMD socket port number, normally in the range 1024-65535 |
| IMDfreq | 1 | ;# NAMD data send out frequency |
| IMDwait | on | ;# blocking wait for VMD connection |
The port 2030 is arbitrary; it should normally lie between 1024 and 65535 and will be used by the NAMD simulation for communication.
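If NAMD configuration files are generated by a script, the IMD settings in the table above can be emitted with a small helper. This is a minimal Python sketch; the function name and its defaults are hypothetical, and the range check simply applies the 1024-65535 guidance given for IMDport.

```python
def imd_config_lines(port=2030, freq=1, wait=True):
    """Return the four NAMD configuration lines that enable IMD."""
    if not 1024 <= port <= 65535:
        raise ValueError(f"port {port} is outside 1024-65535")
    return [
        "IMDon yes",                           # enable IMD
        f"IMDport {port}",                     # NAMD socket port number
        f"IMDfreq {freq}",                     # data send-out frequency
        f"IMDwait {'on' if wait else 'off'}",  # block until VMD connects
    ]


if __name__ == "__main__":
    print("\n".join(imd_config_lines()))
```

The printed lines can be appended to the NAMD configuration file as they are.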
In the batch file, replace the command
/usr/bin/poe ./namd2 input.conf
with the command
proxycontrol -p 50000 -h 193.62.122.2 -f 65100:peer:2030 /usr/bin/poe ./namd2 input.conf
Here 2030 is the socket port NAMD will use, and 65100 is the port proxycontrol will open for the VMD connection. IMD can now be used by connecting VMD to HPCx at address login.hpcx.ac.uk, port 65100.
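The argument to -f pairs the port opened for VMD with the NAMD port, so the two values must agree with the IMDport setting and with the port entered in VMD. The Python sketch below builds that argument. The layout "vmdport:peer:namdport" is inferred from the single example above and is not taken from proxycontrol documentation, and applying the 1024-65535 range to both ports is likewise an assumption. The function name is hypothetical.

```python
def forward_spec(vmd_port, namd_port):
    """Build the 'vmdport:peer:namdport' string used with proxycontrol -f.

    The layout is inferred from the example '65100:peer:2030'.
    """
    for port in (vmd_port, namd_port):
        if not 1024 <= port <= 65535:
            raise ValueError(f"port {port} is outside 1024-65535")
    return f"{vmd_port}:peer:{namd_port}"


if __name__ == "__main__":
    print(forward_spec(65100, 2030))
```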
Start VMD and load a PDB file (or, if you prefer, both the PDB and PSF files) for the same system that is currently running in NAMD.
Open the "IMD" or "Simulations->IMD Connect" plugin from the VMD "Extensions" menu (depends on which version you're running), and enter the hostname of the computer running NAMD (login.hpcx.ac.uk) and the port that proxycontrol has opened (65100). Click "Connect". After a few seconds, you should see your molecule start to move: you are watching your simulation in real-time!