Running the TotalView Debugger on HPCx

We have a licence for the TotalView debugger on HPCx that allows two simultaneous users to run up to 32 parallel processes each. Although TotalView on HPCx is the same package as on other HPC platforms, the way it is launched may differ from what you have encountered elsewhere.

This page describes how to launch TotalView on HPCx. For details of how to use the package once it is running, either consult the online help or see the documentation at http://www.etnus.com/Support/docs/.

All executables are located in the directory /usr/local/packages/totalview/. It is probably best to add this directory to your default PATH. TotalView is an X-based GUI tool, and here we assume that you can already run graphical applications from HPCx. To test this, type xterm at the normal HPCx prompt and a terminal window should appear on your screen. If this does not work then consult the section on running graphical applications via ssh.
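As a minimal sketch of the PATH change suggested above, the following POSIX shell lines append the TotalView directory to PATH only if it is not already there. The variable name TV_DIR is an illustrative choice, and placing these lines in a start-up file such as ~/.profile is an assumption about your login shell rather than something HPCx prescribes.

```shell
# Directory holding the TotalView executables (path taken from the text above).
TV_DIR=/usr/local/packages/totalview

# Append TV_DIR to PATH only if it is not already present,
# so repeated logins do not add duplicate entries.
case ":$PATH:" in
    *":$TV_DIR:"*) ;;                       # already on PATH, nothing to do
    *) PATH="$PATH:$TV_DIR"; export PATH ;;
esac
```

After this, typing xterm at the prompt remains the quickest check that graphical applications can reach your screen.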

Debugging sequential jobs and core files

To debug sequential jobs, or to examine core files after a serial or parallel program has crashed, you should use the tv command. For more information on how to ensure your code dumps a core when it encounters numerical problems, see the FAQ entry on program crashes.

Debugging parallel jobs

To run TotalView interactively with a parallel program you must use the runtv command. The job actually runs within the normal interactive queues, so you also require an interactive LoadLeveler script. Here we use the examples from the standard templates. The syntax is runtv llfile executable [TotalView options], eg:
  user@l1f01$ runtv intmpihello.ll mpihello
Any command-line options for the executable are entered at a later stage.

On the current Phase 2A system it is necessary to set up ssh agent forwarding to enable interactive TotalView debugging. The instructions for doing so can be found in the section on logging in without a password.

Once runtv has been launched, the main TotalView window should appear on your screen (if insufficient interactive resources are available, runtv will report an error instead). At this point TotalView is actually debugging the poe program that launches parallel programs, rather than the parallel program itself. To connect to your own application, press the Go button and answer Yes when asked whether you wish to stop poe. If you have asked for n processes, the root TotalView window will actually show n+1 processes. The first is the original poe process and should be ignored.

If there are insufficient resources at this stage (eg not enough free CPUs due to other users' interactive jobs) then your parallel job will not run, and an error message will appear in your terminal window.
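Before launching, it can help to check the prerequisites mentioned above: X forwarding, ssh agent forwarding, and the two runtv arguments. The sketch below is one way to do this in POSIX shell. The function name tv_preflight is hypothetical and is not part of HPCx or TotalView. It assumes that X forwarding sets DISPLAY and that ssh agent forwarding sets SSH_AUTH_SOCK, which is the usual behaviour of ssh but is not confirmed here for HPCx. It cannot detect a shortage of interactive resources, which is only reported once the job is submitted.

```shell
# Hypothetical helper (not provided by HPCx or TotalView): checks the
# prerequisites for runtv and returns non-zero if any of them is missing.
# Usage: tv_preflight llfile executable
tv_preflight() {
    llfile=$1
    exe=$2
    status=0

    # Graphical applications need DISPLAY (assumed to be set by X forwarding).
    if [ -z "${DISPLAY:-}" ]; then
        echo "DISPLAY is not set; the TotalView window cannot be opened" >&2
        status=1
    fi

    # ssh agent forwarding normally exports SSH_AUTH_SOCK (an assumption here).
    if [ -z "${SSH_AUTH_SOCK:-}" ]; then
        echo "SSH_AUTH_SOCK is not set; ssh agent forwarding may be inactive" >&2
        status=1
    fi

    # The interactive LoadLeveler script must be readable.
    if [ ! -r "$llfile" ]; then
        echo "cannot read LoadLeveler script: $llfile" >&2
        status=1
    fi

    # The program to debug must be a regular, executable file.
    if [ ! -f "$exe" ] || [ ! -x "$exe" ]; then
        echo "not an executable file: $exe" >&2
        status=1
    fi

    return $status
}

# Example with the file names used earlier on this page:
#   tv_preflight intmpihello.ll mpihello && runtv intmpihello.ll mpihello
```

Reporting every failed check, rather than stopping at the first, means a single run shows everything that needs fixing before you try runtv again.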

TotalView will now be running under interactive LoadLeveler control. Command-line parameters for the executable may be set at this stage by selecting the Arguments tab within the Process => Startup Parameters dialog box. On exit, control returns to the login node.

Memory Debugging Parallel Jobs

Recent versions of TotalView feature memory debugging facilities. An explanation of how to enable TotalView-based memory debugging for your code on HPCx is provided here.

http://www.hpcx.ac.uk/support/FAQ/totalview/ contact email - www@hpcx.ac.uk © UoE HPCX Ltd