Running the TotalView Debugger on HPCx
This page describes how to launch TotalView on HPCx. For details of how to use the package once it is running, either consult the online help or see the documentation at http://www.etnus.com/Support/docs/.
All executables are located in the directory
/usr/local/packages/totalview/. It is probably best to add
this directory to your default PATH. TotalView is an X-based GUI tool,
and here we assume that you can already run graphical applications from
HPCx. To test this, type xterm at the normal HPCx prompt
and a terminal window should appear on your screen. If this does not
work then consult the section on running graphical applications via ssh.
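The setup described above can be sketched as a fragment for a login script. This is a minimal sketch, assuming a POSIX-compatible login shell; the variable name TV_DIR is introduced here for illustration only, and the DISPLAY check is only a rough indicator that X forwarding is active, not a substitute for the xterm test.

```shell
# Directory holding the TotalView executables (path taken from the text above).
TV_DIR=/usr/local/packages/totalview

# Append the directory to PATH only if it is not already there.
case ":$PATH:" in
  *":$TV_DIR:"*) ;;                          # already present, nothing to do
  *) PATH="$PATH:$TV_DIR"; export PATH ;;
esac

# An empty DISPLAY usually means X forwarding is not active in this session.
if [ -z "${DISPLAY:-}" ]; then
  echo "DISPLAY is not set: graphical tools such as xterm will not start" >&2
fi
```

Placing the directory at the end of PATH means existing commands keep their precedence.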
tv command.
For more information on how to ensure your code dumps a core when it
encounters numerical problems see the FAQ entry on program
crashes.
runtv command. The job actually runs within the normal
interactive queues, so you also require an interactive LoadLeveler
script. Here we use the examples from the standard templates.
The syntax is runtv llfile executable [TotalView options], e.g.:
user@l1f01$ runtv intmpihello.ll mpihello
Any command-line options for the executable are entered at a later stage.
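Since runtv takes a LoadLeveler script and an executable, it can help to confirm that both exist before launching. The function below is a hypothetical helper, not something provided on HPCx; it only checks the two file arguments and does not call runtv itself.

```shell
# Hypothetical helper (not part of HPCx): sanity-check the two required
# runtv arguments, the LoadLeveler script and the executable.
check_runtv_inputs() {
  llfile=${1:-}
  exe=${2:-}
  if [ ! -r "$llfile" ]; then
    echo "LoadLeveler script not readable: $llfile" >&2
    return 1
  fi
  if [ ! -f "$exe" ] || [ ! -x "$exe" ]; then
    echo "executable missing or not executable: $exe" >&2
    return 1
  fi
  return 0
}
```

With the file names from the example above, this could be used as: check_runtv_inputs intmpihello.ll mpihello && runtv intmpihello.ll mpihello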
On the current Phase 2A system it is necessary to
set up ssh agent forwarding to enable interactive
TotalView debugging.
The instructions for setting up ssh agent forwarding can be found
in the section on logging in without a password.
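Once agent forwarding has been set up, a quick check on the login node can show whether an agent is actually reachable. This is a sketch under the assumption that the system uses OpenSSH, where ssh-add -l lists the keys held by the agent; the function name agent_forwarding_ok is made up for this example.

```shell
# Hypothetical check (assumes OpenSSH): is a forwarded agent usable here?
agent_forwarding_ok() {
  # Without SSH_AUTH_SOCK there is no agent socket to talk to.
  [ -n "${SSH_AUTH_SOCK:-}" ] || return 1
  # ssh-add -l exits 0 only if the agent answers and holds at least one key.
  ssh-add -l >/dev/null 2>&1
}

if agent_forwarding_ok; then
  echo "ssh agent is reachable"
else
  echo "no usable ssh agent found: check your agent forwarding setup" >&2
fi
```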
The main TotalView window should then appear on your screen (unless
interactive resources are insufficient, in which case
runtv will report an error). The issue now is that
TotalView is actually debugging the poe program that
launches parallel programs rather than the parallel program itself. To
connect to your own application you should press the Go button
and answer Yes when asked whether you wish to stop
poe. If you have asked for n processes the root TotalView
window will actually show n+1 processes. The first is the original poe
process and should be ignored.
If there are insufficient resources at this stage (e.g. not enough free CPUs due to other users' interactive jobs) then your parallel job will not run, and an error message will appear in your terminal window.
TotalView will now be running under interactive LoadLeveler control. Command-line parameters for the executable may be set at this stage by selecting the Arguments tab within the Process => Startup Parameters dialog box. On exit, control returns to the login node.
Recent versions of TotalView feature memory debugging facilities. An explanation of how to enable TotalView-based memory debugging for your code on HPCx is provided here.
http://www.hpcx.ac.uk/support/FAQ/totalview/ | contact email - www@hpcx.ac.uk | © UoE HPCX Ltd