Using Memory Debugging Routines on HPCx
Q: How can I debug dynamic memory allocation problems?
A: XL Fortran provides debug memory routines which can be linked with users' code. A script has also been developed to assist the analysis of memory problems.
These routines can be very useful for diagnosing problems associated with allocating dynamic memory in application codes, in particular memory leaks, out-of-bounds errors, and reading or writing data from or to a freed object.
The library of most interest to users is libhmd.a.
This functionality can be accessed by linking the libhmd.a library ahead of the system libraries. Calls to the _dump_allocated() and _dump_allocated_delta() subroutines from within the user's code print information to stderr about each memory block that is currently allocated, or was allocated, using the debug memory management routines.
A Perl script has been developed to help analyse the stderr output of Fortran 90/95 codes that link libhmd.a.
Invoke the script using the command:
/usr/local/packages/bin/mem_man_sort.pl [ stderr.file ] [ source path ...(optional) ]
mem_man_sort.pl usage
Example stderr file created by linking libhmd.a for F90 code:
0:1546-515 -----------------------------------------------------------------
0:1546-516 START OF DUMP OF ALLOCATED MEMORY BLOCKS
0:1546-515 -----------------------------------------------------------------
0:1546-518 Address: 0x302B0590 Size: 0x00000058 (88)
0:1546-527 This memory block was (re)allocated at
0: _debug_umalloc + 6C
0: _dbg_umalloc + 18
0: malloc + 38
0: readmcv + 2F0 [readmcv.f90:60]
0: mcv2pcv + 1430 [mcv2pcv_main.1.6.f90:79]
0:1546-515 -----------------------------------------------------------------
0:1546-518 Address: 0x302B0600 Size: 0x0000002C (44)
0:1546-527 This memory block was (re)allocated at
0: _debug_umalloc + 6C
0: _dbg_umalloc + 18
0: malloc + 38
0: readmcv + 398 [readmcv.f90:61]
0: mcv2pcv + 1430 [mcv2pcv_main.1.6.f90:79]
0:1546-515 -----------------------------------------------------------------
0:1546-518 Address: 0x302B0640 Size: 0x0000002C (44)
0:1546-527 This memory block was (re)allocated at
0: _debug_umalloc + 6C
0: _dbg_umalloc + 18
0: malloc + 38
0: readmcv + 434 [readmcv.f90:62]
0: mcv2pcv + 1430 [mcv2pcv_main.1.6.f90:79]
0:1546-515 -----------------------------------------------------------------
0:1546-518 Address: 0x302B0680 Size: 0x0002A6E8 (173800)
0:1546-527 This memory block was (re)allocated at
0: _debug_umalloc + 6C
0: _dbg_umalloc + 18
0: malloc + 38
0: readmcv + 4F4 [readmcv.f90:63]
0: mcv2pcv + 1430 [mcv2pcv_main.1.6.f90:79]
0:1546-515 -----------------------------------------------------------------
.......
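To make the structure of the dump concrete, the following is a minimal Python sketch that reads such a file into a list of blocks. It is not the mem_man_sort.pl implementation. The function name parse_dump and the regular expressions are illustrative assumptions based only on the sample above: a 1546-518 line is taken to start a block, and a traceback line counts only if it ends in a [file:line] reference.

```python
import re

# Assumption: message 1546-518 carries the block address and size, e.g.
# "0:1546-518 Address: 0x302B0590 Size: 0x00000058 (88)"
BLOCK_RE = re.compile(
    r"1546-518\s+Address:\s+(0x[0-9A-Fa-f]+)\s+Size:\s+0x[0-9A-Fa-f]+\s+\((\d+)\)"
)
# Assumption: traceback frames with source information look like
# "0: readmcv + 2F0 [readmcv.f90:60]"
FRAME_RE = re.compile(r"(\S+)\s+\+\s+[0-9A-Fa-f]+\s+\[([^\]:]+):(\d+)\]")


def parse_dump(lines):
    """Return one dict per allocated block, in the order they appear."""
    blocks = []
    for line in lines:
        block = BLOCK_RE.search(line)
        if block:
            blocks.append(
                {"address": block.group(1), "size": int(block.group(2)), "frames": []}
            )
            continue
        frame = FRAME_RE.search(line)
        if frame and blocks:
            # Frames without a [file:line] suffix (malloc and friends) are skipped.
            blocks[-1]["frames"].append(
                (frame.group(1), frame.group(2), int(frame.group(3)))
            )
    return blocks


# First block of the sample dump shown above.
sample = """\
0:1546-518 Address: 0x302B0590 Size: 0x00000058 (88)
0:1546-527 This memory block was (re)allocated at
0: _debug_umalloc + 6C
0: _dbg_umalloc + 18
0: malloc + 38
0: readmcv + 2F0 [readmcv.f90:60]
0: mcv2pcv + 1430 [mcv2pcv_main.1.6.f90:79]
""".splitlines()

for entry in parse_dump(sample):
    print(entry["size"], entry["frames"][0])  # 88 ('readmcv', 'readmcv.f90', 60)
```

The first frame with source information is the allocate statement itself, and the next one is its caller. These are the two locations reported for each entry in the script output shown below.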
Organise the output using mem_man_sort.pl:
/usr/local/packages/bin/mem_man_sort.pl myprog.err ../src
Enter minimum size of array in bytes (default=1) : 32
Do you also require allocations ordered by size ? (enter y/n) (default=n) : y
Memory allocations (run-time order) :
********************************************
1. 88 bytes (Running Total 88) allocated in:
readmcv.f90@60: allocate(namep(npatch+1),STAT=err)
from mcv2pcv_main.1.6.f90@79: call readmcv(coordinateflag,fuelflag,exitflag)
2. 173800 bytes (Running Total 173976) allocated in:
readmcv.f90@63: allocate( xp((npatch+1),N_XL,N_XN),STAT=err) ; call err_test(err,'xp' )
from mcv2pcv_main.1.6.f90@79: call readmcv(coordinateflag,fuelflag,exitflag)
.....
41. 29627596 bytes (Running Total 1164295340) allocated in:
partition.f90@80: allocate(metis_cell(ncell),STAT=err) ; call err_test(err,'metis_cell')
from mcv2pcv_main.1.6.f90@108: call partition(mode)
Total Number of Bytes listed above = 1164295340
=========================================================
Total Number of Bytes Allocated in Program = 1164295340
=========================================================
=========================================================
Memory Allocations (ordered by size) :
****************************************
1. 207393172 bytes allocated in :
mcv2pcv_main.1.6.f90 : allocate(glb2loc_cell(ncell,7)
2. 177765576 bytes allocated in :
readmcv.f90 : allocate( net(ncell,6),STAT=err)
from mcv2pcv_main.1.6.f90 : call readmcv(coordinateflag,fuelflag,exitflag)
......
41. 44 bytes allocated in :
readmcv.f90 : allocate( nlp(npatch+1),STAT=err) ' )
from mcv2pcv_main.1.6.f90 : call readmcv(coordinateflag,fuelflag,exitflag)
Total Number of Bytes listed above = 1164295340
=========================================================
Total Number of Bytes Allocated in Program = 1164295340
=========================================================
=========================================================
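The two listings above differ only in ordering. The Python sketch below illustrates that idea using the four allocations from the sample dump. The function name summarise and the treatment of the minimum size are assumptions: this sketch accumulates the running total only over the entries it lists, and the real script may handle entries below the threshold differently.

```python
from itertools import accumulate


def summarise(allocations, min_size=1):
    """allocations: (size_in_bytes, location) tuples in run-time order.

    Returns (run_time, by_size):
      run_time -> (size, running_total, location), in the original order
      by_size  -> (size, location), largest allocation first
    """
    # Assumption: entries smaller than min_size are dropped from both listings.
    kept = [entry for entry in allocations if entry[0] >= min_size]
    totals = accumulate(size for size, _ in kept)
    run_time = [(size, total, where) for (size, where), total in zip(kept, totals)]
    by_size = sorted(kept, key=lambda entry: entry[0], reverse=True)
    return run_time, by_size


# Sizes and source lines taken from the sample dump earlier on this page.
allocations = [
    (88, "readmcv.f90@60"),
    (44, "readmcv.f90@61"),
    (44, "readmcv.f90@62"),
    (173800, "readmcv.f90@63"),
]

run_time, by_size = summarise(allocations, min_size=32)
for number, (size, total, where) in enumerate(run_time, start=1):
    print(f"{number}. {size} bytes (Running Total {total}) allocated in: {where}")
```

The size-ordered list is usually the quicker way to spot the few arrays that dominate memory use, while the run-time order shows where in the program the total grows.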
The script can only correctly analyse output from a single process. To enable the analysis of parallel jobs, we suggest the following procedure:
In the LoadLeveler script, set the environment variables
export MP_STDOUTMODE=ordered
export MP_LABELIO=yes
This will order and label the output according to process id.
Then grep the resulting stderr file to extract the output of a single process and run the script on it. For example, to extract memory allocation information for process 0:
grep ' 0:' myprog.err > proc0.err
/usr/local/packages/bin/mem_man_sort.pl proc0.err ../src
http://www.hpcx.ac.uk/support/FAQ/mem_man.html | contact email - www@hpcx.ac.uk | © UoE HPCX Ltd