Using Memory Debugging Routines on HPCx


Q: How can I debug dynamic memory allocation problems?
A: XL Fortran provides debug memory routines that can be linked with users' code. A script has also been developed to assist the analysis of memory problems.


The XL Fortran compiler ships with two libraries that provide memory-management facilities:

libhmd.a

A library, residing at /usr/lib, that provides debug versions of memory-management routines.

libhm.a

A non-debug library, residing at /usr/lib, that provides replacement routines for malloc, free, and so on. This library also contains a few new library routines to provide additional facilities for memory management and production-level heap error checking. For more detailed information on usage see http://publib.boulder.ibm.com/infocenter/comphelp/index.jsp?topic=/com.ibm.xlf91a.doc/xlfug/heapdbg.htm.

These routines can be very useful for diagnosing problems with dynamic memory allocation in application codes, in particular memory leaks, out-of-bounds errors, and reads from or writes to a freed object.

The library of most interest to users is libhmd.a. To use it, link libhmd.a ahead of the system libraries. Calls to the _dump_allocated() and _dump_allocated_delta() subroutines from within the user's code then print information to stderr about memory blocks allocated using the debug memory management routines: _dump_allocated() reports each block that is currently allocated, and _dump_allocated_delta() reports each block allocated since the previous dump call. A Perl script has been developed to help analyse this stderr output for Fortran 90/95 codes that link libhmd.a.

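The script reads the dump from the stderr file and lists the allocations in run-time order with a running total. On request it also lists them ordered by size. Allocations smaller than a chosen minimum size are omitted, and when a source path is given, the source line responsible for each allocation is shown.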

Invoke the script using the command:

/usr/local/packages/bin/mem_man_sort.pl [ stderr.file ] [ source path ...(optional) ]

Example of mem_man_sort.pl usage

Example stderr file created by linking libhmd.a for F90 code:

   0:1546-515 -----------------------------------------------------------------
   0:1546-516                  START OF DUMP OF ALLOCATED MEMORY BLOCKS
   0:1546-515 -----------------------------------------------------------------
   0:1546-518 Address: 0x302B0590      Size: 0x00000058 (88)
   0:1546-527 This memory block was (re)allocated at
   0:               _debug_umalloc + 6C
   0:                 _dbg_umalloc + 18
   0:                       malloc + 38
   0:                      readmcv + 2F0          [readmcv.f90:60]
   0:                      mcv2pcv + 1430         [mcv2pcv_main.1.6.f90:79]
   0:1546-515 -----------------------------------------------------------------
   0:1546-518 Address: 0x302B0600      Size: 0x0000002C (44)
   0:1546-527 This memory block was (re)allocated at
   0:               _debug_umalloc + 6C
   0:                 _dbg_umalloc + 18
   0:                       malloc + 38
   0:                      readmcv + 398          [readmcv.f90:61]
   0:                      mcv2pcv + 1430         [mcv2pcv_main.1.6.f90:79]
   0:1546-515 -----------------------------------------------------------------
   0:1546-518 Address: 0x302B0640      Size: 0x0000002C (44)
   0:1546-527 This memory block was (re)allocated at
   0:               _debug_umalloc + 6C
   0:                 _dbg_umalloc + 18
   0:                       malloc + 38
   0:                      readmcv + 434          [readmcv.f90:62]
   0:                      mcv2pcv + 1430         [mcv2pcv_main.1.6.f90:79]
   0:1546-515 -----------------------------------------------------------------
   0:1546-518 Address: 0x302B0680      Size: 0x0002A6E8 (173800)
   0:1546-527 This memory block was (re)allocated at
   0:               _debug_umalloc + 6C
   0:                 _dbg_umalloc + 18
   0:                       malloc + 38
   0:                      readmcv + 4F4          [readmcv.f90:63]
   0:                      mcv2pcv + 1430         [mcv2pcv_main.1.6.f90:79]
   0:1546-515 -----------------------------------------------------------------

   .......
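As a rough illustration of how a dump in the format above can be read programmatically, the following Python sketch extracts the size of each block and the call-stack frames that carry a [file:line] annotation. It is not the mem_man_sort.pl implementation; the function name parse_dump and the regular expressions are assumptions based only on the sample lines shown above.

```python
import re

# Matches the "1546-518 Address: ... Size: 0x... (decimal)" line of a block.
SIZE_RE = re.compile(
    r"1546-518 Address: (0x[0-9A-Fa-f]+)\s+Size: 0x[0-9A-Fa-f]+ \((\d+)\)"
)
# Matches a stack frame that has a [file:line] annotation; the task label
# ("0:") is treated as optional. Frames without an annotation are skipped.
FRAME_RE = re.compile(
    r"^\s*(?:\d+:)?\s*(\S+) \+ [0-9A-Fa-f]+\s+\[([^:\]]+):(\d+)\]"
)


def parse_dump(lines):
    """Return one dict per memory block: address, size, annotated frames."""
    blocks = []
    current = None
    for line in lines:
        m = SIZE_RE.search(line)
        if m:
            current = {"address": m.group(1), "size": int(m.group(2)), "frames": []}
            blocks.append(current)
            continue
        m = FRAME_RE.match(line)
        if m and current is not None:
            current["frames"].append((m.group(1), m.group(2), int(m.group(3))))
    return blocks


# First block of the sample dump above.
SAMPLE = """\
   0:1546-518 Address: 0x302B0590      Size: 0x00000058 (88)
   0:1546-527 This memory block was (re)allocated at
   0:               _debug_umalloc + 6C
   0:                 _dbg_umalloc + 18
   0:                       malloc + 38
   0:                      readmcv + 2F0          [readmcv.f90:60]
   0:                      mcv2pcv + 1430         [mcv2pcv_main.1.6.f90:79]
""".splitlines()

blocks = parse_dump(SAMPLE)
```

The first annotated frame of each block is the innermost user routine, which is the location the script reports as "allocated in".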

Organise the output using mem_man_sort.pl :

/usr/local/packages/bin/mem_man_sort.pl myprog.err ../src


 Enter minimum size of array in bytes (default=1) : 32
 Do you also require allocations ordered by size ? (enter y/n) (default=n) : y

 Memory allocations (run-time order) :
 ********************************************

1. 88 bytes (Running Total 88) allocated in:
      readmcv.f90@60:  allocate(namep(npatch+1),STAT=err)
     from mcv2pcv_main.1.6.f90@79:  call readmcv(coordinateflag,fuelflag,exitflag)

2. 173800 bytes (Running Total 173976) allocated in:
      readmcv.f90@63:  allocate(  xp((npatch+1),N_XL,N_XN),STAT=err) ; call err_test(err,'xp'   )
     from mcv2pcv_main.1.6.f90@79:  call readmcv(coordinateflag,fuelflag,exitflag)
.....

41. 29627596 bytes (Running Total 1164295340) allocated in:
      partition.f90@80:  allocate(metis_cell(ncell),STAT=err) ; call err_test(err,'metis_cell')
     from mcv2pcv_main.1.6.f90@108:  call partition(mode)

 Total Number of Bytes listed above = 1164295340

 =========================================================
 Total Number of Bytes Allocated in Program = 1164295340
 =========================================================
 =========================================================


  Memory Allocations (ordered by size) :
 ****************************************

 1. 207393172 bytes allocated in :
     mcv2pcv_main.1.6.f90 :  allocate(glb2loc_cell(ncell,7)

 2. 177765576 bytes allocated in :
     readmcv.f90 :  allocate( net(ncell,6),STAT=err)
     from mcv2pcv_main.1.6.f90 :  call readmcv(coordinateflag,fuelflag,exitflag)
......

 41. 44 bytes allocated in :
     readmcv.f90 :  allocate(  nlp(npatch+1),STAT=err)                                            '  )
     from mcv2pcv_main.1.6.f90 :  call readmcv(coordinateflag,fuelflag,exitflag)


 Total Number of Bytes listed above = 1164295340

 =========================================================
 Total Number of Bytes Allocated in Program = 1164295340
 =========================================================
 =========================================================
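The two listings above amount to a filter on block size, a running total in run-time order, and a sort by size. A minimal Python sketch of that arithmetic follows, using the four block sizes from the sample dump. The function name summarise is hypothetical, and treating the minimum size as an inclusive threshold is an assumption; the actual behaviour of mem_man_sort.pl may differ.

```python
def summarise(sizes, min_size=1):
    """Filter blocks by size, then build run-time and size-ordered views.

    sizes is a list of block sizes in bytes, in run-time (dump) order.
    Blocks smaller than min_size are dropped (assumed inclusive threshold).
    """
    kept = [s for s in sizes if s >= min_size]
    rows = []
    running = 0
    for index, size in enumerate(kept, start=1):
        running += size
        rows.append((index, size, running))
    by_size = sorted(kept, reverse=True)
    return rows, by_size


# Sizes of the four blocks shown in the sample dump, in dump order.
rows, by_size = summarise([88, 44, 44, 173800], min_size=32)
for index, size, total in rows:
    print(f"{index}. {size} bytes (Running Total {total})")
```

With these four blocks the final running total is 173976 bytes, which agrees with the running total printed next to the 173800-byte allocation in the listing above.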

Note

The script can only correctly analyse output from a single process. To enable the analysis of parallel jobs, we suggest the following procedure:

In the LoadLeveler script, set the environment variables


export MP_STDOUTMODE=ordered
export MP_LABELIO=yes

This will order and label the output according to process ID. Then grep the resulting stderr file to extract the process-specific output. For example, to extract and analyse the memory allocation information for process 0:

grep ' 0:' myprog.err > proc0.err
/usr/local/packages/bin/mem_man_sort.pl proc0.err ../src
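When many processes are of interest, repeating the grep step by hand becomes tedious. The following Python sketch groups the lines of a labelled stderr file by process in one pass. It assumes every relevant line begins with optional spaces followed by the process number and a colon, as in the sample dump; the function name split_by_task and the sample lines are illustrative only.

```python
import re
from collections import defaultdict

# Leading label added when MP_LABELIO=yes, e.g. "   0:" or "  12:".
LABEL_RE = re.compile(r"^\s*(\d+):")


def split_by_task(lines):
    """Group labelled output lines by process number; unlabelled lines are dropped."""
    per_task = defaultdict(list)
    for line in lines:
        m = LABEL_RE.match(line)
        if m:
            per_task[int(m.group(1))].append(line)
    return dict(per_task)


# Illustrative labelled lines from two processes.
SAMPLE = [
    "   0:1546-518 Address: 0x302B0590      Size: 0x00000058 (88)",
    "   1:1546-518 Address: 0x302B0590      Size: 0x00000058 (88)",
    "   0:1546-527 This memory block was (re)allocated at",
    "  10:1546-527 This memory block was (re)allocated at",
]

per_task = split_by_task(SAMPLE)
```

Each list in per_task could then be written to its own file (for example proc0.err, proc1.err) and passed to mem_man_sort.pl in turn. Matching the label at the start of the line also avoids picking up lines that merely contain the text ' 0:' elsewhere.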

http://www.hpcx.ac.uk/support/FAQ/mem_man.html contact email - www@hpcx.ac.uk © UoE HPCX Ltd