
Cycles, instructions, loads from level 3 and memory

Use event set 58 to investigate counts of cycles, total instructions and loads from the level 3 cache or memory. For a full analysis of the level 2 caches, this event set lacks the counter for the level 2.75 cache, which is irrelevant on HPCx because of the use of LPARs; use event set 5 for an analysis of all level 2 caches. All raw counters of this event set have already been discussed in event set 60 or event set 5. Only the derived metrics "% loads from L3 per cycle" and "% loads from memory per cycle" are new here.

Nomenclature of cache levels and locations
level 2      level 2 cache on same chip as processor
level 2.5    level 2 cache on different chip but same MCM as processor
level 3      level 3 cache on same MCM
level 3.5    level 3 cache on different MCM; inaccessible on HPCx (outside the LPAR)

Raw counters
PM_DATA_FROM_MEM        Load operations from main memory
PM_DATA_FROM_L3         Load operations from level 3 cache
PM_DATA_FROM_L35        Load operations from level 3.5 cache
PM_DATA_FROM_L2         Load operations from level 2 cache
PM_DATA_FROM_L25_SHR    Load operations from level 2.5 cache in 'read-only' state
PM_DATA_FROM_L25_MOD    Load operations from level 2.5 cache in 'exclusive' state
PM_CYC                  Number of processor cycles
PM_INST_CMPL            Number of completed instructions

Derived Metrics
Utilization rate                 User time divided by wall-clock time, in percent
Total loads from L3              Sum of loads from L3 and L3.5, in units of 10^6 operations
% loads from L3 per cycle        Total loads from L3 divided by the number of cycles, in percent
L3 load traffic                  Total loads from L3 multiplied by a 128-byte cache line
                                 (the L2 cache line size is the relevant one)
L3 load bandwidth                L3 load traffic divided by the wall-clock time; when
                                 measured with HPMCOUNT this includes many overheads,
                                 so it is more useful with LIBHPM
L3 load miss rate                Loads from memory divided by the sum of loads from
                                 L3, L3.5 and memory
% loads from memory per cycle    Loads from main memory divided by the number of
                                 cycles, in percent
Memory load traffic              Loads from memory multiplied by a 512-byte cache line
                                 (the L3 cache line size is the relevant one)
Memory load bandwidth            Memory load traffic divided by the wall-clock time; when
                                 measured with HPMCOUNT this includes many overheads,
                                 so it is more useful with LIBHPM
MIPS                             Completed instructions divided by wall-clock time, in 10^6/s
Instructions per cycle           Completed instructions divided by the number of cycles
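
To show how the derived metrics combine the raw counters, the following minimal Python sketch recomputes them from a set of counter values and timings. The class RawCounts, the function derived_metrics and the dictionary keys are hypothetical names chosen for illustration; they are not part of HPMCOUNT or LIBHPM. The cache line sizes of 128 and 512 bytes are those given in the table above. The sketch returns traffic in bytes, bandwidth in bytes per second and the L3 load miss rate as a fraction, which may differ from the units the tools themselves print.

```python
from dataclasses import dataclass

# Cache line sizes quoted in the derived-metrics table.
L2_LINE_BYTES = 128  # relevant for L3 load traffic
L3_LINE_BYTES = 512  # relevant for memory load traffic


@dataclass
class RawCounts:
    """Hypothetical container for the raw counts of event set 58."""
    pm_data_from_mem: int
    pm_data_from_l3: int
    pm_data_from_l35: int
    pm_cyc: int
    pm_inst_cmpl: int
    user_time_s: float  # user CPU time in seconds
    wall_time_s: float  # wall-clock time in seconds


def derived_metrics(c: RawCounts) -> dict:
    """Sketch of the derived metrics; units are assumptions (see lead-in)."""
    loads_l3 = c.pm_data_from_l3 + c.pm_data_from_l35  # L3 plus L3.5
    l3_traffic = loads_l3 * L2_LINE_BYTES               # bytes
    mem_traffic = c.pm_data_from_mem * L3_LINE_BYTES    # bytes
    return {
        "utilization_rate_pct": 100.0 * c.user_time_s / c.wall_time_s,
        "total_loads_from_l3_M": loads_l3 / 1e6,
        "pct_loads_from_l3_per_cycle": 100.0 * loads_l3 / c.pm_cyc,
        "l3_load_traffic_bytes": l3_traffic,
        "l3_load_bandwidth_bytes_per_s": l3_traffic / c.wall_time_s,
        # Fraction of L3-level loads that had to go to memory.
        "l3_load_miss_rate": c.pm_data_from_mem / (loads_l3 + c.pm_data_from_mem),
        "pct_loads_from_mem_per_cycle": 100.0 * c.pm_data_from_mem / c.pm_cyc,
        "mem_load_traffic_bytes": mem_traffic,
        "mem_load_bandwidth_bytes_per_s": mem_traffic / c.wall_time_s,
        "mips": c.pm_inst_cmpl / c.wall_time_s / 1e6,
        "instructions_per_cycle": c.pm_inst_cmpl / c.pm_cyc,
    }
```

For example, with hypothetical counts of 3 x 10^6 loads from L3 and L3.5, 10^6 loads from memory and 10^9 cycles, the sketch gives 0.3 % loads from L3 per cycle, 0.1 % loads from memory per cycle and an L3 load miss rate of 0.25.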


Joachim Hein
2003-11-03