

Cycles, instructions, TLB and level 1 data cache

Event set 56 gives access to the hardware counters for cycles, instructions, the translation lookaside buffer (TLB) and the level 1 data cache. Note that the level 1 instruction cache (64 kB per processor) is not covered by this event set.

Remark on level 1 data cache store misses: the level 1 data cache of HPCx has a store-through policy. Any data written to the level 1 cache is immediately written to the level 2 cache as well. A level 1 data cache store miss does not establish the relevant cache line in the level 1 data cache. We therefore expect level 1 data cache store misses to have at most a very minor impact on performance. This remark also applies to some of the derived metrics in the table further down.
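
The store-through behaviour described in this remark can be illustrated with a toy model. The sketch below is not a description of the actual hardware: the class name `StoreThroughL1` is hypothetical, the cache is assumed to have unlimited capacity (no evictions), and it is assumed that a load miss establishes the line in level 1. The 128 byte line size is the value quoted in the derived metrics table.

```python
LINE_BYTES = 128  # cache line size quoted in the derived metrics table


class StoreThroughL1:
    """Toy level 1 data cache: store-through, no allocation on a store miss.

    Assumption: unlimited capacity, so lines are never evicted.
    """

    def __init__(self):
        self.l1_lines = set()  # line numbers currently held in level 1
        self.l2_lines = set()  # line numbers that stores have written to level 2
        self.load_misses = 0
        self.store_misses = 0

    def load(self, addr):
        line = addr // LINE_BYTES
        if line not in self.l1_lines:
            self.load_misses += 1
            self.l1_lines.add(line)  # assumed: a load miss establishes the line

    def store(self, addr):
        line = addr // LINE_BYTES
        self.l2_lines.add(line)  # every store is also written to level 2
        if line not in self.l1_lines:
            self.store_misses += 1  # the line is NOT brought into level 1


cache = StoreThroughL1()
cache.store(0)   # store miss, line 0 stays absent from level 1
cache.store(8)   # same line, misses again because nothing was allocated
cache.load(0)    # load miss, line 0 is now in level 1
cache.store(16)  # same line, now a store hit
```

In this model repeated stores to a line that was never loaded keep counting as store misses, while the data still reaches level 2 on every store.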

Raw counters
PM_DTLB_MISS Number of data TLB misses
PM_ITLB_MISS Number of instruction TLB misses
PM_LD_MISS_L1 Number of level 1 data cache load misses
PM_ST_MISS_L1 Number of level 1 data cache store misses
PM_CYC Number of processor cycles
PM_INST_CMPL Number of completed instructions
PM_ST_REF_L1 Number of level 1 data cache store references
PM_LD_REF_L1 Number of level 1 data cache load references

Remark on derived level 2 metrics: Several of the derived metrics below refer to the level 2 cache by name, but they are calculated from the level 1 load and store misses. On the HPCx system these numbers should be interpreted as level 1 misses rather than as the level 2 accesses their names suggest. Comparing the level 1 data load misses (counter PM_LD_MISS_L1) to the actual load operations from level 2 cache, level 3 cache and main memory (use event set 5), we observed between 0.5 and 10 times as many level 1 misses as load operations. Our investigations indicate that data prefetching is a possible trigger for data loads that have no corresponding level 1 miss. Conversely, the Power4 processor has two load/store units. If both units encounter a level 1 miss on the same level 2 or level 3 cache line, this would explain the cases in which we measured up to twice as many level 1 misses as data loads.
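
The comparison described in this remark can be sketched as a small calculation. The function and parameter names below are hypothetical. The load counts from level 2, level 3 and main memory are assumed to have been obtained separately with event set 5, whose counter names are not listed in this section. The interpretation strings only restate the two possible explanations given above.

```python
def miss_to_load_ratio(l1_load_misses, loads_l2, loads_l3, loads_memory):
    """Ratio of level 1 load misses to actual loads from L2, L3 and memory."""
    loads = loads_l2 + loads_l3 + loads_memory
    if loads == 0:
        raise ValueError("no load operations recorded")
    return l1_load_misses / loads


def interpret(ratio):
    """Map the ratio onto the explanations offered in the remark."""
    if ratio < 1.0:
        # Loads without a matching level 1 miss: prefetching is a possible cause.
        return "fewer misses than loads (possibly prefetching)"
    if ratio <= 2.0:
        # Both load/store units may have missed on the same cache line.
        return "up to two misses per load (possibly both load/store units)"
    return "more than two misses per load (not covered by the remark)"


# Made-up example values, for illustration only.
ratio = miss_to_load_ratio(l1_load_misses=1500, loads_l2=800,
                           loads_l3=150, loads_memory=50)
print(ratio, interpret(ratio))
```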

Derived Metrics
Utilization rate
User time divided by wall-clock time in percent
% TLB misses per cycle
Data TLB misses divided by the number of cycles
Avg number of loads per TLB miss
level 1 data cache load references divided by the number of data TLB misses
Total L2 data cache accesses
Sum of level 1 load and store misses
Please read the above remark!
% accesses from L2 per cycle
Level 1 load and store misses divided by the number of cycles
Please read the above remark!
L2 traffic
Level 1 load and store misses multiplied by the 128 byte cache line size
Please read the above remark!
L2 bandwidth
L2 traffic divided by the wall-clock time
For HPMCOUNT the wall-clock time may include considerable overheads.
Please read the above remark!
Load and store operations
Sum of the L1 data cache load and store references
Instructions per load/store
Number of completed instructions divided by the number of load and store operations
Avg number of loads per load miss
Counter PM_LD_REF_L1 divided by counter PM_LD_MISS_L1
Avg number of stores per store miss
Counter PM_ST_REF_L1 divided by counter PM_ST_MISS_L1
Avg number of load/stores per D1 miss
(PM_LD_REF_L1 + PM_ST_REF_L1) / (PM_LD_MISS_L1 + PM_ST_MISS_L1)
L1 cache hit rate
Number of level 1 load and store references not resulting in a level 1 miss. Measured in % relative to the total level 1 load and store references.
MIPS
Completed instructions divided by wall-clock time, in millions of instructions per second
Instructions per cycle
Completed instructions divided by number of cycles
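
As a rough illustration, the derived metrics in this table can be recomputed from the eight raw counters. This is a sketch under stated assumptions, not the code used by the measurement tools: the names `EventSet56` and `derived_metrics` are hypothetical, the wall-clock and user times are passed in by the caller, the metrics labelled "%" are assumed to be scaled by 100, traffic is reported in bytes, and the per-miss ratio divides by the sum of load and store misses, as the metric name implies. All counters are assumed to be non-zero.

```python
from dataclasses import dataclass

CACHE_LINE_BYTES = 128  # cache line size quoted in the "L2 traffic" row


@dataclass
class EventSet56:
    """Raw counter values for one run; fields mirror the counter names."""
    pm_dtlb_miss: int
    pm_itlb_miss: int
    pm_ld_miss_l1: int
    pm_st_miss_l1: int
    pm_cyc: int
    pm_inst_cmpl: int
    pm_st_ref_l1: int
    pm_ld_ref_l1: int


def derived_metrics(c, wall_clock_s, user_time_s):
    """Recompute the derived metrics; assumes no counter is zero."""
    l1_misses = c.pm_ld_miss_l1 + c.pm_st_miss_l1  # named "L2 accesses" in the table
    l1_refs = c.pm_ld_ref_l1 + c.pm_st_ref_l1
    traffic_bytes = l1_misses * CACHE_LINE_BYTES
    return {
        "utilization_rate_pct": 100.0 * user_time_s / wall_clock_s,
        "tlb_misses_per_cycle_pct": 100.0 * c.pm_dtlb_miss / c.pm_cyc,
        "loads_per_tlb_miss": c.pm_ld_ref_l1 / c.pm_dtlb_miss,
        "total_l2_accesses": l1_misses,
        "l2_accesses_per_cycle_pct": 100.0 * l1_misses / c.pm_cyc,
        "l2_traffic_bytes": traffic_bytes,
        "l2_bandwidth_bytes_per_s": traffic_bytes / wall_clock_s,
        "load_store_ops": l1_refs,
        "instructions_per_load_store": c.pm_inst_cmpl / l1_refs,
        "loads_per_load_miss": c.pm_ld_ref_l1 / c.pm_ld_miss_l1,
        "stores_per_store_miss": c.pm_st_ref_l1 / c.pm_st_miss_l1,
        "load_stores_per_d1_miss": l1_refs / l1_misses,
        "l1_hit_rate_pct": 100.0 * (l1_refs - l1_misses) / l1_refs,
        "mips": c.pm_inst_cmpl / wall_clock_s / 1.0e6,
        "instructions_per_cycle": c.pm_inst_cmpl / c.pm_cyc,
    }


# Made-up counter values, for illustration only.
counters = EventSet56(pm_dtlb_miss=100, pm_itlb_miss=10,
                      pm_ld_miss_l1=2_000, pm_st_miss_l1=1_000,
                      pm_cyc=1_000_000, pm_inst_cmpl=500_000,
                      pm_st_ref_l1=100_000, pm_ld_ref_l1=200_000)
metrics = derived_metrics(counters, wall_clock_s=2.0, user_time_s=1.0)
for name, value in metrics.items():
    print(f"{name}: {value}")
```

The entries named after the level 2 cache are computed purely from level 1 misses, so the remark on derived level 2 metrics applies to them.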


Joachim Hein
2003-11-03