The HPCx system consists of 160 IBM eServer 575 LPARs for compute and 8 IBM eServer 575 LPARs for login and disk I/O. Each eServer LPAR contains 16 processors, the maximum allowed by the hardware. The service offers a total of 2560 processors for computation.
Throughout this document we also use the terms logical partition (LPAR) and frame as synonyms for the compute node.
The eServer 575 compute nodes utilise IBM Power5 processors. The Power5 is a 64-bit RISC processor implementing the PowerPC instruction set architecture. It has a 1.5 GHz clock rate and a 5-way super-scalar architecture with a 20-cycle pipeline. There are two floating-point multiply-add units, each of which can deliver one result per clock cycle. Counting each multiply-add as two floating-point operations, this gives a theoretical peak performance of 6.0 Gflop/s per processor. There is one divide and one square root unit, neither of which is pipelined.
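The peak figure can be reproduced from the numbers quoted above. The following Python sketch is only an illustration: the names are our own, the two-operations-per-multiply-add convention is an assumption, and the system-wide value is a simple extrapolation to the 2560 compute processors rather than a measured or quoted result.

```python
# Figures quoted in the text for one Power5 processor.
CLOCK_GHZ = 1.5        # clock rate in GHz
FMA_UNITS = 2          # floating-point multiply-add units per processor
FLOPS_PER_FMA = 2      # assumption: one multiply plus one add per result

# Compute processors in the whole service: 160 LPARs x 16 processors.
COMPUTE_PROCESSORS = 160 * 16


def peak_gflops_per_processor():
    """Theoretical peak of a single processor in Gflop/s."""
    return CLOCK_GHZ * FMA_UNITS * FLOPS_PER_FMA


def peak_tflops_system():
    """Theoretical peak of all compute processors in Tflop/s."""
    return peak_gflops_per_processor() * COMPUTE_PROCESSORS / 1000.0


if __name__ == "__main__":
    print(f"per processor: {peak_gflops_per_processor():.1f} Gflop/s")
    print(f"whole system:  {peak_tflops_system():.2f} Tflop/s")
```

Real codes will normally achieve only a fraction of this figure, since it assumes both multiply-add units are kept busy on every cycle.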
The processor has 120 integer and 120 floating-point registers. There is extensive hardware support for branch prediction, and both out-of-order and speculative execution of instructions. There is a hardware prefetch facility: loads to successive cache lines trigger prefetching into the level 1 cache. Up to 8 prefetch streams can be active concurrently.
The level 1 cache is split into a 32 Kbyte data cache and a 64 Kbyte instruction cache. The level 1 data cache has 128-byte lines, is 2-way set associative and write-through.
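To see what these figures imply for data layout, the sketch below derives the number of sets in the level 1 data cache and maps a byte address to a set. It is a simplified model under stated assumptions: the set index is taken to be a plain modulo of the line number, which may not match the actual hardware indexing, and the function names are hypothetical.

```python
# Level 1 data cache parameters as quoted in the text.
L1D_SIZE_BYTES = 32 * 1024   # 32 Kbyte data cache
LINE_BYTES = 128             # 128-byte lines
WAYS = 2                     # set associativity as stated above

# Each set holds WAYS lines, so the number of sets follows directly.
NUM_SETS = L1D_SIZE_BYTES // (LINE_BYTES * WAYS)

# Addresses this many bytes apart fall into the same set in this model.
CONFLICT_STRIDE_BYTES = NUM_SETS * LINE_BYTES


def l1d_set_index(address):
    """Set index for a byte address (assumed simple modulo mapping)."""
    return (address // LINE_BYTES) % NUM_SETS


def same_set(address_a, address_b):
    """True if both addresses compete for the same set."""
    return l1d_set_index(address_a) == l1d_set_index(address_b)


if __name__ == "__main__":
    print("sets:", NUM_SETS)
    print("conflict stride (bytes):", CONFLICT_STRIDE_BYTES)
    print("0 and 16384 share a set:", same_set(0, 16384))
```

In this model, accessing more than two arrays whose corresponding elements lie a multiple of 16 Kbytes apart would cause lines to evict one another from the level 1 data cache, which is one reason to avoid array dimensions that are large powers of two.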
Each chip contains two processors, each with its own level 1 caches, and a shared level 2 cache. The level 2 cache is a 1.9 Mbyte combined data and instruction cache with 128-byte lines; it is 10-way set associative and write-back.
New to the Power5 is simultaneous multi-threading, or SMT. With SMT, each processor can support two instruction streams, allowing the simultaneous execution of two threads. These streams appear as logical processors: two per physical processor, four per chip. However, the two logical processors on a physical processor share its level 1 instruction and data caches, floating-point units, and other functional units. Enabling SMT also causes the level 2 and 3 caches to be shared among the four logical processors on a chip. SMT is now user configurable; see Section 8 for more details.
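The logical processor counts follow from simple multiplication. The short Python sketch below works them out per chip and per 16-processor node; the function name and constants are illustrative only, and it says nothing about how SMT is actually switched on or off.

```python
# Counts taken from the text.
PROCESSORS_PER_CHIP = 2      # physical processors on one chip
PROCESSORS_PER_NODE = 16     # physical processors in one eServer node
THREADS_PER_PROCESSOR = 2    # instruction streams per processor with SMT


def logical_processors(physical_processors, smt_enabled):
    """Number of logical processors the operating system would see."""
    factor = THREADS_PER_PROCESSOR if smt_enabled else 1
    return physical_processors * factor


if __name__ == "__main__":
    for smt in (False, True):
        print(
            f"SMT {'on ' if smt else 'off'}:",
            logical_processors(PROCESSORS_PER_CHIP, smt), "per chip,",
            logical_processors(PROCESSORS_PER_NODE, smt), "per node",
        )
```

Because the logical processors share caches and functional units, doubling the number of tasks or threads per node with SMT enabled should not be expected to double performance.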
Each chip is packaged, together with a level 3 cache, into a Dual-Core Module (DCM). The level 3 cache is 36 Mbytes and is shared between the two processors, equivalent to 18 Mbytes per processor. It has 256-byte lines, and is 12-way set associative and write-back.
Each eServer node contains 8 DCMs (16 processors) and has 32 Gbytes of main memory.
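As a rough guide when sizing jobs, the per-node figures can be combined as in the following Python sketch. The names are hypothetical and the divisions are naive: memory available to user jobs will in practice be lower than the installed total, since the operating system takes a share.

```python
# Figures quoted in the text.
DCMS_PER_NODE = 8            # Dual-Core Modules per eServer node
PROCESSORS_PER_DCM = 2       # processors on each DCM
NODE_MEMORY_GBYTES = 32      # main memory per node
L3_PER_DCM_MBYTES = 36       # level 3 cache per DCM
COMPUTE_NODES = 160          # compute LPARs in the service

processors_per_node = DCMS_PER_NODE * PROCESSORS_PER_DCM

# Naive share of installed memory per processor (ignores OS overhead).
memory_per_processor_gbytes = NODE_MEMORY_GBYTES / processors_per_node

# Aggregate level 3 cache in one node and installed memory across all nodes.
l3_per_node_mbytes = DCMS_PER_NODE * L3_PER_DCM_MBYTES
total_memory_gbytes = COMPUTE_NODES * NODE_MEMORY_GBYTES

if __name__ == "__main__":
    print("processors per node:", processors_per_node)
    print("memory per processor (Gbytes):", memory_per_processor_gbytes)
    print("level 3 cache per node (Mbytes):", l3_per_node_mbytes)
    print("memory across compute nodes (Gbytes):", total_memory_gbytes)
```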
Inter-node communication is provided by IBM's High Performance Switch (HPS), also known as ``Federation.'' Each eServer node (frame) has two network adapters, each with two links, making a total of four links between each frame and the switch network.