NEC provided the most convincing proposal in the procurement process, where vendor bids were evaluated based on a collection of representative simulation applications (RWTH job mix) and economic viability. The latter was determined by total cost of ownership (TCO) metrics that include hardware acquisition costs, energy consumption, cooling costs and the job mix performance.
The heart of CLAIX-2018 (“Cluster Aix-la-Chapelle”) comprises approximately 1100 compute nodes. Each node is equipped with 192 GB of RAM and two Intel Xeon Platinum 8160 CPUs, each of which has 24 cores running at a nominal clock frequency of 2.1 GHz. The clock frequency can rise to 3.7 GHz (turbo mode) in compute phases that do not fully leverage the available parallelism. CLAIX-2018 delivers a theoretical peak performance of 3.55 PFlops. In addition, 48 of the compute nodes are equipped with two NVIDIA Volta V100 GPUs, interconnected by the high-bandwidth, low-latency NVLink technology. These GPU nodes support the acceleration of suitable applications and the efficient execution of specific application stages such as data analysis. All nodes of CLAIX-2018 are connected by an Intel Omni-Path 100G network fabric.
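The quoted peak figure can be roughly reproduced from the node count, core count and nominal clock frequency given above. The following minimal sketch rests on assumptions that are not stated in the text: each core is taken to execute 32 double-precision floating-point operations per cycle (two AVX-512 fused multiply-add units), the node count is treated as exactly 1100, and only the CPUs are counted, not the GPUs. The function name `peak_pflops` is purely illustrative.

```python
# Back-of-the-envelope estimate of the CPU peak performance of CLAIX-2018.
NODES = 1100             # "approximately 1100" in the text; treated as exact here
SOCKETS_PER_NODE = 2     # two Xeon Platinum 8160 CPUs per node
CORES_PER_SOCKET = 24
CLOCK_HZ = 2.1e9         # nominal clock frequency, not turbo

# Assumption (not from the text): 8 doubles per AVX-512 vector
# x 2 operations per fused multiply-add x 2 FMA units per core.
FLOPS_PER_CYCLE = 8 * 2 * 2


def total_cores(nodes=NODES):
    """Number of CPU cores across all compute nodes."""
    return nodes * SOCKETS_PER_NODE * CORES_PER_SOCKET


def peak_pflops(nodes=NODES, clock_hz=CLOCK_HZ, flops_per_cycle=FLOPS_PER_CYCLE):
    """Theoretical peak in PFlops: cores x clock x operations per cycle."""
    return total_cores(nodes) * clock_hz * flops_per_cycle / 1e15


if __name__ == "__main__":
    print(f"{total_cores()} cores, about {peak_pflops():.2f} PFlops peak")
```

Under these assumptions the estimate agrees with the stated 3.55 PFlops, which suggests that the published figure refers to the CPUs at nominal clock frequency.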
For the simulation applications from the RWTH job mix, CLAIX-2018 provides a significant performance gain. In comparison to the previously installed system, the average performance per core increases by 30 % for the same data input sets.
With CLAIX-2018, a completely new high-performing Lustre-based HPC storage solution will be put into operation. It will also be accessible from CLAIX’s first phase using a specific network connection. This storage solution provides a capacity of 10 PB and a read/write bandwidth of 150 GB/s.
CLAIX-2018 will be installed in the new machine hall of the IT Center at RWTH Aachen University. This infrastructure supports free cooling, i.e., no energy is consumed for cooling the water from the compute racks; instead, the heat produced is released through cooling towers. CLAIX-2018 will enter the testing stage in November 2018 and be fully operational by December 2018. As part of JARA-HPC, computing time can already be applied for today.