High-Performance Interconnects, Symposium on

Abstract

The Cray X1 supercomputer is a distributed shared memory vector multiprocessor, scalable to 4096 processors and up to 65 terabytes of memory. The X1's hierarchical design uses the multi-streaming processor (MSP) as its basic building block; each MSP is capable of 12.8 GF/s for 64-bit operations. The distributed shared memory (DSM) of the X1 presents a 64-bit global address space that is directly addressable from every MSP, with a ratio of interconnect bandwidth to computation rate of one byte per floating point operation. Our results show that this high bandwidth and low latency for remote memory accesses translate into improved performance on important applications, such as an Eulerian gyrokinetic-Maxwell solver. Furthermore, this architecture naturally supports programming models such as the Cray shmem API, Unified Parallel C (UPC), and Co-Array Fortran (CAF). As our benchmarks demonstrate, it is imperative to select the appropriate model to exploit these features.
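As a rough illustration of the figures quoted above, the following sketch works out what one byte per floating point operation implies for a 12.8 GF/s MSP, and what the aggregate peak would be at full scale. The function names are hypothetical, and the aggregate figure rests on the assumption that each of the 4096 processors counted in the abstract is an MSP; the abstract does not state this explicitly.

```python
# Back-of-envelope arithmetic from the numbers stated in the abstract.
# Function names are illustrative only and are not part of any Cray API.

PEAK_GFLOPS_PER_MSP = 12.8   # stated: 12.8 GF/s per MSP for 64-bit operations
BYTES_PER_FLOP = 1.0         # stated: one byte per floating point operation
MAX_PROCESSORS = 4096        # stated; ASSUMPTION: each processor is an MSP


def interconnect_gbytes_per_s(peak_gflops: float, bytes_per_flop: float) -> float:
    """Interconnect bandwidth (GB/s) implied by a bytes-per-flop ratio."""
    return peak_gflops * bytes_per_flop


def aggregate_peak_tflops(processors: int, peak_gflops: float) -> float:
    """Aggregate peak rate (TF/s) if every processor reaches its peak."""
    return processors * peak_gflops / 1000.0


if __name__ == "__main__":
    bw = interconnect_gbytes_per_s(PEAK_GFLOPS_PER_MSP, BYTES_PER_FLOP)
    peak = aggregate_peak_tflops(MAX_PROCESSORS, PEAK_GFLOPS_PER_MSP)
    print(f"Implied interconnect bandwidth per MSP: {bw:.1f} GB/s")
    print(f"Aggregate peak at {MAX_PROCESSORS} MSPs: {peak:.1f} TF/s")
```

Under these assumptions the ratio corresponds to about 12.8 GB/s of interconnect bandwidth per MSP and an aggregate peak of roughly 52.4 TF/s. These are theoretical peaks derived from the stated numbers, not measured results from the paper.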