Abstract
Many scientific applications make intensive use of MPI collective communications. Efficient and scalable implementation of collective operations is therefore critical to the performance of such applications running on clusters. Quadrics QsNetII is a high-performance interconnect for clusters that implements some collectives at the Elan level, and these are used directly by the corresponding MPI collectives. Quadrics software supports point-to-point striping over multi-rail QsNetII networks, but it does not support multi-rail collectives. In this work, we propose a number of RDMA-based multi-port collectives for multi-rail QsNetII clusters, implemented directly at the Elan level. Our performance results indicate that the proposed multi-port gather achieves an improvement of up to 6.35 for 1 MB messages over the native elan_gather. The proposed multi-port all-to-all outperforms the native elan_alltoall by a factor of 2.19 for 16 KB messages. We also propose two algorithms for the scatter operation.