Abstract
Group communication significantly influences the performance of data-parallel applications. Nevertheless, an important factor in the efficiency of group communication is often neglected: a larger communication idle time may occur when there is node contention and when message lengths differ within a single communication step. Group communication scheduling has attracted growing attention. Previous work either fails to avoid communication conflicts completely or addresses only special cases. This paper develops a general and efficient scheduling strategy for the case where array distributions are block-cyclic. Based on proofs of recursive theorems on the elements of the communication table, the strategy generates a communication scheduling table in which each column, corresponding to one communication step, is a permutation of the receiving node numbers. In addition, messages of similar size are placed in the same communication step as far as possible. As a result, the strategy not only avoids inter-processor contention but also minimizes the actual communication cost of each communication step. Finally, experimental results show that our strategy outperforms the general method as well as implementations of all-to-all based scheduling and greedy scheduling.
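To illustrate the contention-free property described above, the following minimal sketch checks whether each column of a scheduling table is a permutation of the receiving node numbers. It is not the strategy proposed in this paper: the table is built with a plain cyclic-shift rule, used here only as a stand-in, and the function names `cyclic_shift_schedule` and `is_contention_free` are hypothetical. It also assumes that every node sends exactly one message in every step.

```python
def cyclic_shift_schedule(num_nodes):
    """Build a stand-in table: row s, column k holds the receiver
    that sender s targets in the k-th step (cyclic shift, an assumption)."""
    return [[(s + k) % num_nodes for k in range(1, num_nodes)]
            for s in range(num_nodes)]


def is_contention_free(table, num_nodes):
    """Return True if every column is a permutation of 0..num_nodes-1,
    i.e. no node receives more than one message in any step."""
    steps = len(table[0]) if table else 0
    for k in range(steps):
        receivers = [row[k] for row in table]
        if sorted(receivers) != list(range(num_nodes)):
            return False
    return True


schedule = cyclic_shift_schedule(4)
print(is_contention_free(schedule, 4))
```

A table in which two senders target the same receiver in one step fails this check, which is the node contention the scheduling table is meant to rule out.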