Abstract
Distributed switch architectures allow a switch fabric to be partitioned into smaller independent switches, which is a key advantage when building high-capacity switching systems such as terabit switches. The performance and feasibility of a distributed switch architecture depend on two critical components: the queueing structure and the load-balancing algorithm. In this paper, we present a distributed switch architecture with a simple yet efficient queueing structure and load-balancing algorithm that can be easily implemented in a terabit switch fabric with OC-768 line rates. The queueing structure is based on distributed, non-buffered, input-queued crossbar switch elements with request-only virtual output queues. The distributed load-balancing algorithm dynamically balances the workload among the parallel switch elements by trying to equalize the lengths of the request-only virtual output queues across the switch elements. We refer to our architecture as a distributed switch architecture (ADSA). It introduces little communication overhead and no throughput degradation, and therefore enables non-blocking switching without internal speed-up. The load-balancing algorithm can perform one load-balancing action in less than 10 ns, which is fast enough for OC-768 line rates of 40 Gbps. We study the performance of the ADSA by modeling it as a set of discrete-time queues under uniform i.i.d. Bernoulli traffic. Using a combination of analysis and simulation, we show that the ADSA can be approximated by discrete-time Geom/G/P queues under both light and heavy loads. We then apply the Allen-Cunneen approximation formula to derive the mean cell delay of the ADSA as a function of the underlying crossbar scheduling algorithm, the number of parallel switch elements, and the number of ports in the system.
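For readers unfamiliar with the Allen-Cunneen formula, the following is a minimal sketch of its standard continuous-time form for a GI/G/m queue, in which the mean waiting time is approximated as the Erlang C waiting probability divided by m(1 - rho), multiplied by the mean service time and by the average of the squared coefficients of variation of the interarrival and service times. It is not the discrete-time Geom/G/P derivation developed in this paper; the function names `erlang_c` and `allen_cunneen_wait` and all parameter names are hypothetical and chosen only for illustration.

```python
from math import factorial


def erlang_c(m: int, rho: float) -> float:
    """Erlang C: probability that an arrival must wait in an M/M/m queue.

    m   -- number of parallel servers (here: an assumed stand-in for the
           number of parallel switch elements)
    rho -- per-server utilization, must satisfy 0 <= rho < 1
    """
    if m < 1:
        raise ValueError("m must be at least 1")
    if not 0.0 <= rho < 1.0:
        raise ValueError("utilization rho must be in [0, 1)")
    a = m * rho  # offered load in Erlangs
    tail = a ** m / (factorial(m) * (1.0 - rho))
    head = sum(a ** k / factorial(k) for k in range(m))
    return tail / (head + tail)


def allen_cunneen_wait(m: int, rho: float, mean_service: float,
                       ca2: float, cs2: float) -> float:
    """Approximate mean waiting time in queue for a GI/G/m queue.

    ca2 -- squared coefficient of variation of interarrival times
    cs2 -- squared coefficient of variation of service times
    """
    # Allen-Cunneen: scale the M/M/m waiting time by (ca2 + cs2) / 2.
    mm_m_wait = erlang_c(m, rho) * mean_service / (m * (1.0 - rho))
    return mm_m_wait * (ca2 + cs2) / 2.0


if __name__ == "__main__":
    # Illustrative values only, not results from the paper.
    # Bernoulli arrivals with probability p per slot have geometric
    # interarrival times, whose squared coefficient of variation is 1 - p.
    p, m, mean_service = 0.8, 4, 4.0   # times measured in slots (assumed)
    rho = p * mean_service / m
    print(allen_cunneen_wait(m, rho, mean_service, ca2=1.0 - p, cs2=0.0))
```

With m = 1 and both coefficients of variation equal to 1, the expression reduces to the exact M/M/1 waiting time, which is a convenient sanity check. The mean cell delay would then be this waiting time plus the mean service time, under the same assumptions.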