Tolerating Faults in Counting Networks
Marc D. Riedel and Jehoshua Bruck


Counting networks were proposed by Aspnes, Herlihy and Shavit  as a low-contention concurrent data structure for multiprocessor coordination. We address the issue of tolerating faults in counting networks. In our fault model, balancer objects experience responsive crash failures: they behave correctly until they fail, and thereafter they are inaccessible. We propose two methods for tolerating such faults. The first is based on a construction of a k-fault-tolerant balancer with 2(k+1) bits of memory. All balancers in a counting network are replaced by fault-tolerant ones. Thus, a counting network with depth O(log2 n), where n is the width, is transformed into a k-fault-tolerant counting network with depth O(k log2 n). We also consider the case where inaccessible balancers can be remapped to spare balancers. We present a bound on the error in the output token distribution of counting networks with remapped faulty balancers (a generalization of the error bound for sorting networks with faulty comparators presented by Yao .

Our second method for tolerating faults is based on the construction of a correction network. Given a token distribution with a bounded error, the correction network produces a token distribution that is smooth (i.e., the number of tokens on each output wire differs by at most one, a weaker condition than the step property of counting networks). The correction network is constructed with fault-tolerant balancers. It is appended to a counting network in which faulty balancers are remapped to spare balancers. In order to tolerate k faults, the correction network has depth 2k(k+1)(log n + 1), for a network of width n. Therefore, this method results in a network with a smaller depth provided that O(k) < O(log n). However, it is only applicable if it is possible to remap faulty balancers.


View Paper (PDF)

View Presentation (PowerPoint)