Counting networks were proposed by Aspnes, Herlihy and
Shavit as a low-contention concurrent data structure for
multiprocessor coordination. We address the issue of tolerating faults in
counting networks. In our fault model, balancer objects experience *
responsive crash failures*: they behave correctly until they fail, and
thereafter they are inaccessible. We propose two methods for tolerating
such faults. The first is based on a construction of a *k*-fault-tolerant
balancer with 2(*k+*1*)* bits of memory. All balancers in a
counting network are replaced by fault-tolerant ones. Thus, a counting
network with depth O(log^{2} *n*), where *n* is the width, is
transformed into a *k*-fault-tolerant counting network with depth O(*k* log^{2}
*n*). We also consider the case where inaccessible balancers
can be remapped to spare balancers. We present a bound on the error in the
output token distribution of counting networks with remapped faulty
balancers (a generalization of the error bound for sorting networks with
faulty comparators presented by Yao .
Our second method for tolerating faults is based on the
construction of a *correction network.* Given a token distribution
with a bounded error, the correction network produces a token distribution
that is smooth (i.e., the number of tokens on each output wire differs by
at most one, a weaker condition than the step property of counting
networks). The correction network is constructed with fault-tolerant
balancers. It is appended to a counting network in which faulty balancers
are remapped to spare balancers. In order to tolerate* k* faults, the
correction network has depth 2*k*(*k*+1)(log *n *+ 1), for
a network of width *n*. Therefore, this method results in a network
with a smaller depth provided that O(*k*) < O(log *n*). However,
it is only applicable if it is possible to remap faulty balancers. |