Academic Collaborators

Jehoshua Bruck

Moshe Schwartz

Sidharth Jaggi

Michelle Effros

Vasken Bohossian

Robert McEliece

Mario Blaum

Michael Lustig

David Malah

Nimrod Peleg

Research Supported By

Powell Fellowship

National Science Foundation

Lee Center for Advanced Networking

Atwood Fellowship

Research Projects

Array codes for random and clustered failures

Observation 1: Device failures in storage arrays may occur in clusters, arising from the physical layout of the devices. Clustered failures are more common in catastrophic events, when multiple devices fail simultaneously.

Observation 2: Known array codes for multiple-failure recovery do not take the significance of clustered failures into account. Consequently, they suffer shortcomings that make their implementation in real systems problematic, including complex decoding, encoding, and updates. The latter, high update complexity, is especially limiting, as it increases I/O access times even when no failure occurs.

Contribution: RC (Random Clustered) codes are a new family of array codes for recovering from 4 failures. They correct 7/8 of all 4-failure combinations, including every 4-failure combination that falls into at most 3 clusters. Compared to a code that corrects all 4-failure combinations, they improve the update complexity by 28% and the decoding complexity by 25%, to essentially the complexity of 3-failure-correcting codes.
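The notion of clustered failures can be made concrete with a short combinatorial check. In the sketch below (an illustration only; the linear-array model and function names are assumptions, not the RC construction), a cluster is a maximal run of consecutive failed device indices, and we count what fraction of 4-failure patterns fall into at most 3 clusters:

```python
from itertools import combinations

def num_clusters(failed):
    """Count maximal runs of consecutive indices in a failure pattern.
    (Assumes devices laid out in a linear array; illustrative model only.)"""
    s = sorted(failed)
    return 1 + sum(1 for a, b in zip(s, s[1:]) if b - a > 1)

def clustered_fraction(n, f=4, max_clusters=3):
    """How many f-failure patterns over n devices have <= max_clusters clusters."""
    combos = list(combinations(range(n), f))
    good = sum(1 for c in combos if num_clusters(c) <= max_clusters)
    return good, len(combos)

# For a 10-device array: 175 of the 210 possible 4-failure patterns
# fall into at most 3 clusters.
good, total = clustered_fraction(10)
assert (good, total) == (175, 210)
```

The exact 7/8 figure of the RC codes refers to their own array parameters; the toy count above only illustrates that most failure patterns are clustered.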

To read more: Array codes for random and clustered erasures

Cyclic lowest-density MDS array codes

Observation: Low-density array codes that are cyclic offer benefits in their encoding, updates, and decoding. In many applications, the storage array is implemented distributively on separate network nodes, in which case the cyclic symmetry of the codes simplifies the deployment of the storage system by allowing a uniform design for all nodes.

Contribution: Three new families of cyclic lowest-density MDS array codes are constructed. They enjoy the same merits as known codes (optimal redundancy, optimal update complexity), topped by the cyclic property. These codes belong to a sub-class of cyclic array codes, called systematically-cyclic array codes, which is shown to enjoy greater implementation benefits than the general class of cyclic array codes.
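The cyclic property itself is easy to verify mechanically. The sketch below illustrates it on a scalar cyclic code (the binary [7,4] code with generator polynomial 1 + x + x^3), not on the array codes of the project, which enjoy the analogous symmetry under column shifts:

```python
from itertools import product

def cyclic_code(n=7, g=(1, 1, 0, 1)):
    """All codewords m(x)*g(x) of the cyclic code generated by g
    (coefficients listed lowest degree first); here the binary [7,4] code."""
    k = n - len(g) + 1
    words = set()
    for msg in product((0, 1), repeat=k):
        c = [0] * n
        for i, m in enumerate(msg):
            for j, gj in enumerate(g):
                c[i + j] ^= m & gj
        words.add(tuple(c))
    return words

code = cyclic_code()
shift = lambda w: (w[-1],) + w[:-1]
# Every cyclic shift of a codeword is again a codeword.
assert all(shift(w) in code for w in code)
```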

To read more: Cyclic lowest density MDS array codes

Coding for Flash memories

Observation: Flash memory cells suffer unique errors as they grow in density and in read/write speed. These errors are currently combated either by adding safety margins in design and operation to make errors extremely unlikely, or by applying an external error-correcting code that is not specifically designed for the underlying errors. Both methods defy information-theoretic fundamentals.

Contribution: The physical constraints and impairments suffered by Multi-Level Flash cells are analyzed and used to abstract a well-motivated channel model. A multitude of these physical impairments is captured by a channel that imposes errors of limited magnitude in one dominant direction. Codes for such channels are constructed, including families of codes that are "perfect". To "close the loop", the performance of these codes is validated by implementing them on real floating-gate arrays.
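The modular-reduction idea behind such codes can be sketched in a few lines (a minimal illustration; the function names and the trivial residue code in the demo are assumptions, not the full construction). Since errors push cell levels upward by at most ell, the received residues modulo ell+1 reveal the error magnitudes once the correct residues are recovered, e.g. from an error-correcting code over Z_{ell+1}:

```python
def decode_alm(y, ell, sigma_decode):
    """Correct asymmetric (upward) errors of magnitude at most ell.
    sigma_decode recovers the codeword residues (mod ell+1) from the
    received residues; any symmetric error-correcting code over
    Z_{ell+1} can play this role."""
    q = ell + 1
    psi = [v % q for v in y]                          # received residues
    sigma = sigma_decode(psi)                         # corrected residues
    eps = [(a - s) % q for a, s in zip(psi, sigma)]   # error magnitudes
    return [v - e for v, e in zip(y, eps)]

# Demo with the trivial all-zero residue code: cell levels are stored as
# multiples of ell+1, so ANY number of limited-magnitude errors is corrected.
stored = [4, 8, 0, 12]        # multiples of 4 (ell = 3)
received = [5, 8, 3, 14]      # upward errors of magnitude <= 3
assert decode_alm(received, 3, lambda p: [0] * len(p)) == stored
```

Substituting a t-error-correcting residue code instead of the trivial one trades off redundancy against the number of correctable cells.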

To read more: Codes for Asymmetric Limited-Magnitude Errors with Application to Multi-Level Flash Memories

Low-complexity encoding for network codes

Observation: Algebraic network codes use finite-field arithmetic for their encoding operations. Network codes are implemented jointly by all nodes in the network, most of which have limited computing resources (e.g., routers). When the coding block size is moderately large, performing finite-field multiplications requires a significant number of operations that grows quadratically with the block size.

Contribution: A new distributed randomized coding scheme, called permute-and-add network codes, is proposed. Under this scheme, finite-field multiplication is replaced by a random permutation of the bits in each block. For any network, this simple scheme is shown to work with high probability, with only a negligible loss in the multicast rate.
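A node's local encoding under such a scheme is just bit permutations followed by XOR. The sketch below (illustrative; the names are assumptions) shows this operation, which costs linear rather than quadratic time in the block length:

```python
def permute(bits, perm):
    """Apply a fixed bit permutation to one packet."""
    return [bits[perm[i]] for i in range(len(perm))]

def permute_and_add(packets, perms):
    """Node output: XOR of independently permuted incoming packets.
    Replaces the finite-field multiplications of algebraic network codes."""
    out = [0] * len(packets[0])
    for pkt, perm in zip(packets, perms):
        out = [o ^ b for o, b in zip(out, permute(pkt, perm))]
    return out

p0, p1 = [1, 0, 1, 0], [0, 1, 1, 0]
perms = [[1, 0, 3, 2], [2, 3, 0, 1]]
assert permute_and_add([p0, p1], perms) == [1, 1, 0, 0]
```

In the actual scheme each node draws its permutations at random; the demo fixes them for reproducibility.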

To read more: Low complexity encoding for network codes

Network coding for non-uniform demands

Observation: For multicast networks, network coding is known to achieve capacity with only simple linear coding operations. However, for general demand networks, where each sink demands a different subset of the source information, no similar achievability characterization exists. It is also known that deciding whether a demand network is achievable is NP-hard.

Contribution: A new demand model is defined, called non-uniform demands. Under this model, sink nodes specify only the (potentially different) size of the demanded subset, without restricting the identity of these information packets. This model lies in between the multicast model and the general demand model. It is motivated by common network applications such as Fountain-coded data and multiple-description codes. The results show that this model is strictly "easier" than general demands, by proving positive results that do not apply to the latter. Other results show that this model is strictly "harder" than multicast, by proving a hardness result on the code design problem.

To read more: Network coding for non-uniform demands

Reed-Solomon algebraic list decoding with improved average-case running-time

Observation: Interpolation algorithms used in algebraic list decoding of Reed-Solomon codes optimize only the worst-case decoding time, attained when the actual number of errors equals the correction capability of the code. In practical applications of Reed-Solomon codes, the actual number of errors is in most instances much smaller than the maximum correctable. Hence, improving over the worst case for low-error instances can significantly improve the average-case running time of the decoder and increase the decoding rate it supports.

Contribution: The interpolation problem is analyzed for a varying number of actual, instantaneous errors. Bounds on the number of interpolation coefficients are provided. A modified interpolation algorithm is proposed whose running time depends on the instantaneous number of errors; its performance is analyzed using these bounds. As part of the project, a Reed-Solomon list decoder was implemented and can be run from the following link.

List Decoder for a Reed-Solomon code
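To make the interpolation step concrete, here is a self-contained sketch of the classical Welch-Berlekamp interpolation (the multiplicity-1 special case of algebraic decoding) over a small prime field. All names and toy parameters are illustrative, and this is not the project's modified algorithm; the point is that the linear system has 2e + k unknowns, so its size, and hence the decoding cost, shrinks with the error budget e:

```python
def solve_mod(A, b, p):
    """Gaussian elimination over GF(p), p prime; returns one solution
    (free variables set to 0). Assumes a consistent system."""
    rows, cols = len(A), len(A[0])
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    pivots, r = [], 0
    for c in range(cols):
        piv = next((i for i in range(r, rows) if M[i][c]), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        inv = pow(M[r][c], p - 2, p)          # Fermat inverse mod p
        M[r] = [v * inv % p for v in M[r]]
        for i in range(rows):
            if i != r and M[i][c]:
                f = M[i][c]
                M[i] = [(v - f * w) % p for v, w in zip(M[i], M[r])]
        pivots.append(c)
        r += 1
    x = [0] * cols
    for i, c in enumerate(pivots):
        x[c] = M[i][-1]
    return x

def wb_error_positions(xs, ys, k, e, p):
    """Welch-Berlekamp: find N (deg <= e+k-1) and monic E (deg e) with
    N(x_i) = y_i * E(x_i) at all points; the roots of E locate the errors.
    The system has 2e + k unknowns, so cost drops for smaller e."""
    A, b = [], []
    for x, y in zip(xs, ys):
        row = [pow(x, j, p) for j in range(e + k)]            # coeffs of N
        row += [(-y * pow(x, j, p)) % p for j in range(e)]    # low coeffs of E
        A.append(row)
        b.append(y * pow(x, e, p) % p)
    sol = solve_mod(A, b, p)
    E = sol[e + k:] + [1]                                     # monic, degree e
    ev = lambda x: sum(c * pow(x, j, p) for j, c in enumerate(E)) % p
    return {i for i, x in enumerate(xs) if ev(x) == 0}

# Toy [6,2] RS code over GF(13): f(x) = 3x + 2 evaluated at x = 0..5,
# with exactly 2 symbols corrupted (positions 1 and 4).
xs, ys = list(range(6)), [2, 9, 8, 11, 7, 4]
assert wb_error_positions(xs, ys, k=2, e=2, p=13) == {1, 4}
```

The demo assumes exactly e errors occurred, which makes the error-locator polynomial E unique; a production decoder must also handle fewer errors and rank-deficient systems.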

To read more: On the average complexity of Reed-Solomon algebraic list decoders

Bounds on the list decodability of error-correcting codes over large alphabets

Observation: Having tight bounds on the list decodability of error-correcting codes is important, both for the theoretical study of list decoders and for analyzing their miscorrection probability when used in practice. Known bounds on the list size of general error-correcting codes give overestimates for moderately large alphabet sizes.

Contribution: For alphabet sizes greater than a relatively small known threshold (which depends on the code's minimum distance and the decoding radius), the best known closed-form bound on the list size is derived. It is further shown to be close to algebraic bounds for Reed-Solomon codes, implying that Reed-Solomon codes have no "special" list-decodability properties compared to the most general codes (apart, of course, from the existence of a constructive way to list-decode them).
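The quantity being bounded, the list size, can be computed exactly for toy codes by brute force (an illustration only; the names are assumptions): it is the largest number of codewords inside any Hamming ball of the given decoding radius.

```python
from itertools import product

def hamming(a, b):
    """Hamming distance between two equal-length words."""
    return sum(x != y for x, y in zip(a, b))

def max_list_size(code, q, n, e):
    """Largest number of codewords within distance e of any word in [q]^n."""
    return max(sum(hamming(y, c) <= e for c in code)
               for y in product(range(q), repeat=n))

# Ternary repetition code of length 3, minimum distance d = 3:
rep3 = [(0, 0, 0), (1, 1, 1), (2, 2, 2)]
assert max_list_size(rep3, q=3, n=3, e=1) == 1   # unique decoding up to (d-1)/2
assert max_list_size(rep3, q=3, n=3, e=2) == 3   # (0,1,2) is close to all three
```

Closed-form bounds like those of the project predict this quantity without the exponential search, which is what makes them usable for long codes.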

To read more:
A combinatorial bound on the list size
Miscorrection probability beyond the minimum distance