
Research Projects

Array codes for random and clustered failures

Observation 1: Device failures in storage arrays may occur in clusters, arising from the physical layout of the devices. Clustered failures are especially common in catastrophic events, when multiple devices fail simultaneously.

Observation 2: Known array codes for multiple-failure recovery do not take the significance of clustered failures into account. Consequently, they suffer shortcomings that make their implementation in real systems problematic, including complex decoding, encoding and updates. The last of these, high update complexity, is especially limiting, as it increases I/O access times even when no failure occurs.

Contribution: RC (Random-Clustered) codes are a new family of array codes for recovering from 4 failures. They correct a 7/8 portion of the 4-failure combinations, including every 4-failure combination that falls into at most 3 clusters. Compared to a code that corrects all 4-failure combinations, they improve the update complexity by 28% and the decoding complexity by 25%, to essentially the complexity of 3-failure-correcting codes.

To read more: Array codes for random and clustered erasures

Cyclic lowest-density MDS array codes

Observation: Low-density array codes that are cyclic offer benefits to their encoding, updates and decoding. In many applications, the storage array is implemented distributively on separate network nodes, in which case the cyclic symmetry of the codes simplifies the deployment of the storage system by allowing a uniform design for all nodes.

Contribution: Three new families of cyclic lowest-density MDS array codes are constructed. They enjoy the same merits as known codes (optimal redundancy, optimal update complexity), topped by the cyclic property. These codes belong to a subclass of cyclic array codes, called systematically-cyclic array codes. This subclass is shown to enjoy greater implementation benefits compared to the general class of cyclic array codes.
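The deployment benefit of cyclic symmetry can be illustrated with a toy sketch (this is not the paper's construction; the parity rule below is hypothetical and not MDS): every node runs one identical encoding function, applied to a cyclically rotated view of the data columns, so a single uniform node design suffices for the whole array.

```python
# Toy illustration of cyclic symmetry in an array code (hypothetical rule,
# not the systematically-cyclic MDS construction): every node applies the
# SAME parity function to a cyclically rotated view of the data columns.

def node_parity(data, i):
    """Hypothetical parity rule for node i: XOR of the next two data
    columns, with indices taken cyclically around the ring of nodes."""
    n = len(data)
    return data[(i + 1) % n] ^ data[(i + 2) % n]

# One data column per node, here 4-bit columns for a 5-node ring.
data = [0b1010, 0b0111, 0b1100, 0b0001, 0b1111]
parities = [node_parity(data, i) for i in range(len(data))]

# Cyclic property: rotating the data columns by one position rotates the
# parity columns by one position as well, so all nodes are interchangeable.
rotated = data[1:] + data[:1]
rotated_parities = [node_parity(rotated, i) for i in range(len(rotated))]
assert rotated_parities == parities[1:] + parities[:1]
```

Because the rule is identical at every node, adding or replacing a node needs no per-node configuration; only the node's position in the ring matters.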
To read more: Cyclic lowest-density MDS array codes

Coding for Flash memories

Observation: Flash memory cells suffer unique errors as they grow in density and in read/write speeds. These errors are currently combated either by adding safety margins in the design and operation to make errors extremely unlikely, or by applying an external error-correcting code that is not specifically designed for the underlying errors. Both methods defy information-theoretic fundamentals.

Contribution: The physical constraints and impairments suffered by Multi-Level Flash cells are analyzed and used to abstract a well-motivated channel model. A multitude of these physical impairments are captured by a channel that imposes errors of limited magnitude in one dominant direction. Codes for such channels are constructed, including families of codes that are "perfect". To "close the loop", the performance of these codes is validated by implementing them on real floating-gate arrays.

To read more: Codes for Asymmetric Limited-Magnitude Errors with Application to Multi-Level Flash Memories

Low-complexity encoding for network codes

Observation: Algebraic network codes use finite-field arithmetic for the encoding operations. Network codes are jointly implemented by all nodes in the network, most of which have limited computing resources (e.g. routers). When the coding block size is moderately large, performing finite-field multiplications requires a significant number of operations, growing quadratically with the size of the block.

Contribution: A new distributed randomized coding scheme is proposed, called permute-and-add network codes. Under this scheme, finite-field multiplication is replaced by a random permutation of the bits in each block. For any network, this simple scheme is shown to work with high probability, with only a negligible loss in the multicast rate.
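The permute-and-add encoding step can be sketched as follows (a minimal illustration of the idea stated above, not the paper's full scheme; the seeded-permutation interface is an assumption made so the example is reproducible): each incoming bit-block is passed through a random bit permutation and the permuted blocks are XORed together, so the per-block cost is linear in the block length instead of the quadratic cost of a finite-field multiplication.

```python
import random

def random_permutation(block_len, seed):
    """A node's random bit permutation, made reproducible from a seed here
    so the example is deterministic (hypothetical interface)."""
    rng = random.Random(seed)
    perm = list(range(block_len))
    rng.shuffle(perm)
    return perm

def permute(bits, perm):
    """Apply a bit permutation: bit src moves to position perm[src]."""
    out = [0] * len(bits)
    for src, dst in enumerate(perm):
        out[dst] = bits[src]
    return out

def permute_and_add(blocks, perms):
    """Encode: permute each incoming bit-block, then XOR them together.
    This replaces the finite-field multiply of algebraic network coding;
    the work is linear in the block length."""
    out = [0] * len(blocks[0])
    for bits, perm in zip(blocks, perms):
        out = [a ^ b for a, b in zip(out, permute(bits, perm))]
    return out

b1 = [1, 0, 1, 1, 0, 0, 1, 0]
b2 = [0, 1, 1, 0, 1, 0, 0, 1]
perms = [random_permutation(8, seed) for seed in (1, 2)]
coded = permute_and_add([b1, b2], perms)

# Combining with an all-zero block just permutes the other block.
zero = [0] * 8
assert permute_and_add([b1, zero], perms) == permute(b1, perms[0])
```

The only per-bit operations are a table lookup and an XOR, which is what makes the scheme attractive for resource-limited nodes such as routers.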
To read more: Low-complexity encoding for network codes

Network coding for non-uniform demands

Observation: For multicast networks, network coding is known to achieve capacity with only simple linear coding operations. However, for general demand networks, where each sink demands a different subset of the source information, no similar achievability characterization exists. It is also known that deciding whether a demand network is achievable is NP-hard.

Contribution: A new demand model is defined, called non-uniform demands. Under this model, sink nodes only specify the (potentially different) size of the demanded subset, without restricting the identity of these information packets. This model lies in between the multicast model and the general demand model. It is motivated by common network applications such as Fountain-coded data and multiple-description codes. The results show that this model is strictly "easier" than general demands, by proving positive results that do not apply to the latter. Other results show that this model is strictly "harder" than multicast, by proving a hardness result on the code-design problem.

To read more: Network coding for non-uniform demands

Reed-Solomon algebraic list decoding with improved average-case running time

Observation: Interpolation algorithms used in algebraic list decoding of Reed-Solomon codes only optimize the worst-case decoding time, attained when the actual number of errors equals the correction capability of the code. In practical applications of Reed-Solomon codes, in most instances the actual number of errors is much smaller than the maximum correctable. Hence, improving over the worst case for low-error instances can significantly improve the average-case running time of the decoder and increase the decoding rate the decoder supports.

Contribution: The interpolation problem is analyzed for a varying number of actual, instantaneous errors. Bounds on the number of interpolation coefficients are provided.
A modified interpolation algorithm is proposed whose running time depends on the instantaneous number of errors, and its performance is analyzed using these bounds. As part of the project, a Reed-Solomon list decoder was implemented and can be run from the following link.
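The general principle of letting the running time track the instantaneous number of errors can be shown with a toy analogy (this is emphatically not the interpolation algorithm above; it uses a hypothetical 3-fold repetition code and brute-force search purely to illustrate the average-case effect): the decoder tries error weights e = 0, 1, 2, ... in order and stops at the first consistent codeword, so clean or lightly corrupted words cost far less than the worst case.

```python
from itertools import combinations

# Toy analogy of error-adaptive decoding (not the paper's interpolation
# algorithm): try error weights in increasing order, so the work done is
# governed by the actual number of errors, not by the worst case.

# Hypothetical code: a 2-bit message repeated 3 times (6-bit codewords).
CODEBOOK = {(*m, *m, *m) for m in [(0, 0), (0, 1), (1, 0), (1, 1)]}

def decode(received, max_errors=2):
    """Return (codeword, attempts), where attempts counts candidate error
    patterns examined before a codeword was found."""
    received = tuple(received)
    attempts = 0
    for e in range(max_errors + 1):
        for positions in combinations(range(len(received)), e):
            attempts += 1
            candidate = list(received)
            for p in positions:
                candidate[p] ^= 1  # flip the hypothesized error bits
            if tuple(candidate) in CODEBOOK:
                return tuple(candidate), attempts
    return None, attempts

clean = (1, 0, 1, 0, 1, 0)            # a valid codeword
word, work_clean = decode(clean)       # accepted on the first attempt
corrupted = (1, 1, 1, 0, 1, 0)         # one bit flipped
word2, work_dirty = decode(corrupted)  # must search weight-1 patterns
assert work_clean < work_dirty
```

Averaged over a channel where most received words carry few errors, the expected work of such a decoder is close to the low-error cost, which is the kind of gain the modified interpolation algorithm targets for Reed-Solomon list decoding.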

