
Computing in the RAIN

RAIN: 1. Water falling in drops condensed from vapor in the atmosphere; 2. Reliable Array of Independent Nodes

This project has been developed in the EDEN laboratory, Paradise.

Computing and storage over distributed environments, such as clusters of workstations and personal computers connected by local and wide area networks, have very high potential: they leverage existing hardware and software and enable affordable parallel and distributed applications. However, these computing, storage and communication environments are not yet widely integrated to support distributed applications, mainly because of the relatively low performance of inter-processor communication and insufficient overall system reliability.

Reliability and communication are the focus areas of the RAIN project, whose goal is to create reliable distributed environments based on commercial computing, communication and storage technologies. The project is part of a collaboration with JPL focused on developing future distributed computing systems for spaceborne missions.

While exploring the theoretical aspects of the project in depth, we have also implemented our first demonstration system.

It consists of ten Intel Pentium PC nodes, each equipped with two Myrinet cards. The nodes are interconnected by four 8-way Myrinet switches.
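To make the redundancy in this configuration concrete, the following Python sketch models ten dual-homed nodes and four 8-port switches as a graph and checks whether the surviving nodes can still reach one another after a set of faults. The text above does not specify how the testbed is actually wired, so the wiring here is purely an assumption for illustration: node n attaches to switches n mod 4 and (n + 1) mod 4, and the switches form a ring. The names `build_links` and `surviving_nodes_connected` are hypothetical.

```python
from collections import deque

NUM_NODES = 10        # ten PC nodes, from the description above
NUM_SWITCHES = 4      # four switches, from the description above
PORTS_PER_SWITCH = 8  # 8-way switches, from the description above


def build_links():
    """Return an ASSUMED wiring as a set of (endpoint, endpoint) links.

    Each node uses its two cards to reach two different switches, and the
    switches are joined in a ring. This is an illustrative guess, not the
    documented layout of the real testbed.
    """
    links = set()
    for n in range(NUM_NODES):
        for s in (n % NUM_SWITCHES, (n + 1) % NUM_SWITCHES):
            links.add((f"node{n}", f"switch{s}"))
    for s in range(NUM_SWITCHES):
        links.add((f"switch{s}", f"switch{(s + 1) % NUM_SWITCHES}"))
    return links


def surviving_nodes_connected(links, failed=frozenset(), failed_links=frozenset()):
    """True if every non-failed node can still reach every other one."""
    adjacency = {}
    for a, b in links:
        if a in failed or b in failed:
            continue  # a dead node or switch takes its links with it
        if (a, b) in failed_links or (b, a) in failed_links:
            continue  # the cable itself has failed
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    nodes = [f"node{n}" for n in range(NUM_NODES) if f"node{n}" not in failed]
    if not nodes:
        return True

    # Breadth-first search from one surviving node.
    seen = {nodes[0]}
    queue = deque([nodes[0]])
    while queue:
        current = queue.popleft()
        for neighbor in adjacency.get(current, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return all(n in seen for n in nodes)


if __name__ == "__main__":
    links = build_links()
    print(surviving_nodes_connected(links, failed={"switch0"}))
    print(surviving_nodes_connected(links, failed={"switch0", "switch2"}))
```

Under this assumed wiring, losing any single switch leaves all nodes connected, while losing two opposite switches of the ring partitions the cluster. A different wiring would give different results.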

Using this system, we have created our first proof-of-concept RAIN application: a distributed video server capable of tolerating multiple node, link and switch faults. The distributed video server uses a novel redundant storage scheme called EVENODD. We have also been working on other RAIN applications, including a distributed checkpointing system and a distributed fault-tolerant web server called SNOW.
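The text above names EVENODD but does not describe how it works. As a rough illustration, the sketch below follows the encoding step of the published EVENODD construction: for a prime p, a (p-1) x p array of data symbols is extended with one column of row parities and one column of diagonal parities, using XOR only. It is a minimal sketch, not the video server's implementation. The function names and the integer-symbol representation are assumptions, and the two-erasure decoding procedure is omitted; only single-column recovery from row parity is shown.

```python
def evenodd_encode(data, p):
    """Compute EVENODD parity columns for a (p-1) x p array of integer symbols.

    Returns (row_parity, diag_parity, s), where s is the adjuster symbol.
    p is expected to be prime; this sketch does not check primality.
    """
    rows = p - 1
    assert len(data) == rows and all(len(r) == p for r in data)

    def cell(i, j):
        # The construction uses an imaginary all-zero row with index p-1.
        return 0 if i == p - 1 else data[i][j]

    # Horizontal parity: XOR of each row.
    row_parity = []
    for i in range(rows):
        acc = 0
        for j in range(p):
            acc ^= data[i][j]
        row_parity.append(acc)

    # Adjuster s: XOR along the one diagonal that gets no parity symbol.
    s = 0
    for t in range(1, p):
        s ^= cell(p - 1 - t, t)

    # Diagonal parity: XOR along each wrapped diagonal, combined with s.
    diag_parity = []
    for i in range(rows):
        acc = s
        for t in range(p):
            acc ^= cell((i - t) % p, t)
        diag_parity.append(acc)

    return row_parity, diag_parity, s


def recover_column_from_rows(data, row_parity, lost, p):
    """Rebuild one lost data column from the row parities alone."""
    recovered = []
    for i in range(p - 1):
        acc = row_parity[i]
        for j in range(p):
            if j != lost:
                acc ^= data[i][j]
        recovered.append(acc)
    return recovered


if __name__ == "__main__":
    p = 5
    # Arbitrary example symbols, one byte each.
    data = [[(7 * i + 13 * j + 1) % 256 for j in range(p)] for i in range(p - 1)]
    row_parity, diag_parity, s = evenodd_encode(data, p)
    lost = 2
    original = [data[i][lost] for i in range(p - 1)]
    print(recover_column_from_rows(data, row_parity, lost, p) == original)
```

In a storage system each column would live on a different disk or node, so losing one column corresponds to losing one device. Tolerating two simultaneous losses relies on the diagonal parities as well and needs the full decoding procedure, which is not shown here.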