RAIN: 1. Water falling in drops condensed from vapor in the atmosphere; 2. Reliable Array of Independent Nodes
This project has been developed in the
EDEN laboratory, Paradise.
Computing and storage over distributed environments, such as clusters of
workstations and personal computers connected by local and wide area
networks, have very high potential, since they leverage existing hardware
and software and enable affordable parallel and distributed applications.
However, these computing, storage and communication environments are not
yet widely integrated to support distributed applications, mainly because
of the relatively low performance of inter-processor communication and
insufficient overall system reliability.
Reliability and communication are the focus areas of the RAIN project,
which aims to create reliable distributed environments based on commercial
computing, communication and storage technologies. This project is part of
a collaboration with JPL focused on the development of future distributed
computing systems for spaceborne missions.
Alongside the theoretical work on the project, we have implemented our
first demonstration.
It consists of ten Intel Pentium PC nodes, each equipped with two Myrinet
cards. The nodes are interconnected by four 8-way Myrinet switches.
Using this
system we have created our first proof-of-concept RAIN application, a
distributed video server that is capable of tolerating multiple node,
link and switch faults.
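
The text gives the component counts but not the wiring. As a rough illustration of why two cards per node help with switch faults, the sketch below assumes a purely hypothetical layout in which node i attaches its two cards to switches i mod 4 and (i + 1) mod 4. The names `hypothetical_wiring`, `ports_used` and `attached_nodes` are invented for this example. It only checks whether each node still reaches a working switch; it does not model the links between switches.

```python
NUM_NODES = 10        # from the text: ten PC nodes
NUM_SWITCHES = 4      # from the text: four Myrinet switches
PORTS_PER_SWITCH = 8  # from the text: 8-way switches


def hypothetical_wiring():
    """Map each node to the two switches its cards attach to (assumed layout)."""
    return {n: (n % NUM_SWITCHES, (n + 1) % NUM_SWITCHES) for n in range(NUM_NODES)}


def ports_used(wiring):
    """Count the node-facing ports occupied on each switch."""
    used = [0] * NUM_SWITCHES
    for switches in wiring.values():
        for s in switches:
            used[s] += 1
    return used


def attached_nodes(wiring, failed_switches):
    """Return the nodes that still reach at least one working switch."""
    return {
        n for n, switches in wiring.items()
        if any(s not in failed_switches for s in switches)
    }


wiring = hypothetical_wiring()
print("node-facing ports per switch:", ports_used(wiring))
for s in range(NUM_SWITCHES):
    survivors = attached_nodes(wiring, {s})
    print(f"switch {s} down: {len(survivors)} of {NUM_NODES} nodes still attached")
```

Under this assumed layout the 20 cards occupy 20 of the 32 switch ports, which leaves 12 ports for links between switches, and every node stays attached after any single switch failure because its two cards sit on different switches.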
The distributed video server uses a novel redundant storage scheme called
EVENODD.
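
The text names EVENODD without describing it. As a rough illustration, the sketch below follows the encoding rule as published for EVENODD: for a prime p, data is laid out in p - 1 rows and p columns, and two parity columns are appended (one row parity, one diagonal parity with an adjuster term S), so that any two lost columns can be rebuilt. The function name `evenodd_encode`, the use of Python integers as symbols, and the sample values are assumptions made for this example. This is not the video server's code, and decoding is omitted.

```python
from functools import reduce
from operator import xor


def evenodd_encode(data, p):
    """Append the two EVENODD parity columns to a (p-1) x p array of integers.

    p must be prime. The result has p + 2 columns: columns 0..p-1 hold the
    data, column p the row parity, column p+1 the diagonal parity.
    """
    rows = p - 1
    assert len(data) == rows and all(len(r) == p for r in data)

    def cell(i, j):
        # Row p-1 is an imaginary all-zero row that completes the diagonals.
        return 0 if i == rows else data[i][j]

    # Adjuster S: XOR along the diagonal that passes through the imaginary row.
    s = reduce(xor, (cell(p - 1 - j, j) for j in range(1, p)), 0)

    encoded = []
    for i in range(rows):
        row_parity = reduce(xor, data[i], 0)
        # Diagonal i collects cell ((i - j) mod p, j) from every data column j.
        diag_parity = reduce(xor, (cell((i - j) % p, j) for j in range(p)), s)
        encoded.append(list(data[i]) + [row_parity, diag_parity])
    return encoded


# Example with p = 5: four rows of five small integer symbols (arbitrary values).
stripe = [
    [1, 2, 3, 4, 5],
    [6, 7, 8, 9, 10],
    [11, 12, 13, 14, 15],
    [16, 17, 18, 19, 20],
]
for row in evenodd_encode(stripe, 5):
    print(row)
```

Each of the p + 2 columns would be stored on a different node or disk. Because the code uses only XOR, the same routine works on bits, bytes or larger words.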
We have also been working on other RAIN applications, including a
distributed checkpointing system and a distributed fault-tolerant web
server called SNOW.