StarT-Jr: A Parallel System from Commodity Technology
StarT-Jr is an experimental parallel system composed of a network of
personal computers (PCs). The system leverages the momentum of the
microprocessor and PC industries to achieve excellent single-node
performance at a low cost. For parallel processing, StarT-Jr uses the
Flexible User-level Network Interface (FUNi) to provide low-overhead,
user-level interprocessor communication over either IEEE 1394 High
Performance Serial Busses or MIT's Arctic
Switch Fabric. This efficient message-passing mechanism enables
StarT-Jr to exploit fine-grained parallelism for good parallel
performance.

FUNi is based on the Cyclone
embedded processing system on a PCI card. For message
passing, FUNi's embedded processor serves as a network coprocessor and
manages a user-accessible message-passing interface in host
memory. User-level applications manipulate this interface directly
in host memory using cached reads and writes, so costly
physical I/O accesses to device registers on the PCI bus are avoided.
Currently, FUNi can efficiently support both fine-grain message
passing and direct memory-to-memory transfers of large data blocks.
FUNi can also support globally coherent shared memory by capturing and
responding to memory accesses within a designated global address
range. FUNi maintains a globally coherent shared memory cache to
minimize global memory access latency. The necessary coherence
protocol processing and communication is performed by the FUNi
coprocessor.
We have demonstrated a two-node prototype of StarT-Jr and are awaiting
fabrication of additional interface cards in order to assemble an
eight-node system. StarT-Jr currently supports an active
message-based light-weight communication library for the C programming
language. Using the Arctic Network, preliminary measurements of the
communication library demonstrated overheads of 1 to 3 usec for sending
or receiving small (< 40 bytes) messages, and a user-to-user latency
of less than 30 usec. Direct memory-to-memory transfers can sustain
6.5 MByte/sec on an unloaded network. With regard to the shared memory
operation, a single-word read of a shared-memory location cached in
FUNi takes approximately 2 usec.
What else can StarT-Jr do besides "Scientific Apps"?
We demonstrated a two-node StarT-Jr with the 1394 Trade Association at
Fall Comdex '95. We put together an application that merged a live
video stream from a camera with a recorded video stream from the
hard disk. Live subjects were asked to stand in front of a blue
screen. StarT-Jr then filtered out the blue background and replaced
it with the recorded video to create the special effect.

StarT-Jr Poster at ASPLOS-VII (1996) NOW Workshop
StarT-Jr Related Publications:
Related Research in Parallel Architectures and Networks of Workstations
- Alewife, MIT
- Avalanche, University of Utah
- Fast Messages, UIUC
- FLASH, Stanford
- Fugu, MIT
- Myrinet, Myricom
- NOW, Berkeley
- S3.mp, Sun Microsystems
- SHRIMP, Princeton
- StarT-Voyager, MIT
- StarT-NG, MIT
- U-Net ATM cluster, Cornell
See the WWW Computer
Architecture Home Page and Supercomputing
and Parallel Computing Research Groups for more complete listings of
research in parallel computation.
James C. Hoe
(jhoe+www at ece_cmu_edu) and
Mike Ehrlich (mikee@abp.lcs.mit.edu)
Keywords: StarT-Jr, FUNi, Arctic, IEEE 1394 High Performance
Serial Bus, network of workstations, parallel processing, network
interface, user-level, interprocessor communication