StarT-Jr: A Parallel System from Commodity Technology

StarT-Jr is an experimental parallel system composed of a network of personal computers (PCs). The system leverages the momentum of the microprocessor and PC industries to achieve excellent single-node performance at low cost. For parallel processing, StarT-Jr uses the Flexible User-level Network Interface (FUNi) to provide low-overhead, user-level interprocessor communication over either IEEE 1394 High Performance Serial Buses or MIT's Arctic Switch Fabric. This efficient message-passing mechanism enables StarT-Jr to exploit fine-grained parallelism for good parallel performance.

FUNi is based on the Cyclone embedded processing system on a PCI card. In message passing, FUNi's embedded processor serves as a network coprocessor and manages a user-accessible message-passing interface in the host memory. User-level applications manipulate the interface directly in host memory using cached reads and writes, avoiding costly physical I/O accesses to device registers on the PCI bus. Currently, FUNi can efficiently support both fine-grain message passing and direct memory-to-memory transfers of large data blocks. FUNi can also support globally coherent shared memory by capturing and responding to memory accesses within a designated global address range. FUNi maintains a globally coherent shared memory cache to minimize global memory access latency. The necessary coherence protocol processing and communication are performed by the FUNi coprocessor.
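The idea of a message-passing interface that lives in host memory can be illustrated with a small ring buffer. The following is only a sketch under stated assumptions, not FUNi's actual layout: the names `funi_queue`, `funi_send`, and `funi_poll`, the queue depth, and the 40-byte slot size are all hypothetical, and C11 atomics stand in for whatever ordering guarantees the real hardware provides. User code plays the producer; the consumer side models what a network coprocessor might do when it drains the queue.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

#define FUNI_SLOTS     8u   /* hypothetical queue depth */
#define FUNI_MSG_BYTES 40u  /* hypothetical slot payload size */

typedef struct {
    uint8_t  payload[FUNI_MSG_BYTES];
    uint32_t length;
} funi_slot;

/* Hypothetical queue resident in ordinary (cacheable) host memory. */
typedef struct {
    funi_slot        slots[FUNI_SLOTS];
    _Atomic uint32_t head;  /* advanced only by the producer (user code) */
    _Atomic uint32_t tail;  /* advanced only by the consumer (coprocessor) */
} funi_queue;

static void funi_queue_init(funi_queue *q) {
    memset(q->slots, 0, sizeof q->slots);
    atomic_init(&q->head, 0);
    atomic_init(&q->tail, 0);
}

/* Enqueue one message with plain memory writes; no device register is
 * touched. Returns 1 on success, 0 if the queue is full or len is too big. */
static int funi_send(funi_queue *q, const void *data, uint32_t len) {
    assert(q != NULL && data != NULL);
    if (len > FUNI_MSG_BYTES) return 0;
    uint32_t head = atomic_load_explicit(&q->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&q->tail, memory_order_acquire);
    if (head - tail == FUNI_SLOTS) return 0;            /* queue full */
    funi_slot *s = &q->slots[head % FUNI_SLOTS];
    memcpy(s->payload, data, len);
    s->length = len;
    /* Publish the slot only after its contents are written. */
    atomic_store_explicit(&q->head, head + 1, memory_order_release);
    return 1;
}

/* Dequeue one message, as the consumer side would.
 * Returns 1 and fills out/len on success, 0 if the queue is empty. */
static int funi_poll(funi_queue *q, void *out, uint32_t *len) {
    assert(q != NULL && out != NULL && len != NULL);
    uint32_t tail = atomic_load_explicit(&q->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&q->head, memory_order_acquire);
    if (head == tail) return 0;                         /* queue empty */
    const funi_slot *s = &q->slots[tail % FUNI_SLOTS];
    memcpy(out, s->payload, s->length);
    *len = s->length;
    /* Release the slot back to the producer. */
    atomic_store_explicit(&q->tail, tail + 1, memory_order_release);
    return 1;
}
```

Because each index has exactly one writer, neither side needs a lock; the sender's cost is a handful of cached loads and stores, which is the property the paragraph above attributes to a memory-resident interface.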

We have demonstrated a two-node prototype of StarT-Jr and are awaiting fabrication of additional interface cards in order to assemble an eight-node system. StarT-Jr currently supports a lightweight, active-message-based communication library for the C programming language. Using the Arctic Network, preliminary measurements of the communication library showed overheads of 1 to 3 usec for sending or receiving small (< 40 bytes) messages, and a user-to-user latency of less than 30 usec. Direct memory-to-memory transfers can sustain 6.5 MByte/sec on an unloaded network. For shared memory operation, a single-word read of a shared-memory location cached in FUNi takes approximately 2 usec.
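For readers unfamiliar with active messages, the general technique is that each message names a handler to run on arrival, so the receiver dispatches straight into user code instead of buffering and matching. The sketch below shows that dispatch step in C. It is not the StarT-Jr library's API: `am_message`, `am_register`, `am_dispatch`, and the table and argument limits are hypothetical names and values chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define AM_MAX_HANDLERS 16  /* hypothetical handler table size */
#define AM_MAX_ARGS     4   /* hypothetical argument limit */

/* A handler runs on the receiving node when its message arrives. */
typedef void (*am_handler)(int src_node, const uint32_t *args, size_t nargs);

/* Hypothetical wire format: 24 bytes, within the "small message" range. */
typedef struct {
    uint32_t handler_id;
    uint32_t nargs;
    uint32_t args[AM_MAX_ARGS];
} am_message;

static am_handler am_table[AM_MAX_HANDLERS];

/* Bind a handler to an index that senders will place in handler_id. */
static int am_register(uint32_t id, am_handler fn) {
    if (id >= AM_MAX_HANDLERS || fn == NULL) return 0;
    am_table[id] = fn;
    return 1;
}

/* Receiver side: validate the message, then call the named handler.
 * Returns 1 if a handler ran, 0 if the message was rejected. */
static int am_dispatch(int src_node, const am_message *msg) {
    assert(msg != NULL);
    if (msg->handler_id >= AM_MAX_HANDLERS) return 0;
    if (msg->nargs > AM_MAX_ARGS) return 0;
    am_handler fn = am_table[msg->handler_id];
    if (fn == NULL) return 0;
    fn(src_node, msg->args, msg->nargs);
    return 1;
}

/* Example handler: accumulate the arguments into a running sum. */
static uint32_t am_sum;

static void am_add_handler(int src_node, const uint32_t *args, size_t nargs) {
    (void)src_node;  /* unused in this example */
    for (size_t i = 0; i < nargs; i++) am_sum += args[i];
}
```

A receiver loop would call `am_dispatch` once per arriving message. Keeping handlers short and non-blocking is what lets per-message overhead stay in the microsecond range on systems of this kind.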

What else can StarT-Jr do besides "Scientific Apps"?

We demonstrated a two-node StarT-Jr with the 1394 Trade Association at Fall Comdex '95. We put together an application that merged a live video stream from a camera with a recorded video stream from the hard disk. Live subjects were asked to stand in front of a blue screen. StarT-Jr then filtered out the blue background and replaced it with the recorded video to create the special effect.
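The blue-screen effect described above is a form of chroma keying. As a rough illustration only, and not the code used in the demonstration, the per-pixel step might look like the following C sketch. The pixel type, the function names, and the "blue exceeds red and green by a margin" test are all assumptions; a production filter would typically work in a different color space and soften the edges.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint8_t r, g, b; } rgb_pixel;

/* Hypothetical background test: blue dominates both other channels
 * by more than `margin`. */
static int is_blue_background(rgb_pixel p, int margin) {
    return (int)p.b - (int)p.r > margin && (int)p.b - (int)p.g > margin;
}

/* Compose one frame: wherever the live pixel looks like blue screen,
 * take the recorded pixel instead. Returns the number of replaced pixels. */
static size_t chroma_key(const rgb_pixel *live, const rgb_pixel *recorded,
                         rgb_pixel *out, size_t n, int margin) {
    assert(live != NULL && recorded != NULL && out != NULL);
    size_t replaced = 0;
    for (size_t i = 0; i < n; i++) {
        if (is_blue_background(live[i], margin)) {
            out[i] = recorded[i];
            replaced++;
        } else {
            out[i] = live[i];
        }
    }
    return replaced;
}
```

Each output pixel depends only on the two input pixels at the same position, so a frame can be split into independent strips, one per node, which is what makes the effect a natural fit for a small parallel system.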

StarT-Jr Poster at ASPLOS-VII (1996) NOW Workshop

Related Research in Parallel Architectures and Network of Workstations

See the WWW Computer Architecture Home Page and Supercomputing and Parallel Computing Research Groups for more complete listings on research in parallel computation.

James C. Hoe (jhoe+www at ece_cmu_edu) and Mike Ehrlich, (

Keywords: StarT-Jr, FUNi, Arctic, IEEE 1394 High Performance Serial Bus, network of workstations, parallel processing, network interface, user-level, interprocessor communication