FUNi is based on the Cyclone embedded processing system on a PCI card. For message passing, FUNi's embedded processor serves as a network coprocessor and manages a user-accessible message-passing interface in host memory. User-level applications manipulate this interface directly in host memory using cached reads and writes, which avoids costly physical I/O accesses to device registers on the PCI bus. Currently, FUNi efficiently supports both fine-grain message passing and direct memory-to-memory transfers of large data blocks. FUNi can also support globally coherent shared memory by capturing and responding to memory accesses within a designated global address range. To minimize global memory access latency, FUNi maintains a globally coherent shared-memory cache. The FUNi coprocessor performs the necessary coherence protocol processing and communication.
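To make the idea of a message-passing interface in host memory concrete, the following is a minimal sketch in C of a single-producer, single-consumer transmit queue that a user program fills with ordinary memory writes and a coprocessor drains by polling. It is not FUNi's actual interface: the names tx_queue_t, txq_init, txq_send and txq_poll, the slot size, and the queue depth are all assumptions chosen for illustration, and the coprocessor side is simulated by a plain function call.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

/* Illustrative sizes only; not taken from the FUNi design. */
#define SLOT_BYTES 40u  /* payload bytes per slot */
#define NUM_SLOTS  8u   /* queue depth, must be a power of two */

/* Hypothetical transmit queue that lives in ordinary (cacheable) host memory. */
typedef struct {
    _Atomic uint32_t head;               /* next slot to fill; written only by the user */
    _Atomic uint32_t tail;               /* next slot to drain; written only by the coprocessor */
    uint8_t slots[NUM_SLOTS][SLOT_BYTES];
} tx_queue_t;

static void txq_init(tx_queue_t *q) {
    atomic_init(&q->head, 0);
    atomic_init(&q->tail, 0);
    memset(q->slots, 0, sizeof q->slots);
}

/* User side: enqueue one small message with plain memory writes.
 * Returns 1 on success, 0 if the queue is full. No device register is touched. */
static int txq_send(tx_queue_t *q, const void *msg, size_t len) {
    assert(len <= SLOT_BYTES);
    uint32_t head = atomic_load_explicit(&q->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&q->tail, memory_order_acquire);
    if (head - tail == NUM_SLOTS)
        return 0;                        /* full: the consumer has not caught up */
    memcpy(q->slots[head % NUM_SLOTS], msg, len);
    /* Publish the slot only after its payload has been written. */
    atomic_store_explicit(&q->head, head + 1, memory_order_release);
    return 1;
}

/* Coprocessor side (simulated): poll for one message and copy out its slot.
 * Returns 1 if a message was consumed, 0 if the queue is empty. */
static int txq_poll(tx_queue_t *q, void *out) {
    uint32_t tail = atomic_load_explicit(&q->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&q->head, memory_order_acquire);
    if (head == tail)
        return 0;                        /* empty */
    memcpy(out, q->slots[tail % NUM_SLOTS], SLOT_BYTES);
    /* Release the slot back to the producer. */
    atomic_store_explicit(&q->tail, tail + 1, memory_order_release);
    return 1;
}
```

In this sketch each index has exactly one writer, so neither side needs a lock. A real interface would also have to deal with cache coherence between the host processor and the PCI device, which the sketch does not model.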
We have demonstrated a two-node prototype of StarT-Jr and are awaiting fabrication of additional interface cards in order to assemble an eight-node system. StarT-Jr currently supports a lightweight communication library, based on active messages, for the C programming language. Preliminary measurements of the communication library over the Arctic Network show overheads of 1 to 3 usec for sending or receiving small (< 40 bytes) messages, and a user-to-user latency of less than 30 usec. Direct memory-to-memory transfers can sustain 6.5 MByte/sec on an unloaded network. For shared-memory operation, a single-word read of a shared-memory location cached in FUNi takes approximately 2 usec.
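For readers unfamiliar with active messages, the sketch below shows the general technique in C: each message names a handler, and the receiver invokes that handler on the message's arguments as soon as the message arrives. This is a generic illustration, not the StarT-Jr library's API; am_msg_t, am_register, am_dispatch, the table size, and the argument count are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define AM_MAX_HANDLERS 16  /* illustrative handler table size */
#define AM_MAX_ARGS     4   /* illustrative number of word arguments */

/* A handler runs on the receiving node with the message's arguments. */
typedef void (*am_handler_t)(const uint32_t *args, size_t nargs);

/* Hypothetical wire format: 20 bytes, within the small-message range. */
typedef struct {
    uint16_t handler_id;          /* index into the receiver's handler table */
    uint16_t nargs;               /* number of valid entries in args */
    uint32_t args[AM_MAX_ARGS];
} am_msg_t;

static am_handler_t am_table[AM_MAX_HANDLERS];

/* Register a handler under an id; returns 0 on success, -1 on a bad id. */
static int am_register(uint16_t id, am_handler_t h) {
    if (id >= AM_MAX_HANDLERS)
        return -1;
    am_table[id] = h;
    return 0;
}

/* Receiver side: look up the handler named in the message and run it.
 * Returns 0 on success, -1 if the message is malformed or unhandled. */
static int am_dispatch(const am_msg_t *m) {
    if (m->handler_id >= AM_MAX_HANDLERS || m->nargs > AM_MAX_ARGS)
        return -1;
    am_handler_t h = am_table[m->handler_id];
    if (h == NULL)
        return -1;
    h(m->args, m->nargs);
    return 0;
}

/* Example handler: accumulate the arguments into a running sum. */
static uint32_t am_sum;
static void am_add_handler(const uint32_t *args, size_t nargs) {
    for (size_t i = 0; i < nargs; i++)
        am_sum += args[i];
}
```

A sender would fill in an am_msg_t with the id under which am_add_handler was registered and hand it to the transport. On arrival, am_dispatch runs the handler directly, which is what keeps per-message overhead low for small messages.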
StarT-Jr Poster at ASPLOS-VII (1996) NOW Workshop
See the StarT-Voyager page for other StarT-related papers.
Keywords: StarT-Jr, FUNi, Arctic, IEEE 1394 High Performance Serial Bus, network of workstations, parallel processing, network interface, user-level, interprocessor communication