Next: GF11 Up: Scheduled Routing Architectures Previous: Scheduled Routing Architectures

iWarp

The iWarp architecture [54,10,11] is CMU and Intel's follow-on to the Warp architecture. iWarp integrates the processing and routing units on a single chip and targets DSP, scientific, and image-processing applications. The processor and router communicate through an interface register file used for systolic communication, similar to the processor interface described in this thesis for scheduled routing. Memory-to-memory communication is also supported.

iWarp supports both traditional dynamic routing and a higher-speed virtual channel technique, in which channels are created for long periods of time and connect two nodes that need not be adjacent in the mesh. Each iWarp node has a $20\times 20$ communications crossbar used to route the channels in the mesh. Virtual channels are pre-connected to an output channel, whereas dynamic messages use the usual contention mechanisms to bind to an output channel in the direction their header indicates.
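The difference between pre-connected and contended output binding can be sketched in a few lines of Python. This is only an illustrative model, not iWarp's actual crossbar logic: the `Crossbar` class, the output names, the idea of several outputs per direction, and the first-come binding policy are all assumptions made for the example.

```python
class Crossbar:
    """Toy model of output binding; all names here are hypothetical."""

    def __init__(self, outputs):
        # Maps each output channel to its current owner, or None if free.
        self.owner = {out: None for out in outputs}

    def preconnect(self, channel_id, out):
        """Reserve an output for a long-lived virtual channel."""
        if self.owner[out] is not None:
            raise ValueError(f"output {out} is already bound")
        self.owner[out] = ("virtual", channel_id)

    def request(self, msg_id, candidates):
        """Bind a dynamic message to the first free candidate output.

        Returns the bound output, or None if the message must wait.
        """
        for out in candidates:
            if self.owner[out] is None:
                self.owner[out] = ("dynamic", msg_id)
                return out
        return None

    def release(self, out):
        """Free an output held by a dynamic message; virtual channels stay."""
        owner = self.owner[out]
        if owner is not None and owner[0] == "dynamic":
            self.owner[out] = None


xbar = Crossbar(["E0", "E1", "N0"])
xbar.preconnect("vc1", "E0")                 # virtual channel holds E0
first = xbar.request("m1", ["E0", "E1"])     # dynamic message gets E1
second = xbar.request("m2", ["E0", "E1"])    # no free eastbound output
```

In this model a virtual channel never competes for its output after setup, while a dynamic message may find every output in its direction taken and have to retry.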

A virtual channel is created at runtime by sending a marker message that tells each router whether the channel should turn in a given direction or continue straight through (the default). Setting up the virtual channel is relatively slow: each node on the path takes four or five clock cycles to forward the setup token to the next node. Once the channel is in place, however, messages can be transmitted over it at the full iWarp interconnect speed. Dynamic messages are treated similarly, but essentially tear down their channel behind them as they go.
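As a rough illustration of the setup process, the following Python sketch walks a marker through a 2D mesh, records a turn-or-straight setting at each router, and estimates the setup cost from the per-node forwarding delay quoted above. The function and field names, the coordinate scheme, and the direction labels are assumptions for the example; only the "straight by default" rule and the four-or-five-cycle figure come from the text.

```python
from dataclasses import dataclass, field

# Hypothetical direction names for a 2D mesh.
DELTAS = {"N": (0, 1), "E": (1, 0), "S": (0, -1), "W": (-1, 0)}


@dataclass
class Router:
    # Maps the incoming travel direction to the outgoing direction.
    turns: dict = field(default_factory=dict)


def setup_channel(routers, start, heading, hops, turn_at, cycles_per_hop=5):
    """Walk a setup marker through the mesh and configure each router.

    turn_at maps a node coordinate to a new heading; nodes not listed
    continue straight through. Returns (path, estimated setup cycles).
    """
    x, y = start
    path = [start]
    for _ in range(hops):
        heading_out = turn_at.get((x, y), heading)
        routers.setdefault((x, y), Router()).turns[heading] = heading_out
        heading = heading_out
        dx, dy = DELTAS[heading]
        x, y = x + dx, y + dy
        path.append((x, y))
    return path, hops * cycles_per_hop


routers = {}
path, cycles = setup_channel(
    routers, start=(0, 0), heading="E", hops=5, turn_at={(3, 0): "N"}
)
```

Under these assumptions a five-hop channel costs 20 to 25 cycles to establish, which is why the technique pays off only when the channel stays in place for a long time.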

iWarp channels include a limited `reprogrammable' aspect. A node can read in the first part of a message, then modify the router to forward the remainder of the message to subsequent nodes. Similarly, a node can write out a message and then strip the header from an incoming message, so that the two messages are merged into one.
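The two operations can be pictured with messages modeled as Python lists whose first element is the header. This is a simplified sketch with hypothetical helper names; it shows only the data movement, not how an iWarp router is actually reprogrammed.

```python
def split_at_node(message, local_count):
    """Consume the header and first local_count words; forward the rest."""
    header, *payload = message
    consumed = payload[:local_count]
    forwarded = payload[local_count:]
    return header, consumed, forwarded


def merge_streams(header, local_payload, incoming):
    """Emit the local message, then append the incoming payload without
    its header, so that downstream nodes see a single message."""
    _incoming_header, *incoming_payload = incoming
    return [header, *local_payload, *incoming_payload]


hdr, kept, passed_on = split_at_node(["H1", "a", "b", "c", "d"], 2)
merged = merge_streams("H2", ["x", "y"], ["H3", "p", "q"])
```

Here `kept` holds the words the node read for itself, `passed_on` holds the remainder that continues down the channel, and `merged` is the single combined message produced by the second operation.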

Systolic communication was found to perform very well on applications for which it could be used [25]. In general, applications performed better when more systolic paths were established, since communication latencies were lower. iWarp's system support includes a number of compilers for C and FORTRAN, as well as compilers for image processing and the like.

Overall, iWarp is the existing system closest to the reprogrammable scheduled routing system targeted by COP in this thesis, and its results were very promising. NuMesh extends this work in several ways, discussed more extensively in [56]; in particular, NuMesh allows faster communication, more precise control over stream bandwidths, and faster phase changes and other router modifications.


Back to Chris Metcalf's home page