Next: Managing Multiple Phases Up: Online Scheduled Routing Previous: Resource Allocation

Future Extensions

A variety of extensions could be used to improve the performance of online routing over scheduled routing.

One extension not currently implemented in the COP compiler is to allow non-nearest-neighbor streams. One simple version of this would be express channels [19]. Some nodes would have an additional set of streams available to them beyond the basic nearest-neighbor sets; these streams would connect to other nodes distant in the mesh. This would decrease the latency experienced by non-local messages, though it would also decrease the bandwidth available on the other network streams, as well as increase the interface address pressure. The address pressure in this model can be decreased by skewing the set of nodes responsible for managing express channels in each direction, such that each node only has one in-express and one out-express (at most) in addition to its nearest-neighbor streams. Assuming that express channels are $k$ links in length, using them would reduce the number of routing steps for a path of distance $l$ from $l$ to, on average, $k/2 + l/k$; however, the cost would be a bandwidth factor of two for co-scheduling the express links.
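The step-count estimate above can be checked against a small model. The sketch below is illustrative only and is not part of the COP compiler: it assumes a one-dimensional path on which a message takes as many $k$-link express hops as fit and then finishes with nearest-neighbor hops, and the function names are hypothetical.

```python
def express_steps(l: int, k: int) -> int:
    """Routing steps for distance l with express channels of length k.

    Assumed model: take as many k-link express hops as fit, then
    finish the remainder with nearest-neighbor hops.
    """
    express_hops, local_hops = divmod(l, k)
    return express_hops + local_hops


def estimated_steps(l: int, k: int) -> float:
    """The average-case estimate quoted in the text: k/2 + l/k."""
    return k / 2 + l / k


if __name__ == "__main__":
    k = 4
    for l in (4, 10, 16, 30):
        print(l, express_steps(l, k), estimated_steps(l, k))
```

Under this model the nearest-neighbor remainder averages $(k-1)/2$ hops over uniformly distributed distances, which is where the roughly $k/2$ term comes from.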

A similar approach might be to abandon the mesh network for online routing and overlay a virtual hypercube-style network on the mesh, again trading off some bandwidth for latency. Here, however, the number of streams rapidly exceeds the capacity of the schedule RAM. For a 3D mesh of $N$ nodes there are $N$ virtual hypercube streams crossing the bisection, but only $N^{\frac{2}{3}}$ wires. This gives an $N^{\frac{1}{3}}$ slowdown and, more importantly, bounds the size of mesh that can fit in a given schedule size. For example, with a maximum schedule length of 64, a full hypercube can only be done for meshes of about $4 \times 4 \times 4$. However, a coarser-granularity hypercube with a nearest-neighbor step for final routing might still be of benefit.
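The bisection argument can be made concrete with a few lines of arithmetic. This is a rough sketch that assumes a cubic mesh with `side` nodes along each edge and takes the stream and wire counts directly from the text; `bisection_load` is a hypothetical helper name, and the sketch does not model the schedule RAM itself.

```python
def bisection_load(side: int) -> dict:
    """Bisection figures for a side x side x side mesh, per the text."""
    nodes = side ** 3      # N
    streams = nodes        # N virtual hypercube streams cross the bisection
    wires = side ** 2      # N^(2/3) physical wires cross the bisection
    return {
        "nodes": nodes,
        "streams": streams,
        "wires": wires,
        "slowdown": streams // wires,  # N^(1/3) streams share each wire
    }


if __name__ == "__main__":
    for side in (2, 4, 8):
        print(bisection_load(side))
```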

Using fixed interface addresses for reading and writing maximizes bandwidth and minimizes latency for the online streams. However, particularly if using something like express channels, interface address pressure can be significant. A single address could be used to read all online messages, but as the messages could then be interleaved, every word would need to be tagged to dispatch it to the correct message, at some cost in network bandwidth as well as processing overhead. Similarly, a single address could be used to write all messages, reprogramming the router to connect the address to a different VFSM each time the interrupt handler switched to writing a new stream. Both solutions have a place when online routing is infrequently used and the interface addresses are a scarce resource for other co-existent operators; however, implementing and assessing this is left for future work.
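As an illustration of the tagged-word idea, the following sketch shows how a handler might sort interleaved words arriving on a single read address back into their messages. The `(tag, payload)` word layout and the `demultiplex` name are assumptions made for this example; the source does not specify a tag format.

```python
from collections import defaultdict


def demultiplex(words):
    """Group (tag, payload) words from one shared address by message tag.

    Word order within each message is preserved.
    """
    messages = defaultdict(list)
    for tag, payload in words:
        messages[tag].append(payload)
    return dict(messages)


if __name__ == "__main__":
    # Two messages whose words arrive interleaved on the same address.
    stream = [(1, "a"), (2, "x"), (1, "b"), (2, "y")]
    print(demultiplex(stream))
```

The tag carried with every word is the bandwidth cost mentioned above, and the per-word dictionary lookup stands in for the processing overhead.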

A shared-memory model would be easy to add to processor-based routing. Messages destined to a particular virtual interface address could be interpreted as memory-read requests. For such messages, the interrupt handler would not queue the message for later reading, but instead read the specified memory location(s) and return the results to the given processor with a return message. More sophisticated memory semantics (such as Fetch-and-Op) could also be included at relatively low incremental cost in the handler. This implementation of shared memory could easily be used under a software shared memory implementation such as CRL [31].
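A minimal sketch of such a handler follows, under several assumptions that are not in the source: messages are modeled as dictionaries, `SHARED_MEMORY_ADDRESS` is a hypothetical virtual interface address reserved for memory requests, local memory is a Python list, and fetch-and-add stands in for the Fetch-and-Op family.

```python
SHARED_MEMORY_ADDRESS = 7  # hypothetical reserved virtual interface address


def handle_message(msg, memory, inbox):
    """Return a reply for shared-memory requests; queue everything else."""
    if msg["address"] != SHARED_MEMORY_ADDRESS:
        inbox.append(msg)  # ordinary message: queue it for later reading
        return None
    loc = msg["location"]
    if msg["op"] == "read":
        data = memory[loc:loc + msg.get("count", 1)]
    elif msg["op"] == "fetch_add":
        data = [memory[loc]]            # return the old value...
        memory[loc] += msg["operand"]   # ...then apply the operation
    else:
        raise ValueError("unknown operation: " + str(msg["op"]))
    # The reply is addressed back to the processor that sent the request.
    return {"to": msg["sender"], "data": data}


if __name__ == "__main__":
    memory, inbox = [10, 20, 30], []
    request = {"address": 7, "sender": 3, "op": "read", "location": 1, "count": 2}
    print(handle_message(request, memory, inbox))
```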

Online-routing hardware could also be included directly on the router. Online-routing timeslots would be scheduled just like any other data transfer. If a message were trying to go in the +x direction, the routing hardware would simply wait until the VFSM responsible for moving data from the online-routing hardware to +x was scheduled, and then provide the appropriate word to the neighbor. Similarly, when the message had reached its final destination, once a VFSM to move data from the online-routing hardware to the processor interface was scheduled, the data would be written to the interface, with the online-routing engine providing an interface address (perhaps a cycle early) as well as the data word. Internal queues would be necessary to allow messages to bypass one another when accessing the interface addresses. Additionally, if desired, a few wires could be added between nodes to hold a virtual lane indicator for routing between nodes.
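The waiting behavior described above can be sketched in software. This is only a model of the idea, not a description of any actual router hardware: it assumes the schedule is a cyclic list with one entry per timeslot, and the slot labels such as `"online->+x"` are hypothetical names for the VFSM scheduled in that slot.

```python
def cycles_until_slot(schedule, now, route):
    """Cycles to wait from cycle `now` until `route` is next scheduled.

    `schedule` is treated as cyclic: slot i is active on every cycle c
    with c % len(schedule) == i.
    """
    length = len(schedule)
    for wait in range(length):
        if schedule[(now + wait) % length] == route:
            return wait
    raise ValueError("route is never scheduled: " + route)


if __name__ == "__main__":
    schedule = ["other", "online->+x", "other", "online->proc"]
    # A +x-bound word that becomes ready at cycle 2 waits for the next +x slot.
    print(cycles_until_slot(schedule, 2, "online->+x"))
```

In this model the worst-case wait is one full schedule period minus one cycle, which suggests why internal queues would be needed to let other messages make progress in the meantime.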


Back to Chris Metcalf's home page