The current implementation of COP relies on an interrupt-driven software system that buffers messages in processor RAM. A future implementation of NuMesh (or of a similar system, such as RAW [3,67]) could provide traditional online-routing functionality in hardware.
Online routing is handled by providing one-hop streams between each pair of adjacent nodes involved. For example, one VFSM reads from an interface address dedicated to sending data to +x and routes the data out the +x port. The neighboring processor similarly schedules a VFSM to read data from its -x port and deliver it to an interface address dedicated to reading online messages from its -x neighbor.
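A minimal sketch of this one-hop forwarding path, assuming a simplified model in which interface addresses and ports are plain FIFOs. The names `Node`, `send+x`, `recv-x`, `wire`, and the two VFSM step functions are hypothetical and are not NuMesh's actual interface; each step function stands for one scheduled VFSM transfer.

```python
from collections import deque

class Node:
    """Hypothetical node model: interface addresses and ports as simple FIFOs."""
    def __init__(self, name):
        self.name = name
        # Interface addresses visible to the processor (names are illustrative).
        self.iface = {"send+x": deque(), "recv-x": deque()}
        self.port_out = {"+x": deque()}   # words leaving on the +x link
        self.port_in = {"-x": deque()}    # words arriving on the -x link

def vfsm_send_plus_x(node):
    """One VFSM transfer: move a word from the send+x address out the +x port."""
    if node.iface["send+x"]:
        node.port_out["+x"].append(node.iface["send+x"].popleft())

def wire(a, b):
    """The physical link: words leaving a's +x port arrive on b's -x port."""
    while a.port_out["+x"]:
        b.port_in["-x"].append(a.port_out["+x"].popleft())

def vfsm_recv_minus_x(node):
    """Neighbor's VFSM transfer: deliver a -x port word to the recv-x address."""
    if node.port_in["-x"]:
        node.iface["recv-x"].append(node.port_in["-x"].popleft())

# One scheduled cycle per word moves data one hop from node a to node b.
a, b = Node("a"), Node("b")
a.iface["send+x"].extend([10, 20])
for _ in range(2):
    vfsm_send_plus_x(a)
    wire(a, b)
    vfsm_recv_minus_x(b)
```

Because each stream spans only one hop, a message travelling farther has to be picked up by software on every intermediate node and re-queued for the next hop.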
The online-routing code manages a number of queues on each node. Interrupts are initially enabled for data arriving on each active input stream. As data arrives, the code parses a header word holding a node number, a "destination queue" (discussed below), and a length. It then allocates a message buffer, queues it for the appropriate output interface address, and requests interrupt notification for empty status on that address. The input handler fills the message buffer while the output handler concurrently empties it. Messages originating at the node are created by placing the passed data in a new message buffer and putting it directly on the appropriate output queue. As each output queue empties, the code disables interrupts for that output. Messages terminating at the node are placed in the specified destination queues; to read them, the application code calls a routine that waits until the message buffer has been completely filled in and then returns the data.
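The buffer management described above can be sketched as a small simulation. This is a hedged illustration, not COP's code: the 32-bit header layout (10-bit node, 6-bit queue, 16-bit length), the 1-D `route` choice, and all class and method names are assumptions. Interrupt handlers are modeled as ordinary method calls, and a reader that would block simply gets `None`.

```python
from collections import deque

# Hypothetical 32-bit header: bits 31..22 node, 21..16 destination queue,
# 15..0 length in data words (the field widths are not given in the text).
NODE_BITS, QUEUE_BITS, LEN_BITS = 10, 6, 16

def pack_header(node, queue, length):
    return (node << (QUEUE_BITS + LEN_BITS)) | (queue << LEN_BITS) | length

def unpack_header(word):
    length = word & ((1 << LEN_BITS) - 1)
    queue = (word >> LEN_BITS) & ((1 << QUEUE_BITS) - 1)
    return word >> (QUEUE_BITS + LEN_BITS), queue, length

class MessageBuffer:
    """A message in processor RAM; it can be filled and drained concurrently."""
    def __init__(self, header):
        self.header = header
        self.node, self.queue, self.length = unpack_header(header)
        self.words = []   # data words received so far
        self.sent = 0     # words (header + data) already emitted

    def filled(self):
        return len(self.words) == self.length

class OnlineRouter:
    def __init__(self, me):
        self.me = me
        self.out_queues = {"send+x": deque(), "send-x": deque()}
        self.out_irq = {addr: False for addr in self.out_queues}
        self.dest_queues = {}   # destination queue number -> deque of buffers
        self.partial = {}       # input stream -> buffer still being filled

    def route(self, node):
        # Hypothetical 1-D choice of output interface address.
        return "send+x" if node > self.me else "send-x"

    def enqueue(self, buf):
        if buf.node == self.me:   # terminates here: use its destination queue
            self.dest_queues.setdefault(buf.queue, deque()).append(buf)
        else:
            addr = self.route(buf.node)
            self.out_queues[addr].append(buf)
            self.out_irq[addr] = True   # request empty-status interrupts

    def on_input(self, stream, word):
        """Input interrupt: a word arrived on an active input stream."""
        buf = self.partial.get(stream)
        if buf is None:               # first word of a message is its header
            buf = MessageBuffer(word)
            self.enqueue(buf)
            if buf.length > 0:
                self.partial[stream] = buf
        else:
            buf.words.append(word)
            if buf.filled():
                del self.partial[stream]

    def send(self, node, queue, data):
        """Originate a message: a complete buffer goes straight to a queue."""
        buf = MessageBuffer(pack_header(node, queue, len(data)))
        buf.words = list(data)
        self.enqueue(buf)

    def on_output_empty(self, addr):
        """Output interrupt: addr can take a word; returns it, or None."""
        q = self.out_queues[addr]
        if not q:
            self.out_irq[addr] = False   # queue drained: disable interrupts
            return None
        buf = q[0]
        if buf.sent == 0:
            word = buf.header
        elif buf.sent - 1 < len(buf.words):
            word = buf.words[buf.sent - 1]
        else:
            return None                  # input side has not filled this yet
        buf.sent += 1
        if buf.sent == buf.length + 1:   # header plus all data words sent
            q.popleft()
            if not q:
                self.out_irq[addr] = False
        return word

    def read(self, queue):
        """Return a local message's data once filled (a real reader waits)."""
        q = self.dest_queues.get(queue)
        if not q or not q[0].filled():
            return None
        return q.popleft().words
```

Keeping one buffer per message lets the output handler start forwarding the header and early data words before the tail of the message has arrived, which mirrors the concurrent fill and drain described in the text.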
Each stream of an online operator is assigned a destination queue to which its messages are delivered on the destination node. If memory and header-word bits are no object, each stream can simply be given a distinct number, so that each queue is used by only a single operator. An operator can then issue a read and be guaranteed that the queue it reads holds only messages intended for it.
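Under the "no object" assumption the bookkeeping is trivial, as this small sketch suggests. Streams are modeled as hypothetical (operator, stream name) pairs, and `assign_distinct`, `header_bits`, and `deliver` are illustrative names rather than COP functions. Because every stream has its own number, each queue only ever holds one operator's messages, at the cost of a header field wide enough to name every stream in the program.

```python
import math
from collections import defaultdict, deque

def assign_distinct(streams):
    """Give every (operator, stream_name) pair its own queue number 0..n-1."""
    return {s: i for i, s in enumerate(streams)}

def header_bits(n):
    """Header bits needed to encode one of n queue numbers."""
    return max(1, math.ceil(math.log2(n)))

def deliver(queues, assignment, stream, payload):
    """Place an arriving message on the queue its stream was assigned."""
    queues[assignment[stream]].append((stream[0], payload))

streams = [("sort", "in"), ("sort", "ack"), ("fft", "in"), ("scan", "in")]
assignment = assign_distinct(streams)
queues = defaultdict(deque)
deliver(queues, assignment, ("fft", "in"), "a")
deliver(queues, assignment, ("sort", "in"), "b")
deliver(queues, assignment, ("fft", "in"), "c")
# "fft" reads queue assignment[("fft", "in")] and sees only its own messages.
```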
Given, however, that memory and header bits are both resources that should be managed efficiently, the techniques of Section 7.2.2 apply here as well: destination queue numbers can be assigned in the same manner as dynamic-router destination interface addresses. This assignment still guarantees that an application reading from a given local queue receives only the messages for the given operator. It also makes the scheduled-routing online code asymptotically faster than a scheme in which a read must search past other operators' messages, since reading requires only constant time once the data has arrived, regardless of the number of messages already queued for other operators.
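One simple way to economize is to make queue numbers unique only among the operators that receive messages at the same node, so that numbers are reused across nodes. This is an assumption made for illustration and not necessarily the Section 7.2.2 technique; the function names are hypothetical. The two read functions contrast a constant-time read from a dedicated queue with a read from a hypothetical queue shared by all operators, which must scan past messages belonging to others.

```python
import math
from collections import defaultdict, deque

def assign_per_node(streams):
    """streams: (operator, dest_node) pairs; numbers are unique per node only."""
    next_free = defaultdict(int)
    table = {}
    for op, node in streams:
        if (op, node) not in table:
            table[(op, node)] = next_free[node]
            next_free[node] += 1
    return table

def bits_needed(table):
    """Header bits needed for the largest queue number in use."""
    return max(1, math.ceil(math.log2(max(table.values()) + 1)))

def read_shared(queue, op):
    """Shared deque of (owner, payload): scan past others' messages, O(n)."""
    for i, (owner, payload) in enumerate(queue):
        if owner == op:
            del queue[i]   # safe: we return before iterating again
            return payload
    return None

def read_dedicated(queues, qnum):
    """Dedicated queue (queues maps number -> deque): head is ours, O(1)."""
    q = queues[qnum]
    return q.popleft() if q else None

streams = [("sort", 0), ("fft", 0), ("sort", 1), ("scan", 1), ("fft", 2)]
table = assign_per_node(streams)
# Per-node numbering uses only 0 and 1 here, so 1 header bit suffices;
# giving all five (operator, node) pairs distinct numbers would need 3 bits.
```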