The fundamental unit of scheduled routing is the virtual finite state machine, or VFSM. A VFSM handles moving a particular stream of data through a particular node. It includes some buffering for that particular stream of data as well as annotations (called the VFSM's source and destination) that indicate where its data come from and go to. Figure 1.5 shows a simple VFSM for a stream of data moving from node one through node two's router to node two's processor. Linking together all the individual VFSM sources and destinations on each node yields an overall VFSM chain that moves the data from a stream's original source to its final destination; in the given example, the stream's source might be node zero, and its destination is node two.
If it was only necessary to move one stream of data through each node, a single VFSM could be used as a router. However, it is much more common to need to move several unrelated streams of data through each processor. This can be accomplished by scheduling the VFSMs in some predefined sequence on a node, essentially time-slicing the router hardware into multiple virtual finite state machines (thus the `V' in `VFSM').
However, it is necessary to be able to differentiate two streams of data that just happen to pass between the same two nodes at some point. To do so, the VFSMs are scheduled globally, so that when a VFSM on one node does a write at a particular time, the VFSM that reads the same stream of data on the neighboring node is scheduled one clock cycle later. Figure 1.6 shows how VFSMs might be scheduled on three nodes (in a one-dimensional array), carrying three streams of data among them. Stream A, with bandwidth 0.5, carries data from nodes 0 to 2; streams B and C, with bandwidth 0.25, carry data from 0 to 1 and from 1 to 2, respectively. Each word of data moves through the mesh at one cycle per node; the number of times a given VFSM appears in a schedule corresponds to its allotted bandwidth.
Another representation of how scheduled routing is shown in Figure 1.7, which demonstrates which cycles data is carried across the wires in the mesh; router nodes are shown as circles, with their attached processors in boxes. In the example, a data word might start on node 0 on the VFSM for stream A when it's scheduled at time 0, get passed to node 1 at time 1, then passed to node 2 at time 2, where the last VFSM delivers data to the processor. A similar word leaving node 0 at time 1 must be associated with stream B, and will therefore be delivered by the VFSM for stream B on node 2 to the processor, instead of forwarded to node 3 as is true for stream A. Streams of data in a scheduled routing environment are thus bound tightly to a specific set of VFSMs on the nodes chosen to route them from their source to destination.
The nodes in the mesh are synchronized at boot time, and remain synchronized from then on. Each node runs its schedule repetitively, returning to the top of the schedule after running the VFSMs scheduled last in the schedule. Thus, if the schedule is of length k , the 3rd timeslot in the schedule will be run at time 3, 3+k, 3+2k , and so on.
The discussion so far has assumed that the physical hardware is capable of only a single communication action per cycle. Multiple physical FSMs can be included on each node that are each independently timesliced by VFSMs. These physical FSMs are called pipelines, using terminology that reflects the implementation of the current NuMesh hardware. Each pipeline is capable of scheduling an independent VFSM for each cycle. Thus, on a given cycle there may be a simultaneous transfer by one pipeline of data from the +x to the -x directions, while another pipeline is transferring data from the processor to the +y direction. Co-scheduled VFSMs must have non-conflicting sources and destinations to avoid conflicts.