next up previous contents
Next: Meta-Implementations Up: Operator Implementations Previous: Approaches to Data Dependency


Implementation Types

  This section covers the several high-level ways that an implementation can be written for scheduled routing to provide the necessary functionality for an operator, and provides some insight into how the compiler can choose among them.

For primitive operator implementations (as opposed to those that are simply Lisp code), the implementation's interface to the compiler is specified entirely through a single structure that includes a set of functions for characterizing and including the implementation. Table 4.1 presents the important functions that make up the structure (also included in the structure, but not shown, are miscellaneous flags fields and some other book-keeping data).

Table 4.1: Interface functions for operator implementations

The one function in the table common to all implementations is the metric function. This function is called from the compiler with a pointer to an operator and a by-reference structure to be filled in with information on how this implementation will perform on the given operator. A metric function can simply return `false' if it can't handle the given operator; this may be for a reason as simple as that it is the wrong type of operator, or that the implementation can't handle run-time operand values, or for more complex reasons (e.g., a dense broadcast implementation would reject an operator with a sparse subset).

If the implementation can handle the given operator, it fills in the `implementation information' structure. This structure holds best guesses as to the time to load the operator; the time to reload it on subsequent passes if one argument or the other has changed; the startup time for a message; and the inter-word time for a message. It also holds the number of I/O words required to run to completion, the number of VFSMs predicted to be used in each pipeline, the number of interface registers, and some other flags. All this information is made available to the compiler's search engine for trying to determine the best implementations to pick.

The following subsections discuss the four main high-level techniques for implementing operators: as dynamic-routing, stream-alias, schedule-generator, or replacement implementations.

Dynamic-Routing Implementations

Dynamic-routing implementations correspond to normal primitive implementations on a dynamic router. Such an implementation defines a streams function in its interface, which when passed the operator returns an array of streams necessary to handle the communications for this implementation of that operator. For example, a simple stream operator with just one instance would return the single appropriate stream from this operator; a reduce operator running in parallel on n rows of an array would return n reduction trees' worth of streams. For a dynamic implementation, the streams are used to guess traffic rates, as well as for dependency analysis.

The streams in a dynamic implementation will be single-source, single-destination streams; however, the sources and destinations may be specified either as compile-time or as run-time with a range of possible values. (A generic `value' structure is used within the compiler to represent both compile-time and run-time values.)

Dynamic implementations can be used for a dynamic-router substrate, unlike stream-alias or schedule-generator implementations, which are specific to scheduled routing. They can also be used with scheduled routing, however, at a performance penalty; this is discussed in detail in Section 4.2.

Stream-Alias Implementations

The basic implementation type for scheduled routing is the stream-alias implementation. It also uses the streams function; the returned streams are used by the scheduled-routing backend to route actual streams, as well as to perform some dependency analysis. The streams are single-source but may be multiple-destination, implemented as a multicast in the router; however, the endpoints must be specified at compile-time so they can be routed at compile-time.

If the implementation needs to edit the schedule generated by its streams, it may include a gen_router function as part of its definition. This function is used to specify router-level information, above and beyond that given by the streams returned by the streams function. For a stream-alias implementation, a gen_router function typically isolates the portions of the router used by a given operator's streams, and modifies the router information (e.g., VFSM sources and destinations) as necessary. For example, a stream with a run-time source can be implemented with a tree of point-to-point streams converging on the compile-time destination. Once the streams are routed, the gen_router function can update the router information such that data is forwarded automatically from one stream to the next, taking data to the destination without requiring intervention from the processors.

In addition to this compile-time modification, a stream-alias implementation may be run-time modifiable. This is a powerful technique, since a stream that is modified at run-time can be routed along with streams from other operators in a single phase, but such a stream is more powerful than a simple point-to-point stream. These operators rely on the gen_load function specified in the implementation definition. This function supplies additional code to be executed by the load function returned by COP as part of the per-operator code. For fully compile-time operators no such function is specified, since no load-time changes to the router are necessary. The `Run-Time Specification' example of a run-time broadcast in Section 4.1.1 is an example of load-time modified stream-alias implementations: the gen_load function handles modifying the router to accept data from the local processor, or to forward data from its neighbors.

Schedule-Generator Implementations

The other fundamental implementation type for scheduled routing is the schedule-generator implementation. The implementation has no streams function, and instead relies solely on the gen_router function to generate the necessary VFSMs and schedules from scratch. These implementations cannot be scheduled in parallel with other implementations, since each schedule-generator implementation assumes it has complete control over the schedule of all the nodes in its phase.

Schedule-generators are useful for a variety of reasons.

All of the above implementations rely on the gen_io function to return the processor code necessary to make the implementations work. This function implements the read and write, or the func, functions that the processor calls. For simple operators like stream, these functions may just read or write to an interface register and return. More complex operators like reduce and prefix include code for the necessary semantics in gen_io.

Replacement Implementations

The final class of implementations consists of replacements. This class of implementation defines the replace function in the definition structure. When called, it returns an operator (or group of operators) that will carry out the function. For example, the allreduce operator is implemented this way by mapping it to reduce+broadcast.

A related operator `implementation' is the stub implementation. This is typically an implementation that is used on one of the operators returned by a previous replacement, and it calls the other operators in the replacement group. For example, a barrier operator may be implemented as an allreduce along with a piece of stub code that calls the allreduce func function with an arbitrary argument and discards the result.

next up previous contents
Next: Meta-Implementations Up: Operator Implementations Previous: Approaches to Data Dependency
Back to Chris Metcalf's home page