Next: COP Language Design Issues Up: The Communications Language Previous: Extensibility

Example COP Code

This section gives two moderately sized examples of COP code; others are presented in Appendix A. No HLL code is shown; instead, the computational aspect of each example is outlined.

The first example is matrix multiply, as shown in Figure 2.6. The COP code implements a simple parallel matrix multiply of $A \times B$, in which columns of the A matrix are broadcast from a different column of processors on each iteration, alternating with vertical shifts of portions of the B matrix.


  
Figure 2.6: COP code for matrix multiplication
\begin{figure}
{\small \begin{verbatim}
(loop 'mul *xsize*
  (subset rows
    (broadcast 'b (runtime)))
  (subset cols
    (cshift 'c -1)))
\end{verbatim} }\end{figure}

The variables rows and cols are assumed to hold the necessary subset specifiers: rows is a list of subsets, each subset corresponding to a group of nodes that handles the computations for a single row, and cols is the analogous list for columns of the matrix. (Code generated by a frontend compiler would typically have these values expanded in place.) The subsets might correspond to x and y cross-sections of a 2D mesh or, for example, to Gray-code mappings on a 3D array.
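The broadcast-then-shift pattern can be simulated sequentially to see why it yields the full product. The following Python sketch is not COP and is not taken from the thesis; it assumes one matrix element per processor, assumes that the broadcast source in row i at step k is column (i + k) mod n (the COP code leaves the source to be chosen at run time), and assumes that a circular shift by -1 moves each B element up one row. The names `rows`, `cols`, and `simulate_broadcast_shift_matmul` are hypothetical stand-ins.

```python
def simulate_broadcast_shift_matmul(a, b):
    """Sequentially simulate an n x n processor grid computing A x B.

    Processor (i, j) holds one element each of A, B, and the result C.
    """
    n = len(a)
    # Hypothetical stand-ins for the COP subset specifiers:
    # one subset of nodes per matrix row, and one per matrix column.
    rows = [[(i, j) for j in range(n)] for i in range(n)]
    cols = [[(i, j) for i in range(n)] for j in range(n)]

    a_local = {(i, j): a[i][j] for i in range(n) for j in range(n)}
    b_local = {(i, j): b[i][j] for i in range(n) for j in range(n)}
    c_local = {(i, j): 0 for i in range(n) for j in range(n)}

    for step in range(n):
        # Broadcast phase: within each row subset, one node sends its
        # A element to every node of that row (assumed source column).
        for i, row in enumerate(rows):
            value = a_local[(i, (i + step) % n)]
            for node in row:
                # Local computation that the HLL code would perform.
                c_local[node] += value * b_local[node]

        # Shift phase: within each column subset, circularly shift the
        # B elements by -1, so node (i, j) receives from node (i + 1, j).
        shifted = {}
        for col in cols:
            for (i, j) in col:
                shifted[(i, j)] = b_local[((i + 1) % n, j)]
        b_local = shifted

    return [[c_local[(i, j)] for j in range(n)] for i in range(n)]
```

At step k, node (i, j) holds the B element that started in row (i + k) mod n, which is exactly the partner of the A element broadcast in that step, so after n steps every term of the dot product has been accumulated.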

As a larger example of how this language looks in practice, Figure 2.7 shows an implementation of Gaussian elimination used to solve the matrix equation $Ax = b$. Because the standard Gaussian implementation uses a cyclic distribution, it is not necessary to explicitly exclude rows that have already been eliminated: a given processor will be involved in most operators at every step.
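The effect of the cyclic distribution can be illustrated by counting how many not-yet-eliminated rows each processor still owns at a given step. The sketch below is a simplification for illustration only: it distributes rows in one dimension (the figure uses a 2D cyclic distribution), and all function names are hypothetical.

```python
def remaining_rows_per_processor(n_rows, n_procs, step, owner):
    """Count the rows with index >= step that each processor owns."""
    counts = [0] * n_procs
    for r in range(step, n_rows):
        counts[owner(r)] += 1
    return counts


def cyclic_owner(n_procs):
    """Row r belongs to processor r mod n_procs."""
    return lambda r: r % n_procs


def block_owner(n_rows, n_procs):
    """Rows are split into contiguous blocks, one block per processor."""
    block = -(-n_rows // n_procs)  # ceiling division
    return lambda r: r // block


# Halfway through eliminating an 8-row matrix on 4 processors:
cyclic_counts = remaining_rows_per_processor(8, 4, 4, cyclic_owner(4))
block_counts = remaining_rows_per_processor(8, 4, 4, block_owner(8, 4))
```

Here `cyclic_counts` is `[1, 1, 1, 1]`, so every processor still has work, whereas `block_counts` is `[0, 0, 2, 2]`, leaving half of the processors idle. This is why the cyclic layout lets the communication code treat all processors uniformly.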


  
Figure 2.7: COP code for Gaussian elimination
\begin{figure}
{\small
\begin{verbatim}
; A 2D cyclic distribution of the matrix...
 ...bcast X(i) to col 0
 (broadcast 'b5 (runtime) :m 2)))\end{verbatim}}\end{figure}

In the figure, the r0 operator performs a reduction to get the index of the row with the largest $i$th element (the pivot row), along with the value of the element, and distributes it to all the nodes in each subset. The b1 operator broadcasts that index, along with the computed scale factor for the specific row, from the $i$th element of each row to the rest of the row. The s2 operator sends the $i$th row to replace the chosen pivot row, and the b3 operator then broadcasts the pivot row to all the other rows, so an update step can be performed. These four operations are repeated for each row in the matrix. Finally, the back substitution is done by looping back up through the rows; s4 sends the $i$th column to column zero (where the b vector is stored), and b5 broadcasts the value of $x_i$ to the entire column zero. Iterating through the matrix completes the back substitution, resulting in a solution for $x$.
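For orientation, the computation that these operators support can be written as a sequential Python sketch. This is not the HLL code of the thesis, which is not shown; the comments only indicate where, according to the description above, each communication operator would fall. The function name `solve_gaussian` is hypothetical, "largest" is assumed to mean largest in magnitude, and the b vector is stored as an extra column purely for convenience.

```python
def solve_gaussian(a, b):
    """Solve Ax = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    # Augmented copy of the system: each row is [A row | b entry].
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]

    for i in range(n):
        # r0: reduction to find the row with the largest i-th element
        # (assumed here to mean largest absolute value).
        pivot = max(range(i, n), key=lambda r: abs(m[r][i]))
        if m[pivot][i] == 0.0:
            raise ValueError("matrix is singular")

        # s2: the i-th row and the chosen pivot row trade places.
        m[i], m[pivot] = m[pivot], m[i]

        for r in range(i + 1, n):
            # b1: the scale factor for row r is known along the row.
            factor = m[r][i] / m[i][i]
            # b3: the pivot row is available to row r for the update.
            for c in range(i, n + 1):
                m[r][c] -= factor * m[i][c]

    # Back substitution, looping back up through the rows.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = m[i][n] / m[i][i]
        # s4 / b5: column i and the value x[i] meet in the b column,
        # where x[i] times column i is subtracted from the rows above.
        for r in range(i):
            m[r][n] -= m[r][i] * x[i]
    return x
```

For example, `solve_gaussian([[0, 1], [1, 0]], [2, 3])` exercises the pivoting path, since the first row must be exchanged before elimination can proceed.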


Back to Chris Metcalf's home page