Chris Metcalf
Laboratory for Computer Science
Massachusetts Institute of Technology
Cambridge, Massachusetts 02139
This paper attempts to answer the question, ``To what extent is prefetching effective in hiding memory latency, and what is the minimal amount of hardware required to support prefetching?'' We begin by classifying the different kinds of prefetching and reconciling the various common performance metrics to allow fair comparisons. We then put forward an analytical model that gives the potential speedup with prefetching. We next detail the non-binding software prefetch technique and examine its performance with both hand-inserted and compiler-inserted prefetches. We consider an elaborate hardware scheme meant to replace the software schemes entirely, then look at more reasonable schemes that require only minimal extra hardware and assess how much they add to the simple software prefetching model. We conclude with recommendations for CPU and cache architects.