In this series of lectures we present the specification, design and verification (property checking) of, TK, an asynchronous ARM-like microprocessor. The work is a development of a previous study of the AMULET1 chip designed by Steve Furber's team at Manchester University. AMULET1 is a 2-phase asynchronous, micropipelined version of ARM6, a best selling, compact processor suited for low power, embedded applications. (StrongArm, the DEC version, achieves 500 MIPS per watt.)
The structure of the presentation is as follows:
We start by covering the TK instruction classes: ADD, BAL, BNZ, LOD, NOP, STO, and SWP. Following ARM, LOD and STO may pre-or post-index. BAL and SWP (a test-and-set mechanism used to implement semaphores) also assign to two locations/registers. We then sketch an architecture to implement these instructions. Following StrongArm, we use separate memories for instructions and data, and following Machester's design of AMULET2 (and 3), we use a "fat" evaluation pipeline that calculates partial results in parallel.
We briefly review 2-phase asynchronous design and the CCS specification
notation. We then explain the workings and specifications of registers,
fifos, arithmetic pipelines, and parallel pipelines, and quote some provable
facts about their signal complexities.
The major portion of the presentation is concerned with the design,
specification, and verification of an asynchronous implementation of our
instruction set. TK is decomposed into three hardware processes which run
in parallel. The FETCH unit speculatively generates next instructions in
sequence (pc, pc+1, pc+2, ...) and forwards (pc, ir) pairs to the EVALuation
unit. EVAL decodes IR and (in our model) calculates a several values in
parallel to be handed on to the SAVE unit. (op, pc+1, A op B, D/B, A),
where op is the instruction code, pc+1 is required by BAL, A op B is the
"main" result, usually assigned to a register but which may also be the
destination of a branch instruction, or the index into memory for STO or
SWP, D/B gives the value to be assigned for STO and SWP, and A is required
for post-indexing. The SAVE unit has the responsibility for completing
the instruction by assigning a value to the PC register (BAL, BNZ), to
a register (ALU, BAL, LOD, SWP), or to the data memory (STO, SWP). In the
case of a branch instruction, SAVE must interrupt execution of FETCH, and
cause FETCH to generate instructions from a fresh location. In general
FETCH will have speculatively generated several now unwanted (pc, ir) pairs
which have to be discarded. We explain the delightful instruction colouring
technique used by Manchester and show how it can be implemented.
CCS, our specification notation, is object oriented, and list of standard parts and how they are wired together. we adopt the philosophy that well known standard parts should not be modelled in detail, and so use very abstract representations of arithmetic units, memory interfaces, and busses. Nonetheless we are still faced with a design of some 50 parts, each represented by a state machine with roughly 10 to 20 states. Significant modelling simplifications are achieved in three ways:
(i) Harvard architecture (separate memories)
(ii) using work by Liu (PhD Calgary, 1995) to simplify
the modelling of the register bank,
(iii) the pipeline theorems enable us to view the FETCH and EVAL
units in a dramatically simple
way when solving instruction discard.
We complete our preentation by sketching how our model was checked for deadlock, livelock, and certain safety and liveness properties.
Background material.
http://www.leeds.ac.uk/graham/research/asynch/asynch.html
contains down-loadable material on
CCS and property checking
(click on Papers)
modelling AMULET1
(click on Reports)
asynchronous design (title only!)
(click on Books)
For inspiration read Ivan Sutherland's Turing Award lecture and interview in CACM June 1990.