Vector processors have long been an important computer architecture
for supercomputers and have recently come into their own as
high-performance processors for multimedia and other commodity computing
applications. Vector processors offer several significant advantages
over conventional microprocessor architectures: the ability to
effectively utilize high-bandwidth memory systems, relatively simple
implementations, and the ability to scale performance with advancing
processing technologies. In this talk I will describe the
architecture of the first single-chip vector microprocessor, T0. T0
is a single-chip implementation of a complete vector architecture
designed for multimedia, human-interface, neural network, and other
digital signal processing tasks. I will present results that show
that this class of processor delivers flexible, cost-effective,
high-performance computing as required by these applications.

One of the biggest performance challenges in computer systems today is
the speed mismatch between microprocessors and memory. Processing
developments (merged DRAM/logic processing) now underway will soon
make it possible for processors and memory to be merged onto a single
chip. These developments will narrow or altogether remove the
processor-memory performance gap and better utilize the phenomenal
number of transistors that can be placed on a single chip. This
will enable a new class of devices, dubbed "Intelligent RAM"
(IRAM). In this talk I will outline the factors leading to the
emergence of IRAM technology and work at Berkeley on novel computer
architectures that exploit the large local memory capacity, low
latency, and high bandwidth of on-chip DRAM.

The emergence of high-capacity reconfigurable devices such as FPGAs is igniting a revolution in general-purpose processing built on these devices. It is now possible to tailor and dedicate functional units and interconnect to take advantage of application-dependent dataflow. Early research in reconfigurable computing has shown encouraging results in a number of areas, including cryptography, signal processing, and searching, achieving 10 to 100 times the computational density of conventional processors along with reduced latency. The key to this cost/performance advantage is that conventional processors are often limited by instruction bandwidth and execution restrictions, or by an insufficient number or type of functional units. Reconfigurable logic exploits more program parallelism. By dedicating significantly less instruction memory and control per active computing element, reconfigurable devices achieve an order of magnitude improvement in functional density over microprocessors. At the same time, this lower memory and control ratio allows reconfigurable devices to deploy active capacity at a finer-grained level, so they realize a higher yield of their raw capacity than conventional processors, sometimes by as much as a factor of 10. Several research projects now underway are investigating the integration of processors and reconfigurable arrays. The reconfigurable array extends the usefulness and efficiency of the processor by providing the means to tailor its circuits for special tasks. The processor, in turn, improves the efficiency of the reconfigurable array for general-purpose computation. In this talk I will describe the architecture of several new reconfigurable processors.