Al Davis received his bachelor's degree from MIT in electrical engineering
in 1969, and a Ph.D. in computer science from the University of Utah in
1972. His subsequent career has been split approximately equally
between academic and industrial research positions. His academic
positions have been at the University of Waterloo and, on two separate occasions,
at the University of Utah. He is currently a full professor at the
University of Utah. His industrial research positions have been at
the Burroughs Research Center in San Diego, the Fairchild AI Laboratory
(which subsequently became the Schlumberger Palo Alto Research Center), and
Hewlett Packard Laboratories. In each of his industrial positions
he has led research teams that have designed and built novel parallel computer
prototype machines and the associated software systems.
Dr. Davis is well known for having built the first operational dataflow computer in 1976 (Burroughs), for his work in parallel computer architecture and systems, and as one of the pioneers in the field of asynchronous circuits and systems. His MEAT asynchronous VLSI CAD tool was the first fully automatic circuit synthesis system and led to the development of the more widely known Stetson system, which incorporated the full range of commercial CAD capability (this work was done jointly with Professor David Dill of Stanford University). His work in high-performance parallel computer communications is also widely known. The Post Office chip, part of the Mayfly machine developed at Hewlett Packard in 1990, is still the largest fully asynchronous ASIC and was also the first fully adaptive router chip.
Professor Davis currently leads two large computer architecture projects called Avalanche and Impulse and is also involved in a large parallel scientific computing study called C-Safe. The Avalanche project is an attempt to create high-performance parallel workstation clusters while minimizing cost by maximizing the use of existing commercial workstation and interconnect fabric technologies. The particular choices are HP's Runway bus-based servers and the Myrinet fabric. The Avalanche system provides support for both message passing and distributed shared memory (DSM) parallel processing. The architecture takes advantage of the high-performance coherence bus to integrate message passing and DSM support directly into the memory hierarchy. Efficiency is further increased with a new network interface component called the Widget, which directly interfaces the Runway and the Myrinet while providing hardware support for message passing and DSM protocols. Software overhead and CPU occupancy are further reduced with an improved set of message passing protocols called Direct Deposit, which maintains the safety of socket-based communication but minimizes context switch and copy overheads in a greatly reduced code path. DSM support provides a flexible set of memory models and protocols to increase efficiency depending on application sharing patterns.
The Impulse project recognizes that while tremendous advances in processor architecture have been made, the architecture of memory systems has remained relatively static. In particular, the deepening cache hierarchy causes severe performance penalties for applications that do not exhibit temporal and spatial locality in their memory access patterns. The result is that important modern applications such as large sparse scientific codes, database and multimedia stream processing, and even the more traditional multi-strided dense matrix codes can only achieve a small fraction of peak performance on modern super-pipelined and superscalar architectures. System buses are tuned to deliver contiguous cache lines from the main memory to the processor. When cache miss rates exceed a few percent, the system bus becomes a severe performance bottleneck. The Impulse architecture involves the design of a novel memory controller which can be used with conventional processor and memory subsystems to significantly reduce this problem. The memory controller adaptively packs dense cache lines based on the application's stride rather than the default linear memory ordering. The memory controller also takes advantage of the high levels of internal parallelism and bandwidth in modern DRAM components by performing hot-row scheduling on the DRAM accesses. For example, an early simulation study of the NAS conjugate gradient benchmark showed a speedup of more than 60%.
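The effect of stride-based packing on bus traffic can be illustrated with a small sketch. This is a toy model under stated assumptions, not the Impulse design itself: the line size of 8 words, the flat word-addressed memory, and the function names `lines_fetched_conventional` and `gather_strided` are all hypothetical and chosen only for illustration.

```python
LINE_WORDS = 8  # hypothetical cache line size, in words (an assumption)


def lines_fetched_conventional(base, stride, count, line_words=LINE_WORDS):
    """Count the distinct cache lines touched when a strided vector is
    read directly through a conventional, linearly ordered memory system."""
    return len({(base + i * stride) // line_words for i in range(count)})


def gather_strided(memory, base, stride, count, line_words=LINE_WORDS):
    """Pack `count` strided words into dense lines, roughly as a remapping
    memory controller might before placing them on the system bus."""
    packed = [memory[base + i * stride] for i in range(count)]
    return [packed[i:i + line_words] for i in range(0, count, line_words)]


# Toy memory in which each word holds its own address.
memory = list(range(1024))

# Read 32 elements with a stride of 16 words.
conventional = lines_fetched_conventional(base=0, stride=16, count=32)
dense_lines = gather_strided(memory, base=0, stride=16, count=32)
```

In this toy setting the conventional path moves one full line across the bus per useful word, while the packed path moves only as many lines as are needed to hold the useful words. The model ignores DRAM timing, hot-row scheduling, and address translation entirely.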
C-Safe stands for the Center for Simulation of Accidental Fires and Explosions. It is a large multi-disciplinary effort involved in creating accurate multi-species chemical fire simulations and modeling their interaction with a container of HMX. This large simulation system will contain more than 10 million lines of parallel C and Fortran code. The target machines for this system are the IBM SP-2 and the SGI Origin 2000 supercomputers. More than 1,000 processors are required to achieve the Tera-op performance needed to run this code in a reasonable time. Presently most large scientific codes such as C-Safe run at around 1 to 20% efficiency. Professor Davis and his group are developing an advanced performance tuning toolkit which provides continuous performance profiling capability for all aspects of a specific architecture (issue rate, commit rate, miss rate, fabric congestion, TLB shoot-down rate, etc.) and integrates the results into a multi-dimensional visualization environment. The result is that application developers can actually see which pieces of their code are causing performance problems. The current prototype system runs on the Origin 2000, and efforts are underway to modify the system to work on the architecturally very different IBM SP-2.
Professor Davis has published more than 50 articles in journals and conferences, and holds 11 patents, including the initial patents on dataflow architecture and autonomous interconnect routing chips. He has been a guest researcher at institutions in Israel, Germany, and the former USSR and has consulted for many of the major computer manufacturers.