The objectives of this research are to develop portable parallel algorithms for VLSI CAD that will run on a wide variety of parallel machines and networks of workstations. We have developed the PROPERCAD2 object-oriented C++ runtime system to achieve this goal. We are developing parallel algorithms for cell placement, wire routing, and logic synthesis and verification under this task.
This project investigates two novel directions to the basic compilation framework for compiling for distributed memory message-passing multicomputers. The first concept is that of integrating the support of task and data parallelism to generate multiple program multiple data (MPMD) programs. The second key idea is to integrate the support of regular and irregular accesses in parallel applications. The ideas are being implemented in the PARADIGM compiler.
A suite of parallel applications will be developed which solve a number of significant problems in VLSI CAD. These applications will be implemented on top of a high-level, concurrent, object-oriented programming model that will be implemented as a programming environment for scalable HPCC platforms applicable to irregular and unstructured problems of which CAD problems are representative. The applications will be developed to a standard interface, implemented as a C++ class library.
UIUC is leading the effort within the framework of the
ProperCAD project, along with Sierra Vista Research, LSI Logic, and Cadence Design Systems.
We will investigate parallel algorithms for synthesis and testing. The parallel algorithms for synthesis will be based on the algorithms in the MIS and SIS set of tools.
We will also investigate parallel algorithms for various operations, such as node simplication, kernel extraction, node resubstitution, state assignment, state minimization, and logic verification using BDD operations. The parallel algorithms for testing will be based on the HITEC/PROOFs set of tools for sequential circuits. We are investigating search parallelism, fault parallelism, and circuit parallelism in our approaches.
The PARADIGM project is developing an automated means to convert serial programs and compiling them for efficient execution on distributed memory multicomputers. PARADIGM is a source-to-source compiler that converts a traditional sequential Fortran 77 or high-performance Fortran (HPF) program into an SPMD program with
message passing executable on distributed memory multi-
computers.
The PARADIGM compiler targets the following distributed memory multicomputers: the Intel iPSC/860, Intel Paragon, IBM SP-1 and SP-2, Thinking Machines CM5, and the network of workstations using PVM.
The objective of our unit is to develop methods for rapid failure analysis, diagnosis, and testing of large, low-power and high-speed microelectronic circuits. We will develop novel techniques for diagnosis and testing of nontraditional faults, as well as parallel algorithms for diagnosis and testing of very large circuits.
The purpose of this research is to develop and implement compiler-assisted memory management for recovery from failures through multiple instruction reexecution in highly parallel computer architectures using distributed shared memory. The goal is to facilitate rapid recovery from high rates of transient and intermittent failures through reissuing multiple instructions. The specific problems under investigation include compiler-assisted multiple instruction retry, address trace evaluation of cache-based error recovery, message scheduling for reducing rollback propagation, and relaxing memory consistency.
The focus of this research is on the development of architecture technology for rapid recovery from transient failures in computer systems. Of particular concern are microarchitectures with significant instruction-level parallelism. Specific applications under investigation include flight control for space and aircraft and high-speed network technology for intensive multimedia environments. Additional problems being addressed include the development of mechanisms for heterogeneous and portable checkpointing, memory management for rapid system reboot, and preemptive diagnostic algorithms for early detection and location of failures.
The objective of this project is to develop IMPACT, a new compiler foundation for high-performance parallel computer systems. IMPACT is needed because existing commerical compilers have been unable to absorb the advanced research results due to the limitations of their software architectures and their internal program representation and because there is a lack of effective compilers for parallel systems. Our key ideas are open software architecture, two-level intermediate program representation, profile-directed code optimization, and integrated parallel processing support. Code generators for Intel 80486 and 80860, MIPS R2000, SPARC, AMD 29000, and IBM RS/6000 are being developed to verify the effectiveness of IMPACT for a wide range of architectures.
The objective of this research is to derive effective code transformation and code generation methods for superscalar systems. The issues to be addressed are exploiting parallelism in highly optimized programs, exposing parallelism hidden in frequent control transfers, and achieving highly sustained performance by matching between the compiler and the hardware. We are constructing a superscalar code generator, the AMD 29K family, and integrating it into the University of Illinois IMPACT compiler. A large amount of production software will be used to evaluate the compiler's capability to close the gap between peak and sustained performance of superscalar systems.
The objective is to investigate the issues involved in extracting instruction level parallelism for achieving high performance in superscalar workstations. In particular, compiler optimization techniques are formulated to take advantage efficiently of limited predication and speculation hardware support. Detailed simulation is conducted to study the implication of predicated execution and speculation on the processor and memory system performance. Theoretical results are validated through the construc-
tion and evaluation of code generators and machine-
specific optimizers for current and future HP-PA superscalar implementations.
The objective of this research is to derive effective code transformation and code generation methods for superscalar systems. The issues addressed are exploiting parallelism in highly optimized programs, exposing parallelism hidden in frequent control transfers, and achieving high sustained performance by matching the compiler with the hardware. Theoretical results are validated through the construction of superscalar code generators, the AMD 29K and X86 families, and integrating them into the University of Illinois IMPACT compiler. A large number of production software will be used to evaluate the compiler's capability to close the gap between peak and sustained performance of superscalar systems.
The objective of this research is to derive effective compiler and architecture techniques required to achieve very high memory system performance at low manufacturing cost. The compiler techniques investigated are uni
modular loop transformations, loop blocking, data privatization, and data prefetching. The architecture techniques developed are block copy and prefetch. Theoretical results are validated by means of compiler prototyping and system simulation.
The objective of the project is to design a superscalar processor that takes full advantage of the IMPACT compiler technology. The key issue addressed is the engineering trade-offs involved in designing a processor with important hardware support and resources needed by the compiler to achieve high performance. The project involves both system simulation and real machine building. Performance data and engineering know-how derived in the project will be published to advance the industry's understanding of building superscalar computers.
Our objective is to derive effective approaches to integrate the compiler and the parallel microarchitecture techniques to achieve high performance. The reduced instruction set computer (RISC) researchers have pursued this using microarchitectures with modest parallelism. Our approach is to use highly parallel single-chip microarchitectures with multilevel caches emerging with the advances in fabrication technology. At the compiler level, we integrate register allocation/assignment and code-scheduling algorithms to exploit parallel microarchitectures. At the microarchitecture level, we investigate how various forms of parallelism (including pipelining and multiple instructions per cycle) can be used by the compiler to improve performance.
The objective of this unit is to design and develop high-speed VLSI chip architectures for supporting both general-purpose applications and regular recurrent computations. Our work on general-purpose architectures seeks to identify architectural techniques that best enhance the sustained performance of high-speed microprocessors for a wide range of applications. Our research on algorithm-specific architectures addresses the mapping of recurrent computations in signal and data processing applications to achieve an optimal trade-off between cost and performance in VLSI implementations.
The objective of this project is to conduct experimental research on the applicability of the IMPACT compiler technology on the current and coming generation of X86 superscalar computers. In particular, the issue of performing dependence removal optimizations and code scheduling given a small register file is addressed. Code transformations to improve the efficiency of the cache memory systems for high-speed microprocessors are also studied. Theoretical results are validated with real machine experiments conducted by developing the IMPACT/X86 compiler, compiling industry standard programs with IMPACT/X86, measuring the performance of output code on 486 and
Pentium computers, and evaluating output code performance on future X86 implementations using simulation.
The objective of the project is to develop IMPACT/VLIW, a new compiler foundation for high-performance VLIW computer systems. The project is motivated by the fact that existing commercial compilers have not been able to absorb the advanced research results due to the limitations of their software architectures and their internal program representation. It is also motivated by the lack of an effective compiler for parallel systems. The compiler technology studied under this project involves predicated compilation model and memory dependence handling. A large number of experimental computer architecture research projects are also being supported by IMPACT.
The objective is to provide architecture expertise and compiler prototypes required for the microprocessor industry to understand the cost and effectiveness of each level of hardware support. First, the design complexity of architecture support including silent instructions, sentinel hardware, conditional move instructions, conditional store instructions, and conditional execution of all instructions is studied. Second, compiler software is developed: if-conversion, reverse if-conversion, optimizers, and schedulers that become increasingly aggressive as the level of architecture support increases. Third, an integrated approach is defined to coordinate speculative execution and predicated execution to best improve program execution performance.
This research focuses on the architecture and compiler techniques required to close the gap between the peak performance and the sustained performance of high-performance multiprocessor systems. Numerous techniques have been proposed in the past but few have been proven effective for real programs. The major difficulty is that each technique has to be incorporated into a coherent system in order to demonstrate its effectiveness for real programs. This difficulty is addressed by IMPACT, our new compiler foundation for high-performance computer systems. The techniques currently under study are dependence analysis, loop level parallelization, loop granularity adjustment, and optimization of parallelized loops.
The Illinois Computing Laboratory for Aerospace Systems and Software focuses on four areas. Problems in the areas of reliable, fault-tolerant computing and parallel architectures include the study of reconfigurable architectures, automatic error detection and recovery in LISP systems, parallel architectures for computer-aided design systems, image understanding, and parallel artificial intelligence algorithms and software. Problems in the area of distributed systems include design of reliable, distributed database management systems, dynamic scheduling and load balancing, distributed data flow, and communication networks. Research in software engineering investigates software tools for the design, production, and reuse of aerospace applications and system software.
This project will investigate the development of automated design environments to validate performance and dependability of high-availability systems. Methods that allow detailed performance and reliability design via automatic fault injection and analysis will be investigated. Fault injection geared to evaluate the system under high-stress conditions will be developed. The analysis will be supported by various on-line dependability analysis tools that can quantify performability bottlenecks in the system. Graphical environments to display performance parameters, such as resource utilizations, queue length, and resource contention in an animated form, will also be developed to allow the designer to make effective reliability/speed trade-offs.
The diagnosis of a persistent problem that may surface intermittently is often difficult. For the diagnosis to be effective, it is imperative that the system be able to relate errors occurring in the different components/functions and at different times. The purpose of this study is to develop effective techniques for recognizing the symptoms of persistent problems and investigate the feasibility of this approach for failure/error prediction. The goal is to automate and formalize a process to relate errors occurring
in different parts of a system. Thus, given a problem we will determine whether it is an intermittent manifestation of a common fault or an isolated incident. The approach will be based on the recognition of error symptoms, the history of system behavior, and the level and type of system activity.
The research will investigate measurement-based techniques for quantifying the dependability of fault-tolerant systems. The effect of operating stresses on parallel machines, specifically the issue of how to identify a model structure from real data, will be addressed. The models developed will be used to quantify the loss in performance due to different failure types. The results are expected to have a twofold impact: the measurements on existing systems are expected to lead to model identification methods that will provide insight into the failure characterization and accurate analytical modeling of parallel systems; a realistic model formulation process is expected to give rise to new evaluation techniques.
There is a need for a high level of dependability in computer systems such as aircraft and aerospace systems, medical and automotive equipment, and high-speed network switching devices. This research will develop an integrated design framework in which developers of these systems can eliminate dependability risks early in the design process. Using relatively simple descriptions of the system's behavior, designers can test for dependability in a hierarchical manner, from the chip level to the system level, long before the system is built. This approach helps ensure the dependability of critical systems while reducing the time, effort, and cost of developing them. This is a joint project with Stanford University.
This research seeks to develop a simulation-based fault injection methodology that will provide accurate failure behavior data to the designers of VLSI chip fault-handling mechanisms. Tools will be developed to study the effect of transient and permanent faults on chip operation while running realistic application programs. The tools will provide full control of the location and type of injected fault and the ability to monitor the behavior of internal modules and on-chip fault-handling mechanisms. This research will produce practical techniques and theoretical foundations for evaluating the sensitivity of various modules on a chip to faults, error latency, and the effectiveness of fault tolerance and recovery techniques.
The objective of this research is to develop tools and methodologies for design of VLSI systems for testability, reliability, and manufacturability. The complexity of VLSI systems has increased the need for the development of chip design methodologies that emphasize easily provable and manufacturable functionality, performance, and reliability. This program addresses a wide range of design issues, each dealing with various aspects of reliable VLSI design, including research in fault simulation, test generation, design and synthesis for testability, fault diagnosis, mixed analog/digital simulation, wear-out mechanisms, and reliability characterization of device failures.
We are developing software tools and libraries that will facilitate the implementation of computation-intensive applications on massively parallel processing systems (MPPs). These tools will enhance load balancing among processors and will be compatible with existing Fortran 90 compilers. We exploit the cellular and iterative nature of computational fluid dynamics (CFD) applications. Our research involves developing (1) data-structures and migration paradigms more suited for migration on MPPs and (2) improved load-balancing heuristics for mesh-connected MPPs.
We have developed a new probabilistic measure called probability of win, which can handle changes in distribution of performance values from one problem domain to another. Our results have been implemented in TEACHER, a prototype learning system under development for the past few years. We have applied our learning system to learn better heuristics for (1) process placement on massively parallel processing systems, (2) branch-and-bound search in solving combinatorial search problems, (3) load balancing in networks of workstations, (4) designing more robust blind equalizers in cellular communication, (5) tuning parameters in VLSI circuit testing, placement, and routing, (6) designing artificial neural networks automatically, and (7) tuning parameters in stereo vision.
We utilize emerging VLSI technologies that allow tens to hundreds of vector pipelines to be implemented in one chip. Instead of competing for precious area in the same chip as regular instruction-set architectures, we design and evaluate a supervector processor in a separate chip. This is feasible as vector instructions, once initiated, can continue to execute until completion without close supervision by the instruction-set architecture. We are studying three interrelated issues: (1) architecture of coprocessor, (2) software for exploiting parallelism, and (3) system simulation and evaluation. Our design will allow computation-intensive loops to be executed at a rate far exceeding that provided by current co-processors.
In this project, we develop a novel method called the trace method (TRACE) for solving continuous and discrete global optimization problems. These problems are characterized by a nonlinear objective function, with or without a collection of nonlinear constraints. Such problems exist in many engineering applications that include operations research, signal processing, and function optimization. TRACE addresses the balance between global search and local search, using a trace to aid in identifying promising regions before committing to local searches. We have applied TRACE to find significantly better solutions than existing ones in filter-bank design, neural network learning, and constraint-satisfaction problems in operations research and combinatorial optimization.