GALS @ ETHZ
About GALS

Introduction to our GALS methodology

Systems-on-a-chip (SoC) often require a multitude of modules running at different clock frequencies to be integrated on a common die. In strictly synchronous designs, synchronizers are used between the clock domains to reduce the probability of metastability or data corruption, whereas in self-timed designs data transfers at all levels of the system are controlled by handshake signals. The price to pay for self-timed operation is a widespread control overhead, which often increases die size significantly. Even worse, the well-established synchronous design flows cannot be used without large modifications and additional asynchronous tools.

Globally-Asynchronous Locally-Synchronous (GALS) operation is a novel approach to VLSI systems. It employs a self-timed communication scheme at a coarse-grained block level and combines the following features: GALS makes it possible to take advantage of the industry-standard synchronous design methodology within individual clock domains and of self-timed operation across clock boundaries.

The self-timed approach does away with the need to time-align the operation of all modules within the framework of a common base clock period. Instead, each module is driven by a local pausable clock generator in its self-timed wrapper, which is controlled so as to prevent any timing violations at the locally synchronous island's data interface. The problem of metastability is thus addressed by pausing the local clock whenever data and the sampling clock edge would occur too close to each other. With this method metastability is prevented rather than resolved after the fact, as it is in other well-known solutions. No extra latency for synchronizers, FIFOs and the like is introduced, and the local clock's frequency can be chosen to fit the needs of the particular module.
GALS holds the promise of solving or avoiding a variety of problems that are bound to become more important with very deep sub-micron technologies and in view of the virtual component business.

Self-timed Wrapper

The figure below depicts a block-level schematic of a GALS module with its self-timed wrapper surrounding the locally synchronous island (LS island). The wrapper contains an arbitrary number of GALS ports, a local pausable clock generator, and test structures.
Figure 1: Self-timed wrapper

The wrappers are built from a library of predefined wrapper elements. This modular approach makes constructing GALS circuits safe and relatively easy. We have developed a sufficient library of five partly parameterized wrapper elements, which are described in technology-independent VHDL.

A port controller is responsible for managing all data transfers on a particular port in a GALS system. It is enabled by the LS island and has to synchronize data transmission with the local clock phases. In order to transmit data quickly and efficiently, the port controllers need to act independently of the local clock signal. This is achieved by implementing them as asynchronous finite state machines (AFSM). We describe the behavior of each controller using the extended burst-mode description by Kenneth Y. Yun. One of the major advantages of the extended burst-mode description is that it can be synthesized directly into a hazard-free implementation using the 3D synthesis tools for asynchronous control circuitry. 3D synthesis results in a set of equations: one for each output and one for each additional internal state variable (the state is held by a combinational feedback loop). These equations represent a two-level AND-OR implementation and can therefore easily be transformed into a circuit.

To cover the diverse needs of intermodule communication, we have defined two families of port controllers:
P-port: This type of port issues requests for clock stretching exclusively to prevent metastability and thus ensures data correctness. The clock is influenced as little as possible. A P-port is appropriate wherever a data transfer is possible but does not necessarily need to happen immediately. The LS island continues to operate normally while the P-port handles the data transfer.
D-port: This type of port also ensures data integrity on the transfer channel but adds a feature similar to clock gating: as soon as it is enabled, it stops the local clock and does not release it until the required transfer has taken place. A D-port is therefore used where a data transfer is needed immediately because the LS island cannot carry out any useful computation without the requested data item. While awaiting the pending exchange, the D-port suspends the local clock, thereby effectively preventing any dynamic power dissipation. As soon as a new data item becomes available, the LS island resumes operation directly in phase with the incoming data.

Local Clock Generator

The pausable local clock generator shown below is based on a ring oscillator structure with a tunable delay line. To stretch the local clock lclk, an arbitration block is placed in parallel with the delay line. Each incoming request for a clock pause is connected to a mutual exclusion (MUTEX) element that decides whether the pause request is granted or the next clock pulse is permitted. Only if all MUTEX elements agree to grant the rclk request is the clkallowed signal set. The Muller C-element withholds the rising edge of lclk until both of its inputs have risen. Therefore, the active clock edge is postponed for as long as at least one request Ri persists. The MUTEX elements are arranged in parallel, which allows easy scaling with the number of port controllers.
Figure 2: Local clock generation
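
The arbitration described above can be illustrated with a small behavioral model. The following Python sketch is not the VHDL wrapper library and is not derived from the actual circuit netlist; it is a discrete-time approximation under stated assumptions. The class and function names (Mutex, CElement, simulate), the unit-step delay line, and the rule that a port wins a simultaneous request are all hypothetical choices made for illustration. Only the signal names lclk, rclk, clkallowed and Ri are taken from the text.

```python
from collections import deque


class Mutex:
    """Two-input mutual exclusion element: grants at most one request at a time."""

    def __init__(self):
        self.owner = None  # None, "port", or "clk"

    def update(self, port_req, clk_req):
        # A grant is released once its request has been withdrawn.
        if self.owner == "port" and not port_req:
            self.owner = None
        elif self.owner == "clk" and not clk_req:
            self.owner = None
        # Hand a free MUTEX to a pending request.
        # Assumption: on simultaneous requests the port wins; a real MUTEX
        # resolves such ties internally and may decide either way.
        if self.owner is None:
            if port_req:
                self.owner = "port"
            elif clk_req:
                self.owner = "clk"
        return self.owner


class CElement:
    """Muller C-element: output follows the inputs when they agree, else holds."""

    def __init__(self, initial=0):
        self.out = initial

    def update(self, a, b):
        if a == b:
            self.out = a
        return self.out


def simulate(requests, half_period=2):
    """Return the lclk value for each time step.

    requests: one tuple per time step holding the pause request Ri of each port.
    half_period: length of the (hypothetical) delay line in time steps.
    """
    mutexes = [Mutex() for _ in requests[0]]
    c_element = CElement(0)
    delay_line = deque([0] * half_period)  # delayed, inverted copy of lclk
    trace = []
    for step_requests in requests:
        rclk = delay_line.popleft()  # request for the next clock pulse
        owners = [m.update(r, rclk) for m, r in zip(mutexes, step_requests)]
        # clkallowed is set only if every MUTEX has granted the rclk request.
        clkallowed = int(rclk == 1 and all(o == "clk" for o in owners))
        lclk = c_element.update(rclk, clkallowed)
        delay_line.append(1 - lclk)  # ring oscillator feedback
        trace.append(lclk)
    return trace


# One port, no pause requests: the clock runs freely.
free_running = simulate([(0,)] * 10)
# The same port requests a pause during steps 1 to 4.
stretched = simulate([(1,) if 1 <= t <= 4 else (0,) for t in range(10)])

print("".join(map(str, free_running)))  # 0011001100
print("".join(map(str, stretched)))     # 0000011001
```

In the second trace the rising edge that would occur at step 2 is withheld until the request is withdrawn at step 5, after which the clock continues with its normal period. A request that arrives while lclk is high is not granted until the clock's own request has been released, so in this model a running pulse is never cut short.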