GALS @ ETHZ



    About GALS

    Introduction to our GALS methodology

    Systems-on-a-chip (SoCs) often require a multitude of modules running at different clock frequencies to be integrated on a common die. In strictly synchronous designs, synchronizers are used between the clock domains to reduce the probability of metastability or data corruption, whereas in self-timed designs data transfers at all levels of the system are controlled by handshake signals. The price to pay for fully self-timed operation is a widespread control overhead that often increases die size significantly. Even worse, the well-established synchronous design flows cannot be used without major modifications and additional asynchronous tools.

    Globally-Asynchronous Locally-Synchronous (GALS) operation is a novel approach to VLSI systems. It employs a self-timed communication scheme at a coarse-grained block level and combines the following features:

    • All major modules are designed in accordance with proven industry-standard synchronous clocking disciplines.
    • Data exchange between any two modules strictly follows a full handshake protocol.
    • Each module is allowed to run from its own local clock.
    • Any asynchronous circuitry necessary for coordinating the clock-driven with the self-timed operation is confined to "self-timed wrappers" arranged around each clock domain.
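
    The "full handshake protocol" named in the list above is not specified further on this page. As a minimal sketch, assuming a four-phase (return-to-zero) request/acknowledge protocol, one transfer between two modules could be modeled as below. The names Channel, req, ack, and transfer are illustrative and do not come from the GALS wrapper library.

```python
from dataclasses import dataclass, field


@dataclass
class Channel:
    """Hypothetical point-to-point channel between two GALS modules."""
    req: int = 0
    ack: int = 0
    data: object = None
    trace: list = field(default_factory=list)

    def log(self, event):
        # Record the event together with the wire levels after it.
        self.trace.append((event, self.req, self.ack))


def transfer(ch, item):
    """Run one four-phase handshake; return the item seen by the receiver."""
    ch.data = item        # sender drives the data before raising req
    ch.req = 1
    ch.log("req+")
    received = ch.data    # receiver latches the data while req is high
    ch.ack = 1
    ch.log("ack+")
    ch.req = 0            # sender withdraws the request
    ch.log("req-")
    ch.ack = 0            # receiver returns the channel to idle
    ch.log("ack-")
    return received
```

    Every transfer walks through the same four events in the same order, which is the property the wrappers rely on at each clock boundary.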

    GALS makes it possible to take advantage of the industry-standard synchronous design methodology within individual clock domains and of self-timed operation across clock boundaries. The self-timed approach does away with the need to time-align the operation of all modules within the framework of a common base clock period. Instead, each module is driven by a local pausable clock generator in its self-timed wrapper, which is controlled so as to prevent any timing violations at the data interface of the locally synchronous island. The problem of metastability is thus addressed by pausing the local clock whenever data and the sampling clock edge would occur too close to each other. In this way metastability is prevented, rather than resolved after the fact as in other well-known solutions. No extra latency for synchronizers, FIFOs, and the like is introduced, and the local clock's frequency can be chosen to fit the needs of the particular module.

    GALS holds the promise of solving or avoiding a variety of problems that are bound to become more important with very deep sub-micron technologies and in view of the virtual component business.

    1. Clock domains are confined to manageable sizes, thereby alleviating the clock skew problem. For the same reason, the number of critical paths that must be trimmed at the same time to reach a target clock frequency is greatly reduced. This feature should prove particularly beneficial when timing depends critically on place-and-route because interconnect delays dominate over gate delays.
    2. Exchanging a module for a faster or more economical alternative becomes possible without having to redesign the rest of the system. This is because of the self-timed interaction and the mutually independent clocks.
    3. Assembling systems from predesigned modules calls for safe and standardized interfaces. GALS addresses the issues of flawless timing and of low-level protocols by imposing a single, well-defined sequence of events at all clock boundaries.
    4. Self-timed operation provides hooks for various low-power circuit techniques.

    Self-timed Wrapper

    The figure below depicts a block-level schematic of a GALS module with its self-timed wrapper surrounding the locally synchronous island (LS island). The wrapper contains an arbitrary number of GALS ports, a local pausable clock generator, and test structures.


    Figure 1: Self-timed wrapper

    The wrappers are built from a library of predefined wrapper elements. This modular approach makes constructing GALS circuits safe and relatively easy. We have developed a library of five partly parameterized wrapper elements, described in technology-independent VHDL, that is sufficient for this purpose.

    A port controller is responsible for managing all data transfers on a particular port in a GALS system. It is enabled by the LS island and has to synchronize data transmission with the local clock phases. In order to transmit data quickly and efficiently, the port controllers need to act independently of the local clock signal. This is achieved by implementing them as asynchronous finite state machines (AFSMs).

    We describe the behavior of the controllers using the extended burst-mode description by Kenneth Y. Yun. One of the major advantages of the extended burst-mode description is that it can be synthesized directly into a hazard-free implementation using the 3D synthesis tools for asynchronous control circuitry. 3D synthesis results in a set of equations: one for each output and one for each additional internal state variable (the state is held by a combinational feedback loop). These equations represent a two-level AND-OR implementation and can therefore easily be transformed into a circuit.
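
    To make the form of such equations concrete, the following sketch evaluates two-level AND-OR (sum-of-products) equations whose state variables are fed back to their own inputs. It is not output of the 3D tools: the equation used, c = a·b + a·c + b·c, is the textbook Muller C-element and serves only as a hypothetical stand-in, and the helper names eval_sop and settle are illustrative. Updating all state variables in lock-step is a simplification that ignores real gate delays.

```python
def eval_sop(terms, signals):
    """Evaluate a sum of products: OR over terms, AND over each term's literals.

    Each term maps a signal name to the value (0 or 1) that literal requires.
    """
    return int(any(all(signals[name] == value for name, value in term.items())
                   for term in terms))


def settle(equations, inputs, state, max_iters=10):
    """Re-evaluate the feedback equations until the state stops changing."""
    state = dict(state)
    for _ in range(max_iters):
        signals = {**inputs, **state}
        new_state = {name: eval_sop(terms, signals)
                     for name, terms in equations.items()}
        if new_state == state:
            return state
        state = new_state
    raise RuntimeError("feedback loop did not settle")


# Hypothetical example equation: c = a*b + a*c + b*c
EQUATIONS = {
    "c": [{"a": 1, "b": 1}, {"a": 1, "c": 1}, {"b": 1, "c": 1}],
}

if __name__ == "__main__":
    print(settle(EQUATIONS, {"a": 1, "b": 1}, {"c": 0}))
    print(settle(EQUATIONS, {"a": 1, "b": 0}, {"c": 1}))
```

    The second call shows the feedback term at work: with the inputs disagreeing, the state variable keeps its previous value.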

    To cover the diverse needs for intermodule communication we have defined two families of port controllers:

      Poll-type port:
      It issues requests for clock stretching exclusively to prevent metastability and thus ensures data correctness. The clock is influenced as little as possible. A P-port is appropriate wherever a data transfer is possible but does not necessarily need to happen immediately. The LS island continues to operate normally while the P-port handles the data transfer.

      Demand-type port:
      This type of port also ensures data integrity on the transfer channel but adds a feature similar to clock gating: as soon as it is enabled, it stops the local clock and does not release it until the required transfer has taken place. A D-port is therefore used where a data transfer is needed immediately because the LS island cannot carry out any useful computation without the requested data item. While awaiting the pending exchange, the D-port suspends the local clock, thereby effectively preventing any dynamic power dissipation. As soon as a new data item becomes available, the LS island resumes operation directly in phase with the incoming data.
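
    The difference between the two port families can be illustrated with a coarse timing model. The sketch below is an assumption-laden abstraction, not the wrapper's actual behavior: it treats the handshake as a fixed unsafe window after the data arrives, restarts the clock one full period after a delayed edge, and uses the hypothetical function name rising_edges with integer time units.

```python
def rising_edges(port, period, t_enable, t_data, window, count):
    """Return the times of the first `count` rising edges of the local clock.

    port:     "P" (poll-type) or "D" (demand-type)
    t_enable: time at which the LS island enables the port
    t_data:   time at which the partner's data item arrives
    window:   assumed duration of the handshake, during which sampling is unsafe
    """
    t_start = max(t_enable, t_data)   # the transfer cannot begin earlier
    t_done = t_start + window         # the transfer is complete
    edges = []
    t = period                        # first edge is due one period after t=0
    while len(edges) < count:
        if port == "D" and t_enable < t < t_done:
            t = t_done                # clock held from enable until done
        elif port == "P" and t_start <= t < t_done:
            t = t_done                # stretch only if the edge hits the window
        edges.append(t)
        t += period
    return edges


if __name__ == "__main__":
    print(rising_edges("D", 10, 10, 47, 2, 4))  # [10, 49, 59, 69]
    print(rising_edges("P", 10, 10, 47, 2, 5))  # [10, 20, 30, 40, 50]
    print(rising_edges("P", 10, 10, 49, 2, 6))  # [10, 20, 30, 40, 51, 61]
```

    In this model the D-port produces no clock edges between enabling and completion, which is where the power saving comes from, while the P-port lets the clock run and delays a single edge only when it would fall inside the transfer window.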

    Local Clock Generator

    The pausable local clock generator shown below is based on a ring oscillator structure with a tunable delay line. To stretch the local clock lclk, an arbitration block is placed in parallel with the delay line. Each incoming request for a clock pause (Ri) is connected to a mutual exclusion (MUTEX) element that decides whether that request is granted or the next clock pulse is permitted. The clkallowed signal is set only if all MUTEX elements grant the rclk request. The Muller C-element withholds the rising edge of lclk until both of its inputs have risen. The active clock edge is therefore delayed for as long as at least one Ri persists. The MUTEX elements are arranged in parallel, which allows easy scaling with the number of port controllers.


    Figure 2: Local clock generation
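
    The arbitration described for the clock generator can be approximated with a small logic-level model. This is a sketch under stated assumptions: the C-element is taken to combine the delay-line output with clkallowed, each MUTEX keeps its grant until the winning request is withdrawn, simultaneous requests are resolved in favor of the port (an arbitrary choice), and analog effects such as the metastability filtering inside a real MUTEX are ignored. The function names are illustrative.

```python
def c_element(a, b, prev):
    """Muller C-element: follow the inputs when they agree, otherwise hold."""
    return a if a == b else prev


def mutex(port_req, clk_req, owner):
    """Arbitrate between one port request Ri and the clock request rclk.

    owner is None, "port", or "clk". The current owner keeps the grant
    until it withdraws its own request.
    """
    if owner == "port" and port_req:
        return "port"
    if owner == "clk" and clk_req:
        return "clk"
    if port_req:
        return "port"   # tie-break in favor of the port (assumption)
    if clk_req:
        return "clk"
    return None


def clk_allowed(port_reqs, clk_req, owners):
    """Update all MUTEX elements in parallel and derive clkallowed."""
    new_owners = [mutex(r, clk_req, o) for r, o in zip(port_reqs, owners)]
    allowed = int(bool(clk_req) and all(o == "clk" for o in new_owners))
    return allowed, new_owners


if __name__ == "__main__":
    owners = [None, None]
    lclk = 0
    # Port 0 requests a pause just as the delay line asks for the next edge.
    allowed, owners = clk_allowed([1, 0], 1, owners)
    lclk = c_element(1, allowed, lclk)
    print(allowed, lclk)   # 0 0: the rising edge is withheld
    # Port 0 withdraws its request; the pending clock request is now granted.
    allowed, owners = clk_allowed([0, 0], 1, owners)
    lclk = c_element(1, allowed, lclk)
    print(allowed, lclk)   # 1 1: the stretched edge finally occurs
```

    Because every MUTEX is evaluated independently, adding a port controller only adds one entry to the lists, which mirrors the parallel arrangement described above.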




These pages by : kgf
25.Apr.2002