Synchronous digital circuits implemented by standard synthesis tools are known to produce large current spikes due to the simultaneous switching of registers, activated by the global clock. Voltage fluctuation on supply lines is of particular importance in high-precision mixed-signal systems, where the need for low noise comes together with large data words, typically 32 bits or more. On the other hand, the processing speed is usually low, dictated by the physical limitations of analog circuits and sensors. Many precision systems and data logging systems based on chemical or micro-electromechanical (MEMS) sensors have output data rates below 150 Hz, since the measurement results are heavily filtered during decimation. As a consequence, the ratio of the peak supply current to its average value may be very large, which indicates that circuit speed can feasibly be traded for the mitigation of switching noise. Synthesis tools put most of their effort into the optimization of speed, power, and area. They provide little support for switching noise minimization, so additional design measures have to be applied.
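The peak-to-average argument above can be illustrated with a small numerical sketch. It is a toy model and not a circuit simulation: the pulse shape, the register count, and the function name `supply_current` are assumptions chosen only for illustration. Every register draws the same current pulse, either at the same instant or delayed by a fixed skew relative to the previous register.

```python
def supply_current(n_regs, pulse, skew, n_samples):
    """Sum identical per-register current pulses on a common supply line.

    Register r starts switching r * skew samples after register 0.
    All values are in arbitrary units (hypothetical toy model).
    """
    total = [0.0] * n_samples
    for r in range(n_regs):
        start = r * skew
        for k, i_k in enumerate(pulse):
            if start + k < n_samples:
                total[start + k] += i_k
    return total


# Hypothetical per-register switching pulse (arbitrary units).
pulse = [0.0, 1.0, 0.5, 0.0]
n_regs = 32        # e.g. one 32-bit data word
n_samples = 200    # observation window, long enough for the staggered case

# All registers switch on the same clock edge.
simultaneous = supply_current(n_regs, pulse, skew=0, n_samples=n_samples)
# Each register switches one pulse length after the previous one.
staggered = supply_current(n_regs, pulse, skew=len(pulse), n_samples=n_samples)

print("peak, simultaneous:", max(simultaneous))
print("peak, staggered:   ", max(staggered))
print("total charge equal:", abs(sum(simultaneous) - sum(staggered)) < 1e-9)
```

In this model the transferred charge, and therefore the average current, is the same in both cases. Only the peak changes, and it drops by a factor equal to the number of registers when the pulses do not overlap.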
In the first section we present the background and an overview of known methods for switching noise reduction in digital circuits. Assuming low processing speed and low-power operation, we limit further discussion to CMOS technology. Under these circumstances, serial clock distribution is adopted as the most suitable method for achieving large switching noise reduction factors. The main problems (power loss due to redundant switching, an increased number of registers, and potential timing violations) are identified and addressed in the remainder of the work. A detailed description of serial clock distribution and the proposed solutions to these problems are given in Section 2. In terms of power consumption, it is not economical to place delay buffers in front of every register cell. Considering that all clock buffers in the serial clock tree drive equal loads and that the loads are small, we propose to replace the clock delay buffers with register-internal clock buffers. The standard minimum-skew clock tree is thereby eliminated, enabling a large power saving (up to 30% of the circuit's power consumption) in combination with the peak supply current reduction. If the clock period is not the limiting factor, then correct timing can be guaranteed by using shadowed registers. The data processing time is reduced to half of the clock period, while the other half remains available for serial clock distribution. Since shadowed registers consume more power and area than standard registers, we seek to replace as many shadowed registers with standard registers as possible without compromising the timing constraints. Our aim is therefore to minimize the number of shadowed registers, provided that the clock is serially distributed in one or more branches without timing violations. In the third section we describe the optimization of the clock signal distribution based on the minimal use of shadowed registers.
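The half-period budget described for shadowed registers suggests a simple back-of-the-envelope check, sketched below. The model is an assumption made for illustration: every register in a serial branch adds the same clock delay, and the whole branch must be traversed within half of the clock period. The function names and the numerical values are hypothetical and are not taken from the work.

```python
def max_registers_per_branch(clock_period_ps, stage_delay_ps):
    """Longest serial clock chain that fits into half of the clock period.

    Assumes one identical delay per register stage (hypothetical model).
    Times are integers in picoseconds to avoid floating-point rounding.
    """
    distribution_window_ps = clock_period_ps // 2
    return distribution_window_ps // stage_delay_ps


def branches_needed(n_registers, clock_period_ps, stage_delay_ps):
    """Number of parallel serial branches needed to cover all registers."""
    per_branch = max_registers_per_branch(clock_period_ps, stage_delay_ps)
    if per_branch == 0:
        raise ValueError("stage delay exceeds half of the clock period")
    return -(-n_registers // per_branch)  # integer ceiling division


# Hypothetical example: 1 MHz clock, 200 ps per stage, 6000 registers.
print(max_registers_per_branch(1_000_000, 200))
print(branches_needed(6000, 1_000_000, 200))
```

A slower clock or a shorter stage delay allows longer chains and therefore fewer branches, which is the sense in which speed is traded for lower peak current.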
First, we introduce a general synchronous timing model for further mathematical treatment. The optimization problem is translated into a bandwidth reduction of the upper triangular part of the circuit adjacency matrix. This approach is much faster than other methods relying on iterative static timing analysis (STA). Still, heuristic approaches are required because of the algorithmic complexity. We then investigate bandwidth reduction in three directions. In the first approach we apply minimum out-degree reordering, which turns out to be applicable only to small circuits with up to about 50 nodes. Better results, at the expense of longer computation times, are obtained with the genetic algorithm (GA) used in the second approach. Finally, in the third approach we extend the genetic algorithm to implement serial clock distribution in combination with gated clock synthesis. All presented algorithms are supplemented with the steps necessary to accept timing relaxations given by known multi-cycle paths. The section ends with a discussion of compatibility issues related to serial clock distribution in the design-for-test (DFT) environment. The integration of the presented methods with standard design tools is presented in Section 4. This important step makes it possible to use various analyzers for circuit timing and functionality verification. Nonstandard cells are therefore characterized and included in the standard library using common descriptions, such as a Verilog HDL model, an NLDM timing model, and LEF. With the help of standard design tools, we also create different data matrices containing parameters for the proposed optimization algorithms. In the fifth section, we present comparative results of supply current spike simulations for different synthesis cases. Our optimization methods have been verified on several standard test circuits from the ISCAS89 and ISCAS99 families.
A detailed analysis of one selected circuit, including intermediate results, is presented for illustration.
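The bandwidth formulation mentioned above can be made concrete with a minimal sketch. It assumes that registers are graph nodes, that an edge (u, v) denotes a combinational path from register u to register v, and that the position of a register in the serial clock chain is its row and column index in the adjacency matrix. The upper bandwidth is then the largest distance j - i over all nonzero entries above the diagonal. The example graph, the node names, and the exhaustive search are hypothetical. The search stands in for the heuristics named in the text and is usable only for a handful of nodes.

```python
from itertools import permutations


def upper_bandwidth(edges, order):
    """Largest (column - row) distance among entries above the diagonal.

    `order` lists the registers in their assumed serial clock chain position.
    Entries on or below the diagonal are ignored.
    """
    pos = {node: k for k, node in enumerate(order)}
    return max((pos[v] - pos[u] for u, v in edges if pos[v] > pos[u]),
               default=0)


# Hypothetical register-to-register graph containing feedback loops.
edges = [("r0", "r1"), ("r1", "r2"), ("r2", "r0"),
         ("r2", "r3"), ("r3", "r1")]

initial_order = ("r0", "r2", "r3", "r1")

# Exhaustive search over all orderings (illustration only: factorial cost).
best_order = min(permutations(initial_order),
                 key=lambda order: upper_bandwidth(edges, order))

print("initial bandwidth:", upper_bandwidth(edges, initial_order))
print("best bandwidth:   ", upper_bandwidth(edges, best_order))
```

Because the example graph contains a cycle, at least one edge must point forward in any ordering, so the upper bandwidth cannot reach zero. The factorial growth of the search space is the reason heuristic reordering is needed for realistic circuits.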