ISSN: 2249-6645

### **Gated Driver Tree Based Low Power Delay Buffer Architecture**

<sup>1</sup>Manjusha. P, <sup>2</sup>Gopinath. B <sup>1</sup>PG Scholar, <sup>2</sup>Associate Professor

<sup>1, 2</sup> SNS College of Technology, Coimbatore-35.

Abstract: This paper presents the circuit design of a low-power delay buffer. The proposed delay buffer uses several new techniques to reduce its power consumption. Since delay buffers are accessed sequentially, it adopts a ring-counter addressing scheme. In the ring counter, double-edge-triggered (DET) flip-flops are utilized to reduce the operating frequency by half, and a C-element gated-clock strategy is proposed. A novel gated-clock-driver tree is then applied to further reduce the activity along the clock distribution network. Moreover, the gated-driver-tree idea is also employed in the input and output ports of the memory block to decrease their loading, thus saving even more power. Both simulation results and experimental results show a great improvement in power consumption.

*Index Terms:* C-element, delay buffer, first-in-first-out, gated-clock, ring counter.

#### I. INTRODUCTION

Portable multimedia and communication devices have experienced explosive growth recently. Longer battery life is one of the crucial factors in the widespread success of these products. As such, low-power circuit design for multimedia and wireless communication applications has become very important. In many such products, delay buffers (line buffers, delay lines) make up a significant portion of the circuitry. Such serial-access memory is needed for temporary storage of signals that are being processed, e.g., delay of one line of video signals, delay of signals within a fast Fourier transform (FFT) architecture, and delay of signals in a delay correlator. Currently, most circuits adopt static random access memory (SRAM) plus some control/addressing logic to implement delay buffers. For shorter delay buffers, shift registers can be used instead. The former approach is convenient since SRAM compilers are readily available and are optimized to generate memory modules with low power consumption, high operating speed, and a compact cell size. The latter approach is also convenient since shift registers can be easily synthesized, though they may consume much power due to unnecessary data movement. Previously, a simplified and thus lower-power sequential addressing scheme for SRAM-based delay buffers was proposed, in which a ring counter is used to point to the target words. The ring counter is made up of an array of D-type flip-flops (DFFs) triggered by a global clock signal.

In this paper, we propose to use double-edge-triggered (DET) flip-flops instead of traditional DFFs in the ring counter to halve the operating clock frequency. A novel approach using C-elements instead of R-S flip-flops in the control logic for generating the clock-gating signals is adopted to avoid increasing the loading of the global clock signal. In addition to gating the clock signal going to the DET flip-flops in the ring counter, we also propose to gate the drivers in the clock tree. This technique greatly decreases the loading on the clock distribution network of the ring counter and thus the overall power consumption. The same technique is applied to the input driver and output driver of the memory part of the delay buffer. In a delay buffer based on an SRAM cell array, reads and writes go through the bit lines, which work as data buses. In the proposed delay buffer, we use a tree hierarchy for the read/write circuitry of the memory module. For the write circuitry, in each level of the driver tree, only the driver along the path leading to the addressed memory word is activated. Similarly, a tree of multiplexers and gated drivers makes up the read circuitry of the proposed delay buffer. Simulation results show the effectiveness of the above techniques in power reduction.

#### **II. CONVENTIONAL DELAY BUFFERS**

The simplest way to implement a delay buffer is to use shift registers, as shown in Fig. 1. The number of DFFs required is the product of the buffer length and the word-length, so the area can be quite large if a standard-cell DFF is used. In addition, this approach can consume a huge amount of power, since on average a large number of binary signals make transitions in every clock cycle. As a result, this implementation is usually used in short delay buffers, where area and power are of less concern.
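The cost of the shift-register approach can be illustrated with a small behavioural sketch. This is not the authors' circuit: the class name `ShiftRegisterDelayBuffer`, its parameters `length` and `word_bits`, and the toggle counter are hypothetical names introduced only for illustration.

```python
from typing import List


class ShiftRegisterDelayBuffer:
    """Behavioural model (an assumption, not the paper's circuit):
    `length` stages, each holding one `word_bits`-bit word."""

    def __init__(self, length: int, word_bits: int) -> None:
        self.length = length
        self.word_bits = word_bits
        self.stages: List[int] = [0] * length  # stage 0 is the input side
        self.bit_toggles = 0                   # flip-flop output transitions so far

    def dff_count(self) -> int:
        # One DFF per stored bit: buffer length times word-length.
        return self.length * self.word_bits

    def clock(self, word_in: int) -> int:
        """One clock cycle: every word moves one stage; returns the word leaving."""
        mask = (1 << self.word_bits) - 1
        word_out = self.stages[-1]
        new_stages = [word_in & mask] + self.stages[:-1]
        # Count how many stored bits changed value in this cycle.
        for old, new in zip(self.stages, new_stages):
            self.bit_toggles += bin(old ^ new).count("1")
        self.stages = new_stages
        return word_out
```

Feeding the words 1, 2, 3, ... into a 4-stage instance returns each word four cycles later, and `bit_toggles` grows in every cycle because all stored words move, which is the unnecessary data movement the text refers to.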

SRAM-based delay buffers are more popular for long delay buffers because of the compact SRAM cell size and small total area. Also, the power consumption is much lower than that of shift registers because only two words are accessed in each clock cycle: one for write-in and the other for read-out. A binary counter can be used for address generation since the memory words are accessed sequentially. Although SRAM-based delay buffers do away with many data transitions, there can still be considerable power consumption in the SRAM address decoder and the read/write circuits. In fact, since the memory words are accessed sequentially, we can use a ring counter with only one rotating active cell to point to the words for write-in and read-out. This method is known as the pointer-based scheme. The flip-flops of the ring counter are initialized with only one "1" (the active cell), and all the other DFFs are kept at "0." When a clock edge triggers the DFFs, this "1" signal is propagated forward. Consequently, the traditional binary address decoder can be replaced by this "unary-coded" ring counter. Compared to shift-register delay buffers, this approach propagates only one "1" in the ring counter instead of propagating multi-bit words. With far fewer data transitions, pointer-based delay buffers can save a lot of power.
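The pointer-based scheme can be sketched behaviourally as follows. The sketch makes a simplifying assumption that is not taken from the paper: the oldest word is read and then overwritten at the same location in one cycle, whereas the actual circuit may use separate read and write pointers. The class name `PointerDelayBuffer` and the counter `ring_toggles` are hypothetical.

```python
from typing import List


class PointerDelayBuffer:
    """Behavioural model (an assumption): memory words addressed by a
    one-hot ("unary-coded") ring counter instead of a binary decoder."""

    def __init__(self, length: int) -> None:
        self.memory: List[int] = [0] * length
        # Exactly one cell holds "1" (the active cell); all others hold "0".
        self.ring: List[int] = [1] + [0] * (length - 1)
        self.ring_toggles = 0  # ring-counter bit transitions so far

    def clock(self, word_in: int) -> int:
        addr = self.ring.index(1)      # word selected by the active cell
        word_out = self.memory[addr]   # read-out of the oldest word
        self.memory[addr] = word_in    # write-in to the same location
        # Propagate the single "1" forward by one cell.
        new_ring = [self.ring[-1]] + self.ring[:-1]
        self.ring_toggles += sum(a != b for a, b in zip(self.ring, new_ring))
        self.ring = new_ring
        return word_out
```

In this model the stored words never move; only two ring-counter bits change per cycle (the old active cell falls and the next one rises), regardless of the buffer length or word-length.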



Fig. 1. Delay buffer implemented using shift registers.



Fig. 2. Pointer-based delay buffer.



Fig. 3. Ring counter with R-S flip-flops.

By observing that only one of the DFFs in the ring counter is active at a time, a gated-clock technique has been proposed for the DFFs. In that approach, every eight DFFs in the ring counter are grouped into one block. A "gate" signal is then computed for each block to gate the frequently toggling clock signal whenever the block is inactive, so that the power otherwise wasted on unnecessary clock transitions is saved.
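The effect of block-level clock gating can be estimated with a short counting sketch. It assumes, for illustration only, that a block must be clocked while it holds the "1" and also in the cycle in which the "1" is about to enter it; the exact gating condition of the cited design is not given in the text. The clock load of the gating logic itself is ignored. Function names are hypothetical.

```python
def clocked_dffs_per_cycle(n: int, block: int, active: int, gated: bool) -> int:
    """DFFs that see a clock edge in the cycle moving the "1" from cell
    `active` to the next cell (assumed gating rule, see lead-in)."""
    assert n % block == 0, "ring length must be a multiple of the block size"
    if not gated:
        return n  # ungated: every DFF is clocked every cycle
    cur_block = active // block
    nxt_block = ((active + 1) % n) // block
    # One block is clocked, or two when the "1" crosses a block boundary.
    return block * len({cur_block, nxt_block})


def average_clocked_dffs(n: int, block: int, gated: bool) -> float:
    """Average over one full rotation of the "1" around the ring."""
    total = sum(clocked_dffs_per_cycle(n, block, a, gated) for a in range(n))
    return total / n
```

With a 1024-cell ring and blocks of eight, this rule clocks eight DFFs in seven out of eight cycles and sixteen in the remaining one, against 1024 in every cycle without gating.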

#### **III. PROPOSED DELAY BUFFER**

In the proposed delay buffer, several power reduction techniques are adopted. Mainly, these circuit techniques are designed with a view to decreasing the loading on high fan-out nets, e.g., clock and read/write ports.

#### A. Gated-clock ring counter

Although some power is indeed saved by gating the clock signal in inactive blocks, the extra R–S flip-flops still load the clock signal and demand more clock power than necessary. We propose to replace the R–S flip-flop with a C-element and to use tree-structured clock drivers with gating so as to greatly reduce the loading on active clock drivers. Additionally, DET flip-flops are used to halve the clock rate and thus also reduce the power consumed by the clock signal. The proposed ring counter with hierarchical clock gating is shown in the figure. Each block contains one C-element to control the delivery of the local clock signal ("CLK") to the DET flip-flops, and only the "CKE" signals along the path from the global clock source to the local clock signal are active. The "gate" signal (CKE) can also be derived from the outputs of the DET flip-flops in the ring counter.
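Why DET flip-flops allow the clock rate to be halved can be shown with a minimal sketch that counts capture events on an idealized clock waveform. The waveform representation and the function names are assumptions made for illustration.

```python
from typing import List


def clock_levels(cycles: int) -> List[int]:
    """Idealized clock sampled every half-period, starting low: 0, 1, 0, 1, ..."""
    return [i % 2 for i in range(2 * cycles + 1)]


def count_captures(levels: List[int], double_edge: bool) -> int:
    """Number of times a flip-flop captures its input on the given waveform."""
    captures = 0
    for prev, cur in zip(levels, levels[1:]):
        rising = prev == 0 and cur == 1
        falling = prev == 1 and cur == 0
        # A conventional DFF captures on rising edges only;
        # a DET flip-flop captures on both edges.
        if rising or (double_edge and falling):
            captures += 1
    return captures
```

A DET flip-flop driven for four clock cycles captures as many times as a rising-edge DFF driven for eight, so the ring counter can advance at the same rate with half the clock frequency.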

The C-element is an essential element in asynchronous circuits for handshaking. The logic of the C-element is given by



$$C^+ = AB + AC + BC$$

where $A$ and $B$ are its two inputs, and $C^+$ and $C$ are the next and current outputs, respectively. If $A = B$, the next output is the same as $A$. Otherwise ($A \neq B$), $C^+$ remains unchanged. Since the output of the C-element can only change when $A = B$, it avoids the possibility of glitches, a crucial property for a clock-gating signal. To save more power, we replace the DFFs with double-edge-triggered flip-flops and operate the ring counter at half the clock frequency. With these changes, the clock-gating control mechanism is different. When the input of the last DET flip-flop in the previous block changes to "1", making both inputs of the C-element the same, the clock signal in the current block is turned on. When the output of the first DET flip-flop in the current block is asserted, both inputs of the C-element in the previous block go to "0" and the clock for the previous block is disabled. To further diminish the loading on the global clock signal ("CLK"), we propose to use a driver-tree distribution network for the global clock and to activate only those drivers along the path from the global clock source to the active blocks.

Vol.2, Issue.3, May-June 2012 pp-1395-1398



Fig. 5. Ring counter with clock gated by C-elements.
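The C-element behaviour used in this subsection can be checked with a minimal sketch of the equation $C^+ = AB + AC + BC$. The helper `run_c_element` and its input sequence are hypothetical and serve only to show the hold behaviour.

```python
from typing import Iterable, List, Tuple


def c_element(a: int, b: int, c: int) -> int:
    """Next output C+ = AB + AC + BC for inputs a, b and current output c."""
    return (a & b) | (a & c) | (b & c)


def run_c_element(inputs: Iterable[Tuple[int, int]], c0: int = 0) -> List[int]:
    """Apply a sequence of (A, B) input pairs and record the output after each."""
    outputs, c = [], c0
    for a, b in inputs:
        c = c_element(a, b, c)
        outputs.append(c)
    return outputs
```

When both inputs agree the output follows them, and when they differ the previous output is held, so a change on a single input never propagates to the gating signal on its own.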

#### B. Advantages

- 1. Special read/write circuitry, such as a sense amplifier, is needed for low-power operation.
- 2. The logarithmic decrease in loading can dramatically reduce the power consumption.
- 3. The memory module of a delay buffer is often in the form of an SRAM array with an input/output data bus in order to reduce area.

#### IV. ANALYSIS AND DESIGN OF GATED DRIVER TREE USING D FLIP FLOP

#### A. Simulation methodology

The MICROWIND program is used to design and simulate the integrated circuit. The package contains a library of common logic and analog ICs to view and simulate. MICROWIND includes all the commands of a mask editor as well as original tools gathered in a single module. Circuit simulation is started by pressing a single key: the electrical extraction of the circuit is performed automatically, and the analog simulator immediately produces voltage and current curves.

DSCH is a digital schematic editor and simulator. DSCH, a user-friendly schematic editor and logic simulator presented in a companion manual, is used to generate the Verilog description. The cell is created in compliance with the environment, design rules, and fabrication specifications. The Logic Cell Compiler is a sophisticated tool that enables the automatic design of a CMOS circuit from a logic description in Verilog. A set of CMOS processes ranging from 1.2 µm down to state-of-the-art 0.25 µm is available.

#### B. Simulation result

The simulation results are given below.



Fig. 6. Gated driver tree using D flip-flops.

The schematic is drawn in DSCH, and its Verilog module is then generated. The Verilog file is compiled in MICROWIND to generate the layout of the gated driver tree, which is shown in Fig. 8.



Fig. 7. Timing analysis of the gated driver tree; data is shifted using D flip-flops.



Fig. 8. Layout of the gated driver tree using D flip-flops.






After the layout is simulated, the result appears in a new window as a voltage-versus-time graph. It shows the input and output responses as well as the power consumption.

The power consumed by the gated driver tree can be measured by simulating this layout. The simulation result in Fig. 9 shows that the consumed power is 11.09 mW.

#### V. ANALYSIS OF GATED DRIVER TREE USING D FLIP FLOP AND DET FLIP FLOP

TABLE I Power Consumption of Three Ring Counters

| Ring Counter Structure<br>(N=1024, M=128, D=8) | Simulated Power<br>@ 1.8V, 50 MHz,<br>0.18µm | Estimated Loading<br>Ratio by Eqs.<br>(3)-(5) |
|------------------------------------------------|----------------------------------------------|-----------------------------------------------|
| Traditional Ring Counter                       | 2127 μW                                      | 2048                                          |
| Gated Clock Ring Counter [6]                   | 433 µW                                       | 400                                           |
| The Proposed Ring Counter                      | 20 µW                                        | 21                                            |
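
As a quick reading of Table I (simple arithmetic on the tabulated values, not an additional measurement), the simulated power of the traditional and gated-clock ring counters relative to the proposed one is

$$\frac{2127\ \mu\text{W}}{20\ \mu\text{W}} = 106.35, \qquad \frac{433\ \mu\text{W}}{20\ \mu\text{W}} = 21.65$$

and the corresponding ratios of the estimated loading are

$$\frac{2048}{21} \approx 97.5, \qquad \frac{400}{21} \approx 19.0$$

so the estimated loading and the simulated power follow the same trend across the three structures.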

The loading of the clock signal in the proposed scheme can be analyzed as follows. Assume that a quad tree is used for the clock drivers and that a length-N ring counter, made up of N flip-flops in total, is partitioned into M blocks.
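The idea of activating only the drivers on one path of a quad tree can be sketched by counting drivers. This is an illustration under stated assumptions, not a reproduction of the loading equations referenced in Table I: each driver is assumed to feed up to four children, the tree is assumed to end in a single root driver, and the function names are hypothetical. It counts drivers only, not capacitive loading.

```python
import math
from typing import List


def drivers_per_level(blocks: int, fanout: int = 4) -> List[int]:
    """Driver count at each level, from the level nearest the blocks up to the root."""
    counts, nodes = [], blocks
    while nodes > 1:
        nodes = math.ceil(nodes / fanout)
        counts.append(nodes)
    return counts


def active_path(block_index: int, blocks: int, fanout: int = 4) -> List[int]:
    """Index of the single enabled driver at each level for the addressed block."""
    path, idx = [], block_index
    for _ in drivers_per_level(blocks, fanout):
        idx //= fanout
        path.append(idx)
    return path
```

For M = 128 blocks, this tree has four levels with 32, 8, 2 and 1 drivers (43 in total), while only one driver per level, four in all, lies on the path to the active block. The number of active drivers therefore grows with the logarithm of M under these assumptions.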

#### VI. CONCLUSION


In this paper, we presented a low-power delay buffer architecture that adopts several novel techniques to reduce power consumption. The ring counter with its clock gated by C-elements can effectively eliminate excessive data transitions without increasing the loading on the global clock signal.

The gated-driver tree technique used for the clock distribution networks can eliminate the power wasted on drivers that need not be activated. Another gated-demultiplexer tree and a gated-multiplexer tree are used for the input and output driving circuitry to decrease the loading of the input and output data bus. All gating signals are easily generated by a C-element taking inputs from some DET flip-flop outputs of the ring counter.

Further simulations also demonstrate its advantages in nanometer CMOS technology. With more refined layout techniques, the cell size of the proposed delay buffer can be further reduced, making it very useful in all kinds of multimedia/communication signal processing ICs.

#### REFERENCES

- 1) Eberle, W., et al., 2001. "80-Mb/s QPSK and 72-Mb/s 64-QAM flexible and scalable digital OFDM transceiver ASICs for wireless local area networks in the 5-GHz band," IEEE J. Solid-State Circuits, vol. 36, no. 11, pp. 1829–1838.
- 2) Hossain, R., Wronski, L. D., and Albicki, A., 1994. "Low power design using double edge triggered flip-flop," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 2, no. 2, pp. 261–265.
- 3) Li, W., and Wanhammar, L., 1999. "A pipeline FFT processor," in Proc. Workshop Signal Process. Syst. Design Implement., pp. 654–662.
- 4) Liou, M. L., Lin, P. H., Jan, C. J., Lin, S. C., and Chiueh, T. D., 2006. "Design of an OFDM baseband receiver with space diversity," IEEE Proc. Commun., vol. 153, no. 6, pp. 894–900.
- 5) Pastuszak, G., 2005. "A high-performance architecture for embedded block coding in JPEG 2000," IEEE Trans. Circuits Syst. Video Technol., vol. 15, no. 9, pp. 1182–1191.
- 6) Shibata, N., Watanabe, M., and Tanabe, Y., 2002. "A current-sensed high-speed and low-power first-in-first-out memory using a wordline/bitline-swapped dual-port SRAM cell," IEEE J. Solid-State Circuits, vol. 37, no. 6, pp. 735–750.
- 7) Sutherland, E., 1989. "Micropipelines," Commun. ACM, vol. 32, no. 6, pp. 720–738.
- 8) Tsern, E. K., and Meng, T. H., 1996. "A low-power video-rate pyramid VQ decoder," IEEE J. Solid-State Circuits, vol. 31, no. 11, pp. 1789–1794.