# LOW POWER VLSI DESIGN FOR PROPOSED DUAL ADAPTIVE FILTER

<sup>1</sup>NVDP MURTHY, <sup>2</sup>S. JAGAN MOHAN RAO

<sup>1</sup>Assistant Professor, Dept of ECE, Rama Chandra Engineering College, Vatluru, Andhra Pradesh 534007 <sup>2</sup> Professor & HOD, Dept of ECE, Rama Chandra Engineering College, Vatluru, Andhra Pradesh 534007

ABSTRACT: Adaptive filters are a core functional block in digital signal processing. The adaptive filter considered here is designed to reduce power consumption. In this paper an updated adaptive filter is proposed which calculates the inner product by shifting and accumulating partial products. These partial products are stored in a look-up table (LUT). The area is reduced because both the convolution operation and the correlation operation are performed using the same LUT. The distributed-arithmetic-based design shares a look-up table for computing the weight-increment terms and the filter outputs. For larger block sizes, it requires fewer LUT accesses per output than the existing structure. In the design, the number of adders does not increase with the filter order. The advantage of the structure is a reduction in its delay and area-delay product (ADP). The adaptive filter is multiplierless, which reduces the complexity of the circuit.

KEY WORDS: Adaptive filters, Look up Table (LUT), VLSI, DSP

## **I. INTRODUCTION**

With the latest improvements in the area of portable digital applications (PDAs) and wireless communication, power analysis and power-reduction techniques have become as significant a concern as speed, cost, and reliability at the circuit design level. Power consumption factors, which determine the amount of dissipated energy and heat, have a great influence on critical design issues such as packaging and cooling requirements, power supply lines and capacity, and the number of circuits that can be integrated in a chip. The power consumption in a digital CMOS circuit consists of dynamic power consumption, static power consumption, and short-circuit power consumption. The dominant power consumption is normally from the dynamic power, which is used in charging node capacitances.

In the areas of system on chip and VLSI design, low-power circuit design is an important issue. As the dimensions of transistors are shrunk into the deep sub-micron region, the effect of static leakage currents becomes more significant. This aspect of power consumption can be controlled to some extent by novel design, but is predominantly handled by process engineering. Two areas that have been the focus of active research are asynchronous logic and adiabatic logic.

Scaling of transistor geometries has led to the integration of millions of devices in a very small space, thus driving the realization of complex applications in hardware and supporting high-speed applications. This synergy has revolutionized not only electronics, but also industry at large. In order to reduce power, many researchers, designers and engineers have come up with many innovative techniques and have patented their ideas. Nevertheless, designers will need to budget and plan for power dissipation as a factor nearly as important as performance and perhaps more important than area. Low-power techniques have been successfully adopted and implemented in designing complex VLSI circuits. As the demand for faster, low-cost and reliable products that operate on a remote power source while performing high-end applications keeps increasing, there is always a need for new low-power design techniques for VLSI. The authors optimise the word-lengths of the input and output data samples and coefficient values. This involves the use of a general search-based methodology which is based on statistical precision analysis and the incorporation of cost/performance/power measures into an objective function through word-length parameterisation.

Regarding the two research areas noted above, asynchronous circuits, when designed carefully, can be more power efficient than their synchronous counterparts. In this paper a novel implementation of adiabatic circuits has been presented, working in asynchronous fashion to get the advantages of both techniques.

There are several approaches to minimizing power consumption in the filter design area, such as changing the instruction set and architecture, and updating the filter taps using an appropriate adaptive algorithm. In most general-purpose DSP applications, such as mobile and digital communications, the first approach is not usable for reaching low power because the instruction set cannot be altered; therefore, the second approach is preferable. Of the two well-known adaptive filtering algorithms, Recursive Least Squares (RLS) and Least Mean Square (LMS), the LMS algorithm provides a powerful and computationally efficient means of realizing adaptive filters.
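To make the LMS tap update concrete, the following is a minimal software sketch of the textbook LMS recursion (output, error, weight update). It is not the hardware architecture of this paper. The function names `lms_filter` and `demo_lms`, the step size `mu`, and the 4-tap "unknown system" used in the demonstration are illustrative assumptions.

```python
import random


def lms_filter(x, d, num_taps, mu):
    """Textbook LMS: returns (outputs, errors, final_weights).

    x is the input sequence, d the desired sequence, mu the step size.
    """
    w = [0.0] * num_taps            # adaptive filter weights
    buf = [0.0] * num_taps          # tapped delay line, newest sample first
    outputs, errors = [], []
    for x_n, d_n in zip(x, d):
        buf = [x_n] + buf[:-1]                         # shift in new sample
        y_n = sum(wk * xk for wk, xk in zip(w, buf))   # filter output
        e_n = d_n - y_n                                # estimation error
        w = [wk + mu * e_n * xk for wk, xk in zip(w, buf)]  # weight update
        outputs.append(y_n)
        errors.append(e_n)
    return outputs, errors, w


def demo_lms():
    """Identify a hypothetical 4-tap system from noise-free data."""
    rng = random.Random(0)                  # fixed seed for repeatability
    unknown = [0.5, -0.3, 0.2, 0.1]         # assumed system to identify
    x = [rng.uniform(-1.0, 1.0) for _ in range(5000)]
    d, buf = [], [0.0] * len(unknown)
    for x_n in x:
        buf = [x_n] + buf[:-1]
        d.append(sum(h * xk for h, xk in zip(unknown, buf)))
    _, errors, w = lms_filter(x, d, num_taps=len(unknown), mu=0.05)
    return unknown, w, errors
```

Each iteration needs only one inner product and one scaled vector addition, which is the reason LMS is usually regarded as cheaper than RLS.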

## **II. LITERATURE SURVEY**

Different types of adaptive filters with various adaptive filtering algorithms are available in the literature. Different types of adders and multipliers are also available, each with advantages and limitations relative to the others. The detailed literature survey for the proposed work is as follows. In the paper by R. Ramya et al., the authors presented an area-efficient design of an adaptive FIR filter using distributed arithmetic (DA). The least-mean-square (LMS) adaptive filtering algorithm was used to obtain the updated weight values and to decrease the mean square error between the actual and desired output.

In order to reduce the area complexity, the adder/subtractor cells of the weight-increment block were replaced by carry-save adders. The design uses a smaller LUT, multiplexers, and nearly 50 percent fewer adders than the existing distributed-arithmetic-based design. It uses the adaptive algorithm for the system development.

A new architecture based on distributed arithmetic for high-throughput implementation of the LMS adaptive filter is suggested by Surva Prakash M. et al. Distributed arithmetic is a bit-serial computational operation which permits digital filters to be implemented at high throughput rates, irrespective of the filter order. However, it complicates the realization of adaptive digital filters, which requires recomputing the contents of the lookup tables that store the filter coefficients. The authors showed that the throughput of the proposed design remains unchanged for a filter of any order. Further, it was shown that the throughput was approximately equal to that of a fixed-coefficient filter.
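The bit-serial nature of distributed arithmetic can be illustrated in software. The sketch below assumes fixed integer coefficients and inputs given as two's-complement integers of a known word length. The helper names `build_lut` and `da_inner_product` are hypothetical. Hardware typically right-shifts the accumulator; here each partial sum is instead left-shifted by its bit position so that the integer result stays exact.

```python
def build_lut(coeffs):
    """LUT[addr] = sum of the coefficients whose index bit is set in addr."""
    n = len(coeffs)
    return [sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
            for addr in range(1 << n)]


def da_inner_product(lut, samples, bits):
    """Compute sum(coeffs[k] * samples[k]) with LUT reads, shifts and adds.

    samples must fit in `bits`-bit two's complement.
    """
    acc = 0
    for b in range(bits):
        # Form the LUT address from bit b of every input sample.
        addr = 0
        for k, s in enumerate(samples):
            addr |= ((s >> b) & 1) << k
        term = lut[addr] << b          # weight the partial sum by 2**b
        # The most significant bit carries negative weight in two's complement.
        acc += -term if b == bits - 1 else term
    return acc
```

For example, `da_inner_product(build_lut([3, -5, 7, 2]), [12, -7, 0, -128], 8)` gives the same value as the direct sum of products, using one LUT read per bit position and no multiplier. The LUT has 2^N entries for N coefficients, which is why changing the coefficients at every adaptation step is costly in this arrangement.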

Meanwhile, a divided-LUT method was used to reduce the required memory units, and a pipelined structure was also employed to boost the system performance. The FPGA implementation of FIR filters based on the conventional method costs extensive hardware resources, which works against reducing the circuit scale and increasing the system speed. It is well known that the FIR filter comprises adders, delay elements and multipliers. The use of multipliers in such a design gives rise to two demerits, (i) an increase in delay and (ii) an increase in area, which result in low performance (lower speed). A new design and implementation of FIR filters using Modified Distributed Arithmetic is provided to solve this problem. The simulation results indicate that FIR filters designed using Modified Distributed Arithmetic work stably at high speed and save almost half the hardware resources, decreasing the circuit scale.

Carry Select Adders (CSLA) are among the high-speed adders used in data processing to perform fast arithmetic operations. It is clear from the structure of the CSLA that there is scope for reducing its power consumption and area. The work by B. Ram Kumar et al. employed an efficient and simple gate-level modification to significantly decrease the power and area of the Carry Select Adder.

## **III. RELATED WORK**

The power supply used in portable systems depends directly on the energy efficiency of the digital logic circuits. Therefore, digital systems must have low power consumption. Modern systems should also have high processing power for high-speed operations. Power in such systems depends on the accuracy of the assigned task and on delay. Today the whole digital system is manufactured as a single integrated circuit. Energy usage in electronic systems can be optimized at the chip level or the PCB level.

In IC design, low-power design has become a promising research area. The art of low-power circuit design depends on the selection of optimal resources that will reduce the speed of information processing without disturbing system characteristics. CMOS digital device designers face a challenging requirement: they have to achieve low propagation delay and complex functionality along with low power dissipation. Part of the solution is a proper choice of operating areas. The world is facing phenomenal growth in the demand for energy. One promising solution may be the use of the energy recovery, or adiabatic logic, principle. The theme of adiabatic logic is to use ramp clocks to reduce thermal dissipation and to recycle charge from capacitive loads. Voltage and current ramps are used to prevent resistive dissipation in the parasitic resistance. Charge stored in the gate capacitors is returned to the power supply. This needs an oscillating power supply network. There are two basic issues or design needs that must be addressed in any CMOS adiabatic circuit. The implementation must result in an energy-efficient design of the combined power supply and clock.
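The benefit of ramped charging can be summarized with the standard first-order RC model, stated here as a reminder rather than as a result of this paper. The symbols $R$ (on-resistance of the charging path), $C$ (load capacitance), $V_{dd}$ (supply swing) and $T$ (ramp duration) are introduced for illustration only.

$$
E_{\text{conventional}} = \tfrac{1}{2}\, C V_{dd}^{2},
\qquad
E_{\text{adiabatic}} \approx \frac{RC}{T}\, C V_{dd}^{2} \quad (T \gg RC)
$$

Under this model the energy dissipated per charging event falls as the ramp is made slower, which is why adiabatic circuits trade operating speed for energy.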

Every day, new technology that is faster, smaller and more complex than its predecessor is being developed. The increase in clock frequency to attain higher speed, and the increase in the number of transistors packed onto a chip to achieve greater design complexity, result in increased power consumption. That is, each time a logic operation is executed, some information is erased or lost, and it is dissipated as heat.

The term 'adiabatic' comes from thermodynamics, where it describes a process in which no energy is exchanged with the environment and hence no dissipative energy loss takes place. In semiconductor devices, the transfer of charge between different nodes is the process of energy exchange, and different techniques can be used to minimize the energy loss due to this charge transfer.

## **IV. EXISTING SYSTEM**



Fig. 1: Existing system

The above figure (1) shows the architecture of the existing system. In this system there are in total 8 input-muxes, which provide four functional outputs. Here the multiply and accumulate (MAC) operations are replaced by a series of LUTs. A 64-bit LUT mask is used for registers and arithmetic operations. This system is based on the LUT, which absorbs more logic. Each processing element (PE) contains LUTs to store the possible values of the partial products. The LUTs of each PE need to be updated in every iteration. The LUTs are updated with the values of the current input samples and the most recent previous input samples. These values are stored at the corresponding addresses generated by the address generation block.

The input samples are stored in the LUTs according to the LUT update block. Either the weights of the filter or an error signal is chosen by the MUX select line and used as the address to read the LUT contents. The multiplexers are used to select between the weight vectors and the error signals. The weight vectors and error vectors are used as addresses to read the corresponding values from the LUTs. The new weight vectors are circularly left shifted to read the filter output from the LUTs, instead of shifting the LUT contents to the right while updating the LUTs. The weight vectors and error signals are used as the addresses A for reading the partial products, to compute the filter outputs and to adapt the filter weights respectively. The partial products are accumulated and right shifted to produce the final output. The operations involved in this system are somewhat complex. The main disadvantage of this system is its area cost.
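The idea of one LUT serving both the filtering and the weight-adaptation step can be sketched in software as follows. This is a simplified illustration under stated assumptions, not the architecture of Fig. 1: the class name `ProcessingElement`, its methods, and the small integer word lengths are hypothetical, and the circular shifting of weight vectors and the address generation block are not modeled.

```python
class ProcessingElement:
    """Holds a LUT of input-sample partial sums that is read with either
    weight words or error words as the address source (the MUX choice)."""

    def __init__(self, samples):
        self.update(samples)

    def update(self, samples):
        """Rebuild the LUT: entry addr is the sum of the samples whose
        index bit is set in addr."""
        n = len(samples)
        self.lut = [sum(s for k, s in enumerate(samples) if (addr >> k) & 1)
                    for addr in range(1 << n)]

    def read(self, words, bits):
        """Inner product of the stored samples with `words`, where each
        word is a `bits`-bit two's-complement integer used bit by bit
        as the LUT address."""
        acc = 0
        for b in range(bits):
            addr = 0
            for k, w in enumerate(words):
                addr |= ((w >> b) & 1) << k
            term = self.lut[addr] << b
            # The sign bit has negative weight.
            acc += -term if b == bits - 1 else term
        return acc
```

With `pe = ProcessingElement(samples)`, calling `pe.read(weights, bits)` yields a filter-output term (convolution), while `pe.read(errors, bits)` on the same LUT yields a correlation term of the kind used for the weight increment. Only the address source changes between the two reads.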

## **V. PROPOSED SYSTEM**



Fig. 2: Proposed system

The above figure (2) shows the architecture of the proposed system. This system has a total of 8 inputs, which provide the functional outputs. The proposed system operates on two cases, namely odd and even LUTs. Input samples are sampled and saved in the LUT memory. For every second the value of the LUT is updated while the processing is done. After this process the multiplexers select the error signals obtained in the system, and the remaining signals are passed to the proposed adaptive filter. This filter processes the samples and gives updated samples as output. In this model, we assume that each change of logic value has a full swing from rail to rail; the dynamic power is then proportional to the sum of the toggling on all nodes in the chip. It is related to the supply voltage, the chip size, and the circuit activity on the chip.
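The dependence of dynamic power on toggling can be written with the usual CMOS switching-power expression, given here as a standard reference formula rather than a result of this paper. The symbols $\alpha_i$ (average number of power-consuming transitions of node $i$ per clock cycle), $C_i$ (capacitance of node $i$), $V_{dd}$ (supply voltage) and $f_{clk}$ (clock frequency) are introduced for illustration.

$$
P_{\text{dyn}} = \sum_{i} \alpha_i \, C_i \, V_{dd}^{2} \, f_{clk}
$$

With $V_{dd}$ fixed by the system, the remaining levers are the total switched capacitance, which grows with chip size, and the activity factors $\alpha_i$.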

It should be noted that when the supply voltage is very low, the static power consumption might be the dominant part. In a normal application, the supply voltage is fixed by the system conditions. The essential low-power technique is then to decrease the scale of the chip and to decrease unneeded circuit activity. A low-power circuit technique at the basic logic cell level is not practical because it requires the design of a cell library or the modification of an existing one. This matters all the more with the continuous increase in the number of devices per silicon microchip and the remarkable progress in operating speed. By assigning a small value to the learning rate, the adaptive process will progress slowly. Most of the past data are then memorized, resulting in a more precise filtering action. Compared to the existing system, this system gives effective results in terms of power and area.

## **VI. RESULTS**



Fig. 3: RTL schematic



Fig. 4: Technology schematic



Fig. 5: Output



Fig. 6: Report

## **VII. CONCLUSION**

In this paper a low-power VLSI adaptive filter is proposed. The proposed scheme maintains the same number of clock cycles while using multiplexed LUTs, which results in a smaller critical path. The complexity of the proposed adaptive filter is significantly reduced. With the LUT update method, only one set of LUTs in a processing element needs to be updated in every iteration, which saves power and time. Hence the delay is optimized.
