13 March 2012

Architecture optimisation as a tool for low cost low power solutions

The need for low cost and low power consumption drives semiconductor companies and their customers towards devices made on advanced processes.

But it is possible to reduce cost, size and power consumption through architecture optimisation. These inexpensive optimisation methods are often overlooked, which is a missed opportunity for further cost and power reduction.

Cost and power reduction
According to IBS, the average cost of developing a new chip rose from $45m at 65nm to $150m at 22nm. This higher investment eventually pays for itself by lowering chip cost. The main reason semiconductor companies develop chips on advanced process nodes is the promise of cost and power consumption reductions.

In general, device cost decreases at more advanced, smaller process nodes, although it takes time for a new process node to mature and achieve better yield than the previous node. For this reason, it takes time for the minimum cost point to move to the most recent state of the art process nodes.

In fig 1, Fp is the process cost optimisation factor. In this example, it relates the cost of a device made on a 65nm process (point A) to the cost of the same part made on a 45nm process (point B) without architecture optimisation.

Meanwhile, Fa is the architecture cost optimisation factor. In fig 1, it relates the cost of a device made on a 45nm process without architecture optimisation (point B) to the cost of the same device with architecture optimisation (point D). The best way of minimising cost is to switch to a 45nm process and employ architecture optimisation.

Figure 2 shows power consumption optimisation. The minimum power consumption is achieved through moving from a 32nm process to the 22nm node (points A and B) and through architecture optimisation at 22nm (point D).

In many wireless and video algorithms, application specific characteristics can be exploited for architectural optimisation. For example, it is common to have finite impulse response (FIR) filter based algorithms and applications with symmetrical coefficients (fig 3).

In a symmetrical FIR filter with an even number of taps, each pair of coefficients at the same distance from the centre has the same value. It is therefore possible to add the two samples that share a coefficient and perform a single multiplication on the sum. This halves the number of multipliers, and the related logic, required to implement the filter. FIR filters with odd symmetry have an odd number of taps: coefficients at the same distance from the centre still share a value, and only the centre coefficient has no partner. The number of multipliers required to implement an odd symmetry FIR filter can be reduced by a factor of 2n/(n+1), where n is the number of taps. This reduction factor approaches 2 as the number of taps grows.
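The pre-add idea can be sketched in a few lines of Python. This is a minimal illustration, not a hardware description: the function names fir_direct and fir_folded are hypothetical, each call computes one output sample from a window of input samples, and the returned multiplication count stands in for the number of multipliers.

```python
def fir_direct(window, coeffs):
    """Direct form: one multiplication per tap."""
    assert len(window) == len(coeffs)
    acc = sum(x * c for x, c in zip(window, coeffs))
    return acc, len(coeffs)


def fir_folded(window, coeffs):
    """Folded form: add mirrored samples first, then multiply once per pair."""
    n = len(coeffs)
    assert len(window) == n
    assert all(coeffs[i] == coeffs[n - 1 - i] for i in range(n)), \
        "coefficients must be symmetric"
    acc = 0
    mults = 0
    for i in range(n // 2):
        acc += (window[i] + window[n - 1 - i]) * coeffs[i]
        mults += 1
    if n % 2:
        # Odd number of taps: the centre coefficient has no partner.
        acc += window[n // 2] * coeffs[n // 2]
        mults += 1
    return acc, mults


# Even number of taps: 6 multiplications become 3.
even_direct = fir_direct([1, 2, 3, 4, 5, 6], [1, 2, 3, 3, 2, 1])
even_folded = fir_folded([1, 2, 3, 4, 5, 6], [1, 2, 3, 3, 2, 1])

# Odd number of taps: 5 multiplications become 3, a factor of 2n/(n+1).
odd_direct = fir_direct([1, 2, 3, 4, 5], [1, 2, 3, 2, 1])
odd_folded = fir_folded([1, 2, 3, 4, 5], [1, 2, 3, 2, 1])
```

Both forms return the same output value; only the number of multiplications differs. The sample and coefficient values are arbitrary integers chosen for the illustration.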

In some applications, symmetric FIR filters have an additional characteristic that can be exploited. In half band symmetric filters, every second coefficient, except the centre coefficient, is zero. Since these filters are also odd symmetric FIR filters, it is possible to reduce the number of required multipliers by up to a factor of four.
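A rough way to see where the factor of four comes from is to count the multipliers a half band filter actually needs. The sketch below assumes a filter length of n = 4k + 3 taps and uses a placeholder value of 1 for every non-zero coefficient; half_band_pattern and multipliers_needed are hypothetical helper names, and only the position of the zeros matters for the count.

```python
def half_band_pattern(k):
    """Zero/non-zero pattern of a half band filter with n = 4k + 3 taps.

    Non-zero taps are the centre tap and the taps at odd offsets from it;
    the value 1 is only a placeholder for a non-zero coefficient.
    """
    n = 4 * k + 3
    centre = n // 2
    return [1 if i == centre or (i - centre) % 2 == 1 else 0 for i in range(n)]


def multipliers_needed(coeffs):
    """Multipliers left after skipping zeros and sharing mirrored taps."""
    n = len(coeffs)
    assert all(coeffs[i] == coeffs[n - 1 - i] for i in range(n)), \
        "coefficients must be symmetric"
    first_half = coeffs[: (n + 1) // 2]  # includes the centre tap when n is odd
    return sum(1 for c in first_half if c != 0)


for k in (1, 2, 100):
    taps = half_band_pattern(k)
    needed = multipliers_needed(taps)
    print(len(taps), needed, round(len(taps) / needed, 2))
```

Under these assumptions the reduction factor is (4k + 3)/(k + 2), which grows towards four as the filter gets longer.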

Operations with 2d symmetry are very common in video applications. As shown in fig 4, the vertical and horizontal symmetry of this 5x5 2d FIR filter means that circles with the same colour represent coefficients with the same value. It is possible to exploit 2d symmetry so the same multiplier is used for all pixels with the same coefficient value. In fig 4, there are groups of 4, 2 and 1 pixels sharing a coefficient. Therefore, the potential exists for the size of a generic matrix to be reduced by up to a factor of four.
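The grouping of pixels by shared coefficient can be checked with a short Python sketch. It assumes only what the text states, namely mirror symmetry about the vertical and horizontal centre lines; symmetry_groups is a hypothetical helper that folds each position onto one quadrant and counts how many positions land on each unique coefficient.

```python
from collections import Counter


def symmetry_groups(size):
    """Map each unique coefficient position to the number of pixels sharing it."""
    groups = Counter()
    for row in range(size):
        for col in range(size):
            # Fold the position onto the top-left quadrant.
            key = (min(row, size - 1 - row), min(col, size - 1 - col))
            groups[key] += 1
    return groups


groups = symmetry_groups(5)
unique_coefficients = len(groups)          # multipliers needed after sharing
group_sizes = Counter(groups.values())     # how many groups of 4, 2 and 1 pixels
reduction = (5 * 5) / unique_coefficients  # 25 products shared among 9 multipliers
```

For the 5x5 case this gives nine unique coefficients, so 25 multipliers fall to nine; the factor of four is approached only as the matrix grows and the groups of four pixels dominate.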

Complex multiplication
Significant portions of wireless and communications applications are based on algorithms featuring complex numbers, such as I and Q channels and symbol modulation.

Complex multiplication between a complex sample (a + jb) and a complex coefficient (C + jD) can be implemented directly with four multiplications:

(a + jb)*(C + jD) = (a*C – b*D) + j(a*D + b*C)

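With some algebraic manipulation, the same result can be obtained using three multiplications:

(a + jb)*(C + jD) = a*C – b*D + (a*D – a*D) + j(a*D + b*C + (a*C – a*C))
= a*(C+D) – (a+b)*D + j(a*(C+D) + (b–a)*C)

The product a*(C+D) appears in both the real and the imaginary part, so it only needs to be computed once. Together with (a+b)*D and (b–a)*C, that makes three multiplications.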

Since the silicon area required to implement a multiplier is significantly larger than that required to implement adders and subtractors, this allows for size reduction.
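As a quick check of the algebra, the three multiplication form can be written out in Python and compared with the language's built-in complex arithmetic. The function name complex_mul_3 is hypothetical and the operand values are arbitrary; the sketch trades one multiplication for extra additions and subtractions, exactly as in the expressions above.

```python
def complex_mul_3(a, b, C, D):
    """(a + jb) * (C + jD) using three real multiplications."""
    shared = a * (C + D)           # multiplication 1, reused by both parts
    real = shared - (a + b) * D    # multiplication 2
    imag = shared + (b - a) * C    # multiplication 3
    return real, imag


real, imag = complex_mul_3(3, 4, 5, 6)
reference = complex(3, 4) * complex(5, 6)
```

With integer operands the result matches the four multiplication form exactly; in fixed point hardware the pre-adders widen the multiplier inputs by one bit, which is a design detail outside this sketch.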

Double data rate
Double data rate throughput optimisation is a special architectural technique designed to overcome fabric throughput issues common to fpgas. While the dsp slices in an fpga have a similar size efficiency to those in an asic made on a similar process node, they have a flexibility advantage. The problem is that, in many cases where fpga utilisation is high, the fabric's relatively low operating frequency creates a throughput bottleneck.

The LatticeECP4 fpga has innovative throughput boosting interfaces embedded into the dsp slices. This enables the part to offer double the throughput of other fpgas. Using this feature means the LatticeECP4 can implement complex dsp functions using half the number of multipliers that would be required by other fpgas. This optimisation enables a significant system cost and size reduction, as well as decreased power consumption.

In many cases, different architectural optimisation techniques can be combined to achieve higher levels of cost and power reduction. Most FIR filters implemented in wireless or video applications are symmetric. Implementing a 64 tap symmetric FIR filter with an input data rate of 245.76Msample/s in a typical fpga would require 64 18 x 18 multipliers. The LatticeECP4 can implement the same FIR filter using 16 18 x 18 multipliers, a reduction of approximately four times: symmetry halves the count to 32 and double data rate operation halves it again. This provides other benefits, including an approximate halving of power consumption and the opportunity to fit the design into a smaller fpga, which reduces cost.

Similarly, half band filtering and double data rate optimisation can be combined in the interpolation or decimation filters of digital up converters and digital down converters. In these cases, the cost and power savings are even higher – approximately eight and four times respectively.
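The multiplier arithmetic behind these figures can be sketched as follows. This is a deliberately crude model, assuming each technique simply halves the multiplier count; estimate_multipliers and its parameters are hypothetical names, and the model says nothing about power, which depends on more than the number of multipliers.

```python
from math import ceil


def estimate_multipliers(taps, symmetric=False, half_band=False, double_rate=False):
    """Rough multiplier estimate, modelling each technique as a halving."""
    count = taps
    for enabled in (symmetric, half_band, double_rate):
        if enabled:
            count = ceil(count / 2)
    return count


baseline = estimate_multipliers(64)
folded_ddr = estimate_multipliers(64, symmetric=True, double_rate=True)
half_band_ddr = estimate_multipliers(64, symmetric=True, half_band=True,
                                     double_rate=True)
```

For the 64 tap example the model gives 64, 16 and 8 multipliers respectively, in line with the fourfold and roughly eightfold reductions quoted in the text.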

Architecture optimisation is a relatively inexpensive way to reduce silicon device cost and power consumption. There is no reason why semiconductor companies or their customers should not take advantage of this 'low hanging fruit' instead of investing in developing devices for manufacture on expensive advanced process nodes.

In many cases, a combination of different methods of architecture optimisation – such as data flow optimisation, algorithm characteristics optimisation and/or algebraic optimisation – can result in cost being halved and power consumption being reduced by a factor of eight.

Asher Hazanchuk is product planning manager with Lattice Semiconductor.
