Will the attractions of embedded FPGA overcome traditional cost objections and finally see accelerated growth?

Twenty years ago, it looked like a concept that was ready for primetime: putting programmable logic inside ASICs and SoCs. At the time, the move seemed inevitable. ASIC mask prices were rising fast, driven by the need to pull more and more from a bag of optical tricks to keep Moore’s Law on track. The cost of respins alone seemed enough to persuade designers to leave some reprogrammable “sewing kits” in their SoCs to let them iron out bugs after tapeout instead of committing tens of thousands of dollars more to the project to get some new masks.

Despite the apparent attraction as mask costs soared towards $1 million and beyond, suppliers of embedded field-programmable gate array (FPGA) cores found it hard to gain any traction. Both start-ups and established FPGA players such as Actel dipped their toes in the water but couldn’t find a customer base.

Patrick Soheili, vice president of business and corporate development at design house eSilicon, says: “The argument of putting custom FPGA fabrics inside ASICs came and went. It was shut down, each and every time. The problem with the approach is you are doing an ASIC because you want the efficiency of the ASIC. By adding an FPGA core that’s maybe not fully used, you are adding overhead.”

Brutal economics

From a pure silicon-cost perspective, the economics for embedded FPGA cores are brutal. Even with hard memory macros included, a density loss of 10 to 20 times is a common rule of thumb for the flexibility that programmable routing provides. But embedded-FPGA specialists have persevered. French start-up Adicsys can trace its roots to M2000, which had managed to score some design wins a decade earlier.

Newer start-ups such as Flex Logic and Menta have tried to make the routing overhead less onerous. When it launched its fourth-generation embedded-FPGA core, Menta focused on the most commonly used connection configurations between lookup tables (LUTs) to try to bring the overhead down. Flex Logic CEO Geoff Tate claims relative interconnect density is his start-up’s main differentiator: “Our cofounder came up with a more efficient interconnect and we end up using fewer metal layers and still achieve utilisation of 90 per cent or more.”

Although silicon area remains an issue, markets are emerging in which cost takes a back seat to other concerns. Several years ago, in the belief that the time had come for embedded-FPGA cores, Achronix decided to branch out from selling just standalone parts. Although it fabs its own Speedster FPGAs at Intel, the company chose to port its custom core IP to TSMC’s 16nm finFET process to cater for the larger potential customer base that uses the Taiwanese foundry.

Since then the company has changed its focus to present itself as a “data acceleration” company in recognition of the shift in attitudes, according to Steve Mensor, vice president of marketing at Achronix. Instead of concentrating on design teams’ financial costs, such as mask changes, the aim is to highlight other costs, such as power consumption, and the ability to build highly flexible data accelerators that can change their algorithms at runtime. In the short term, the key market for Achronix is in 5G communications.

Tate has seen the same trend: “Historically, [wireless telecom has] been one of the few high-volume applications for [discrete] FPGAs. But they have problems: they can’t get data in and out of the FPGA fast enough. They also have the issue of the protocols becoming more complicated and being implemented in phases. [With eFPGA] they can tweak things if they make mistakes and still get to market quickly.”

The military sector has helped get Flex Logic off the ground. Tate says a deal with DARPA and Sandia Laboratories to put the cores into a radiation-hardened 180nm process helped fund the porting to finFET processes. “This led to a bunch of designs on GlobalFoundries’ 14nm process,” he adds.

Other markets have sprung up that have taken an interest in using embedded FPGAs. Mensor cites cryptocurrency mining as one that is important to Achronix, at least in the short term. The much bigger market long term may lie in machine learning, a fast-growing sector that has needs similar to those of the 5G basestation makers – and one where standards could take close to a decade to solidify, if ever.

Machine learning

Despite his scepticism over the chances for embedded FPGA in general, Soheili sees machine learning as the one place the technology can score: “I think it’s maybe time for them to show up. You might be able to use a million LUTs or a hundred thousand of them in chunks and use them for functions that you know are going to change.”

One option for machine-learning designs, Soheili says, is to use package-level integration: “Maybe have it in the form of a chiplet so you don’t change the ASIC.”

Intel’s Programmable Systems Group is pursuing the chiplet path, using the company’s EMIB technology to interconnect chiplets inside a package. Initially, this has been employed to put high-speed transceivers around a standalone FPGA core but the same technology would allow FPGAs and ASICs to sit alongside each other.

Mensor says the need to drive down power consumption and make full use of the high bandwidth of on-chip interconnect points to a need for monolithic integration. As well as providing generic LUT-based blocks, both Achronix and Flex Logic have developed arithmetic blocks that are tuned for the kind of processing used in today’s deep-learning pipelines aimed at embedded systems such as self-driving vehicles, speech processors and robots. The chief focus is on 8-bit multiply-adds, as this bit resolution has quickly emerged as the sweet spot for performing inference in machine learning based on deep neural networks. But it’s a fast-moving field, with new variants of deep-learning architectures appearing on a weekly basis.
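Why 8-bit multiply-adds work so well for inference can be sketched in a few lines. The idea (illustrative only, not vendor code; function names and the ±3 value range are assumptions for the example) is that weights and activations are quantised to int8, products are accumulated in a wider int32 register, and the result is rescaled. Hardware then needs only small 8×8 multipliers, which is what makes the arithmetic blocks cheap to replicate across an FPGA fabric:

```python
import numpy as np

def quantize(x, scale):
    """Map float values to int8 with a simple symmetric scale (a sketch)."""
    return np.clip(np.round(x / scale), -128, 127).astype(np.int8)

def int8_dot(a_q, w_q, a_scale, w_scale):
    """8-bit multiply-adds accumulated in int32, then dequantised to float."""
    acc = np.dot(a_q.astype(np.int32), w_q.astype(np.int32))  # wide accumulator
    return float(acc) * (a_scale * w_scale)

rng = np.random.default_rng(0)
a = rng.standard_normal(64).astype(np.float32)   # toy activations
w = rng.standard_normal(64).astype(np.float32)   # toy weights

# Assume values fall roughly within ±3 (a hypothetical calibration choice).
a_scale = w_scale = 3.0 / 127
approx = int8_dot(quantize(a, a_scale), quantize(w, w_scale), a_scale, w_scale)
exact = float(np.dot(a, w))
print(exact, approx)  # the int8 result tracks the float32 dot product closely
```

The accumulator is deliberately wider than the operands: summing 64 int8×int8 products can exceed the int8 range by orders of magnitude, so the narrow datapath applies only to the multipliers, which dominate area and power.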

Achronix offers customers the ability to define their own arithmetic units that can be dropped in alongside the LUTs and readymade DSP blocks and replicated through the array. Tate says Flex Logic expects to add further variants of the arithmetic core for machine learning over time. “We expect that customers who engage will tell us ‘here’s what we really want to do’,” he says.

Mensor says the machine-learning customers at Achronix see their use of embedded FPGA as strategic and not just a way to prototype designs that will ultimately be hardwired: “All the conversations involve CEOs. The use is more integral and architectural with applications that are very heavy in machine-learning functionality.”

Like Mensor, Tate expects acceleration to help promote the use of embedded FPGAs beyond machine learning. He points to the success of the Xilinx Zynq product line in convincing microcontroller manufacturers to adopt the technology. “I think most would like to have their own Zynq-like offerings for acceleration functions,” he says.

Security will be one of the targets for FPGA-based acceleration in MCUs, Tate reckons. “There will be some hardwired cores they have for encryption. But there are a lot of flavours for encryption, which suits embedded FPGA. The other thing they can use embedded FPGA for is I/O functionality to address things such as non-standard versions of SPI.”

“We are working with the MCU guys to show them they can merge the two: I/O support and acceleration,” Tate says. “And we have other ideas to make FPGA behave more like processors, to make it easier to swap things in and out. We have a tiling approach to the embedded-FPGA design: each of those tiles could run a different piece of code. Then you can treat it more like a multicore processor where you can think of each core running a sub-routine, and created without having to learn RTL.”

The open question is whether the attractions of embedded FPGA in these markets can overcome the cost objections and finally accelerate its growth.