What's involved in 'the most significant step forward in high end FPGAs'

Two years ago, in June 2013, Altera announced its Generation 10 product range – so called because it represents the tenth generation of FPGAs since the company was founded in 1983.

At the time, beyond saying some of the products would be made by Intel on its 14nm Tri-Gate process, all Altera would say was that the parts would feature 'a new architecture'.

Danny Biran, senior vice president of corporate strategy and marketing, said: "As we move from one process node to the next, we get an improvement in density, but not much improvement in performance and power. But, with our next generation, we can break away from this, maybe doubling current performance."

Altera has now 'lifted the lid' on Stratix 10 and the innovations that power it.

The top of the range Stratix 10 device will feature 5.5million logic elements (LE) – the building blocks in Altera devices. Chris Balough, senior director of SoC product marketing, claimed this monolithic part has five times as many LEs as the previous largest device. "This is the most significant step forward on high end FPGAs," he continued. "This level of integration is amplified by Intel's process and packaging technology."

While Stratix 10 boasts a range of features, the HyperFlex architecture is the stand out innovation. Balough explained the need for the new approach. "Over the last 10 or 15 years, there has been a shift in wire delay, rather than gate delay. If you want to achieve GHz frequencies, you have two choices: either widen the buses, so you can move more things in parallel; or address the issue of routing delay.

"Widening buses doesn't work," he contended. "It affects power consumption, needs a larger die and creates more compile problems. So we addressed the routing delay by adding more registers."

And that, in a nutshell, is what HyperFlex (see fig 1) is all about and Balough calls it the biggest FPGA area innovation in a decade. "What's novel is the 'how'," he continued.

According to Altera, the key benefit of the HyperFlex architecture is that 'registers are everywhere'. These Hyper-Registers (see inset in fig 1) are not the same as the registers contained within adaptive logic modules (ALMs). A Hyper-Register can be associated with each routing segment in the device, as well as with embedded memory and DSP blocks and, unlike general registers, it can be bypassed. This allows the optimal register location to be selected automatically after place-and-route in order to maximise core performance.

While it might appear that Hyper-Registers will have a significant area requirement, that's not the case. "Hyper-Registers, which are much smaller than the registers in a logic block, sit in the routing channels," Balough explained. "Even though there are 10 times as many Hyper-Registers as ALMs, they increase the area by less than 1% and add only 1% to static power consumption."

Use of Hyper-Registers is not obligatory. "Existing designs can be migrated to Stratix 10 and Hyper-Registers ignored," Balough said. "But existing designs can also be retimed by taking advantage of them."

Standard FPGA approaches – retiming, pipelining and optimisation – are also available under HyperFlex and take the 'Hyper' prefix.

Hyper-Retiming is said to eliminate critical paths by moving registers into the interconnect. This balances register-to-register delays and allows the design to run more quickly. An average performance gain of x1.4 is claimed.
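The idea of balancing register-to-register delays can be illustrated with a small sketch. It is a toy model, not Altera's algorithm: the path delays, the function names and the brute-force search are all assumptions made for illustration. A path is a list of delay elements, a register may sit at any boundary between them, and retiming moves the existing registers (without adding any) so the slowest stage is as short as possible.

```python
from itertools import combinations

def stage_delays(delays_ps, cuts):
    """Sum the delay of each stage when registers sit at the given cut sites."""
    bounds = [0, *cuts, len(delays_ps)]
    return [sum(delays_ps[a:b]) for a, b in zip(bounds, bounds[1:])]

def critical_delay(delays_ps, cuts):
    """The slowest register-to-register stage sets the clock period."""
    return max(stage_delays(delays_ps, cuts))

def best_retiming(delays_ps, n_registers):
    """Try every placement of n_registers and keep the best balanced one."""
    sites = range(1, len(delays_ps))  # boundaries between delay elements
    return min(combinations(sites, n_registers),
               key=lambda cuts: critical_delay(delays_ps, cuts))

# Hypothetical path: six delay elements in picoseconds (illustrative values).
delays_ps = [500, 300, 600, 200, 400, 400]
original_cuts = (1,)                      # register stuck after the first element
retimed_cuts = best_retiming(delays_ps, 1)

before = critical_delay(delays_ps, original_cuts)
after = critical_delay(delays_ps, retimed_cuts)
print(f"before: {before} ps, after: {after} ps, speed-up x{before / after:.2f}")
```

In this made-up case, the register count is unchanged; only its position moves, which is why retiming does not alter the latency in clock cycles.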

Hyper-Pipelining is said to eliminate long routing delays by adding pipeline stages in the interconnect between the ALMs, allowing the design to run more quickly. Hyper-Pipelining, which does not use additional FPGA logic and routing resources, is done after place-and-route and can bring an x1.6 boost in performance.
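A rough way to picture pipelining in the interconnect is a long route built from routing segments, each with a register that is bypassed by default. The sketch below is an assumption-laden toy, not the Quartus flow: the segment delays, the target period and the greedy strategy are hypothetical. It switches on just enough registers that no stretch of the route exceeds the target clock period.

```python
def enable_registers(segments_ps, target_ps):
    """Return the indices of segments whose input register must be enabled.

    Greedy walk: keep accumulating routing delay and switch on a register
    just before the segment that would push the stage past the target period.
    """
    enabled = []
    elapsed = 0
    for i, delay in enumerate(segments_ps):
        if delay > target_ps:
            raise ValueError(f"segment {i} alone exceeds the target period")
        if elapsed + delay > target_ps:
            enabled.append(i)   # un-bypass the register at the start of segment i
            elapsed = 0
        elapsed += delay
    return enabled

# Hypothetical route: eight 300 ps segments, target period 1000 ps (1 GHz).
route_ps = [300] * 8
enabled = enable_registers(route_ps, 1000)
print(f"registers enabled at segments {enabled}, "
      f"adding {len(enabled)} cycles of latency")
```

Unlike retiming, each register enabled here adds a clock cycle of latency, so the surrounding design has to tolerate the extra pipeline stages.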

Meanwhile, in designs with long feedback loops and state machines, for example, getting higher performance requires the use of feed-forward or pre-compute paths, rather than combinatorial feedback paths. Here, Hyper-Optimisation – the combination of Hyper-Retiming and Hyper-Pipelining – is applied. Although Altera says this requires more effort, it can double the performance of previous generation FPGAs.

Balough gave examples of the kind of performance boost that might be expected. "Critical paths govern performance," he asserted. "In a conventional architecture, the critical path may be performance limited and have a maximum speed of 286MHz. When Hyper-Registers were put into the routing, performance was boosted to 833MHz." But he accepted there may not always be performance benefits. "There will be parts of a design that can't take advantage of HyperFlex. However, those that can boost the clock speed, for example, will fit into a smaller area. If the clock can be doubled and the design doesn't need extra performance, you could halve the bus width, which means half the FPGA area and no static power penalty."
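The trade-off Balough describes follows from treating datapath throughput as bus width multiplied by clock frequency. The sketch below works through that arithmetic; the 512-bit bus and the 300MHz and 600MHz clocks are hypothetical figures chosen for illustration, while the 286MHz and 833MHz values are the ones quoted above.

```python
def throughput_gbps(bus_width_bits, clock_mhz):
    """Raw datapath throughput in Gbit/s: bits per cycle times cycles per second."""
    return bus_width_bits * clock_mhz / 1000

def width_for_same_throughput(width_bits, old_clock_mhz, new_clock_mhz):
    """Bus width needed at the new clock to match the old throughput."""
    return width_bits * old_clock_mhz / new_clock_mhz

# Ratio between the two critical-path speeds quoted in the article.
quoted_speedup = 833 / 286
print(f"quoted critical-path speed-up: x{quoted_speedup:.2f}")

# Hypothetical design: a 512-bit bus at 300 MHz, re-targeted to 600 MHz.
narrow = width_for_same_throughput(512, 300, 600)
print(f"512 bits @ 300 MHz = {throughput_gbps(512, 300)} Gbit/s")
print(f"{narrow:.0f} bits @ 600 MHz = {throughput_gbps(narrow, 600)} Gbit/s")
```

This ignores protocol overhead and assumes the whole datapath can run at the higher clock, which, as Balough notes, will not be true of every part of a design.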

While the FPGA die will be produced on Intel's 14nm process, other elements of Stratix 10 devices may be manufactured on slightly older nodes. "For the moment," Balough said, "tiles will be made on a 20nm process, allowing some IP reuse. However, that might change in the future."

The system in package approach of Stratix 10 is enabled by Intel's Embedded Multi-die Interconnect Bridge technology, or EMIB for short. According to Altera, this provides higher performance, reduced complexity, lower cost and enhanced signal integrity compared to interposer based approaches.

"EMIB does two things," Balough pointed out. "By allowing modularity, we can use the full reticle; it's one of the ways we have got to 5.5m LEs and is an alternative to silicon interposers."

EMIB, a small piece of silicon embedded in the package substrate, provides dedicated interconnect between dice and brings design flexibility to Altera (see fig 2). "We have a range of customers wanting a range of transceiver speeds, interface protocols and modulation schemes," Balough said. "The challenge then comes in our planning process. Because Intel's 14nm process needs about 70 mask steps, being able to modularise the transceiver aspect of the design is a benefit."

EMIB is also likely to allow Altera to add in such 'tiles' as analogue, memory and ASIC, as well as dice manufactured on a range of appropriate technologies. One strong possibility is a tile which supports optical communication, although no plans have been made public.

"But it's the software which is the real magic," Balough pointed out. "We have done a significant overhaul of Quartus II, adding a new hierarchical database, a new placer and a new analytical core. This is not only about reducing the compile time, but also about attacking the number of design iterations."

Altera is currently planning to offer Stratix 10 with capacities ranging from 480k to 5.5m LEs. It will also offer GX and SX variants. GX devices are aimed at high bandwidth applications, while SX parts will feature a quad core Cortex-A53 hard processor system running at up to 1.5GHz. "Communications is about half of Altera's business," Balough concluded. "We're bringing system integration for more control, more transceivers for more throughput, more efficiency and more functionality in a smaller space."

Initial shipments of Stratix 10 parts are planned for Q4 of 2015.