Already, Microsoft had been experimenting with using Altera's FPGAs alongside Intel processors as a way to boost the performance of its Bing search engine. Perhaps Intel saw the potential benefits and moved quickly.
But FPGAs will be just one of the technologies being deployed in the next few years to speed communications processing. One company looking to take advantage of this growing need is EZchip, which acquired many core semiconductor specialist Tilera for $130million in July 2014.
Director of marketing Bob Doud noted the current thrust in the comms world is to accelerate things in hardware, rather than in software. "It's not so easy to get FPGAs to accelerate traffic management functions," he said.
And it is in applications such as this where EZchip expects to have success with its recently announced Tile-Mx chip. The device, which will feature 100 Cortex-A53 cores when it appears some time in 2016, will be targeted at high performance network applications and at network function virtualisation (NFV). Those 100 ARM cores will give the Tile-Mx a processing performance of 200Gbit/s.
Tilera is no stranger to many core devices, having launched its first such part – the Tile 64 – in 2007. The company name derives from its architectural approach of replicating a number of 'tiles' across a chip. Each tile featured a general purpose processor, cache, and a non blocking router. "We have always had a tile architecture, interconnected with a mesh," said Doud. In those days, Tilera used a MIPS based processor, but the Tile-Mx is a different beast – each of its 25 tiles houses four Cortex-A53 cores, cache memory, a router and address mapper, the Tile Core Accelerator and the coherence control and directory block (see fig 1).
"We have included new things on the Mx because we're using ARM cores, rather than the proprietary ones used previously. We've had to make some 'tweaks'," Doud noted.
The ARM cores are interconnected using Tilera's proprietary SkyMesh network. "ARM knows how to give you a package of four cores which can work well with each other," Doud said. "We enable the four cores on each tile to be coherent with everything else and to share an L2 cache memory."
Each SkyMesh bus (see fig 2) – whose links Tilera describes as east, west, north and south – consists of several hundred interconnects, with the number of interconnects dependent on device size; Tilera will also launch 36 and 64 core variants. The 100 core device will have 40Mbyte of on chip memory, split between three levels of cache.
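To make the tile and mesh arrangement concrete, here is a minimal Python sketch of tiles joined by north, south, east and west links. It is illustrative only: the square grid layout, the assumption that the 36 and 64 core variants also carry four cores per tile, and the shortest path hop count are assumptions made here, not details confirmed by Tilera. The function names are hypothetical.

```python
import math

CORES_PER_TILE = 4  # from the article: four Cortex-A53 cores per tile


def mesh_side(total_cores):
    """Side length of a square tile grid (an assumed layout) for a core count."""
    tiles = total_cores // CORES_PER_TILE
    side = math.isqrt(tiles)
    if side * side != tiles or tiles * CORES_PER_TILE != total_cores:
        raise ValueError("core count does not map onto a square tile grid")
    return side


def neighbours(x, y, side):
    """Map each link direction to the adjacent tile; edge tiles have fewer links."""
    candidates = {
        "north": (x, y - 1),
        "south": (x, y + 1),
        "west": (x - 1, y),
        "east": (x + 1, y),
    }
    return {direction: (nx, ny)
            for direction, (nx, ny) in candidates.items()
            if 0 <= nx < side and 0 <= ny < side}


def hops(src, dst):
    """Hop count between two tiles, assuming shortest path routing on the grid."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])


# Assumed grids for the three announced core counts.
for cores in (36, 64, 100):
    side = mesh_side(cores)
    print(cores, "cores:", side, "x", side, "tiles, corner to corner",
          hops((0, 0), (side - 1, side - 1)), "hops")
```

Under these assumptions, the 100 core part maps onto a 5 x 5 grid, which matches the 25 tiles the article describes. A corner tile has only two links while an interior tile has all four.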
SkyMesh is the fifth generation of Tilera's network on chip technology. The fabric is said to provide high bandwidth coherent connectivity amongst all elements on chip. With the ability to handle an aggregate bandwidth of up to 25Tbit/s, SkyMesh is said to lower application latency and the demand on DRAM bandwidth.
The development of the processor from the Tile 64 to the Mx is said by Doud to have been a 'classical progression'. "It's what engineers learn from previous generations," he claimed. "The Tile 64 in 2007 got us into the market, but we quickly spun the Tile Pro. That had a lot of interest and we brought a lot of things together with the third generation device, the Tile-Gx.
"The jump from the Pro to the Gx was big; it was a major architectural upgrade as we moved from 32bit to 64bit cores, added acceleration and cryptographic functions and moved from a 90nm to a 40nm process.
"The Mx represents a similar jump. We're moving to ARM cores, working out how to fit them into our mesh network and incorporating new I/O technology."
Tilera and its parent EZchip are playing in the same markets, but addressing different aspects. "There's some overlap," Doud said, "but our various products are differentiated on performance. If you're looking to do control and data plane design, for example, you should go with a multicore device. If you are focusing on the data plane, then a network processor is more appropriate." See fig 3 for an example application.
Tilera can now access EZchip's IP and has applied some network acceleration technology in the Mx. "We're taking advantage of the IP," Doud continued. "Before, our emphasis was on doing things in software. However, as you move up the performance curve, you start to look for help from acceleration technology; for example, to perform traffic management." This functionality is provided in the form of Tile Core Accelerators (TCA).
Doud claims the Mx will be the highest performing multicore processor available when it launches in 2016 and believes it should find application in high end data centres and high end enterprises.
Highlighting the fact that it has Intel in its sights, Tilera has included support for DPDK – the data plane development kit that enables the acceleration of apps which run on the x86 architecture. "It's a set of APIs which allow packet processing and networking functions to run on Linux," Doud said. "It works by bypassing the kernel and doing everything in user space, then sending data onwards."
According to Doud, DPDK isn't exclusive to Intel. "We can use it in two ways. One is as a network front end; if a Tile-Mx is used as a preprocessor, it needs to deliver packets to an Intel processor in DPDK format. However, if someone has developed an app on an Intel part and wants to migrate to an Mx, having the APIs makes it simpler."
Doud says the main target for the Mx is 'hyperscale data centres', where the device can be used for software defined networking and NFV applications. "This kind of processor might be needed in every server," Doud explained. "If an Mx is plugged into a server, it might do the work of four chips, so operators will get space and power savings. It can also be used for so called 'top of rack' applications, where it could perform the kind of things which a Cisco box would have done in the past."
One reason why the Mx provides a high level of performance is the way it handles processing loads, said Doud. "What you don't want to do is to deliver 100G to the first core." This aspect is handled by MPIPE, the multicore programmable intelligent packet engine. "It can take a packet header apart and create 'flow affinity'. With 100 cores, each core is only handling 1G."
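The idea of flow affinity can be sketched in a few lines of Python: hash the header fields that identify a flow and use the result to pick a core, so every packet in that flow lands on the same core. This is a rough illustration, not a description of how MPIPE is implemented. The choice of header fields, the CRC32 hash and the names `Flow` and `core_for` are all assumptions made for the example.

```python
import zlib
from collections import namedtuple

# Hypothetical flow key: the header fields commonly used to identify a flow.
Flow = namedtuple("Flow", "src_ip dst_ip src_port dst_port proto")

NUM_CORES = 100   # from the article: 100 cores
LINK_GBPS = 100   # from the article: 100G arriving at the device


def core_for(flow, num_cores=NUM_CORES):
    """Pick a core from a hash of the flow key (CRC32 is an assumed stand-in)."""
    key = "|".join(str(field) for field in flow).encode()
    return zlib.crc32(key) % num_cores


# If flows spread evenly, each core sees an equal share of the link.
per_core_gbps = LINK_GBPS / NUM_CORES

flow_a = Flow("10.0.0.1", "10.0.0.2", 40000, 80, "tcp")
flow_b = Flow("10.0.0.3", "10.0.0.2", 40001, 443, "tcp")
print("flow A -> core", core_for(flow_a))
print("flow B -> core", core_for(flow_b))
print("average load per core:", per_core_gbps, "Gbit/s")
```

Because the core is chosen from the header alone, packets belonging to one flow stay in order on one core. The 1G per core figure Doud quotes is the even split of 100G across 100 cores; real traffic would only approach it if flows hash evenly.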
The Mx uses a run to completion model, in which all the cores perform all tasks. "It's the way code is written for Intel processors," Doud explained. "But that doesn't mean we're not pipelining some tasks."
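The difference between run to completion and pipelining can be shown with a small Python sketch. The three stages, the 64 byte threshold and the round robin dispatch are invented for illustration and are not taken from the Tile-Mx software.

```python
def parse(raw):
    """Stage 1: turn raw bytes into a packet record."""
    return {"raw": raw, "length": len(raw)}


def classify(pkt):
    """Stage 2: label the packet (the 64 byte threshold is arbitrary)."""
    pkt["klass"] = "small" if pkt["length"] < 64 else "large"
    return pkt


def forward(pkt):
    """Stage 3: choose an output port from the label."""
    pkt["port"] = 0 if pkt["klass"] == "small" else 1
    return pkt


STAGES = (parse, classify, forward)


def run_to_completion(packets, num_cores):
    """Every core runs all stages on each packet it is given."""
    per_core = [[] for _ in range(num_cores)]
    for i, pkt in enumerate(packets):
        core = i % num_cores  # round robin stand-in for a real dispatcher
        for stage in STAGES:
            pkt = stage(pkt)
        per_core[core].append(pkt)
    return per_core


def pipelined(packets):
    """Each stage handles all packets, then hands them to the next stage."""
    batch = list(packets)
    for stage in STAGES:  # on real hardware each stage would sit on its own core
        batch = [stage(pkt) for pkt in batch]
    return batch


packets = [b"a" * 10, b"b" * 100, b"c" * 64]
print(run_to_completion(packets, num_cores=2))
print(pipelined(packets))
```

Both models produce the same result for each packet. What changes is where the work happens: in run to completion a packet never leaves its core, while in a pipeline it moves from stage to stage.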
The Tile-Mx will be made by TSMC on a 28nm process. "Will the Mx be the next 'killer' processor?" Doud asked rhetorically. "It's not what we've architected it to be.
"We anticipate it performing such tasks as NFV – a domain traditionally handled by Intel processors. Running those apps on a Tilera device will be more effective in terms of throughput and power efficiency."