Xilinx unveils heterogeneous multiprocessing architecture

Historically, developers of programmable logic devices – and, more recently, FPGAs – have taken advantage of process shrinks, not only to integrate more features into their parts, but also to produce parts which use every available bit of space on the reticle.

Until the 28nm node, this model worked well. But the performance requirements of leading edge applications have derailed this linear progression. Now, FPGA developers are looking to build upwards, as well as outwards, with the appearance of quasi 3D techniques.

Yet, like other leading semiconductor developers, Xilinx realises the days of the process shrink bringing the next generation of FPGA are gone. EMEA marketing director Giles Peckham said: "With the move to 20nm and 16nm, we're not getting the cost per transistor benefit we used to, so there is no advantage in process shrinks."

That isn't stopping Xilinx from heading towards the 16nm node, but even taking a complex architecture to the next node may not be enough for some applications. Steve Glaser, senior vice president for marketing and corporate strategy, said: "We are having to invent the next generation. We can't take the same architecture and expect Moore's Law to bring the benefits; we have to look at new approaches."

That new approach is the UltraScale MPSoC architecture. "It's a $900million investment," Glaser pointed out, "which has required a corporate transformation. It's been about competence, building new teams and importing more mixed signal expertise."

According to Xilinx, the MPSoC architecture builds on the success of its Zynq-7000 All Programmable SoC. It extends the UltraScale FPGA and 3D IC architectures to enable heterogeneous multiprocessing with 'the right engines for the right tasks'. "Generic engines are not optimised," Glaser contended. "Some applications will need special engines."

Xilinx has categorised the targets for its MPSoC approach as 'smarter systems' and says these systems require increasing amounts of communications and computing bandwidth. 'There is increased use of vision and location data, a greater need for guaranteed quality of service, increased security services and other resources.
'Big Data' (and small data) applications need more and more analytics to automate control processing, provisioning, configuration and overall system management,' it notes.

Right tools for the job

Xilinx says it has identified five processing bottlenecks – DSP, graphics, network processing, real time processing and general computing performance – and has concluded these tasks cannot be addressed using a single processing architecture. "Many have tried to accommodate all of these processing tasks with a 'one size fits all' approach," said Glaser, "and all who have tried this approach have failed."

So Xilinx has turned to a heterogeneous multiprocessing approach in which tuned programmable processing engines are used for specific tasks. "It's the only certain path to success for complex system designs," Glaser continued. "Using the right engines for the right tasks provides the performance, power and cost benefits that smarter systems require to be efficient and effective." In fact, Xilinx claims the use of specialised engines will bring at least a fivefold increase in performance, compared to the use of 'generalised' engines.

Devices based on the UltraScale MPSoC architecture will, in Xilinx's opinion, offer 'unprecedented processing, I/O and memory bandwidth' through a mix of heterogeneous processing engines. These soft and hard engines will be embedded in a high performance, on chip interconnect, supported by on chip memory subsystems.

Using heterogeneous processing and programmable engines optimised for different applications will enable UltraScale MPSoC devices to deliver the performance and efficiency required by these smarter systems while retaining backwards compatibility with Zynq-7000 SoCs. "But scaling multiple engines for multiple tasks needs a new architecture," Glaser added.
Where Zynq-7000 devices – based on the UltraScale architecture – featured a dual core ARM Cortex-A9 based application processor unit, the UltraScale MPSoC architecture will bring in an as yet unnamed ARM processor to enable 64bit applications.

Bigger and better CPUs and heterogeneous processing engines must be able to extend their enhanced processing power throughout the entire chip and beyond, says the company. The UltraScale MPSoC architecture is said to provide the required peripheral set, on chip interconnect with 'massive' bandwidth and the ability to address terabytes of data.

Bearing in mind the existing architecture features a Cortex-A9, it's reasonable to assume the MPSoC variant will offer at least a Cortex-A53 core. But there are a number of options, including dual and quad core implementations, the potential use of the A57 core and adopting one of the possible big.LITTLE combinations.

The UltraScale architecture featured the ARM cores as part of a homogeneous arrangement, but the MPSoC implementation will use ARM cores in a heterogeneous fashion. This arrangement appears to extend to the processing engines, where at least some of the five 'flavours' alluded to will be implemented in hardware, with the remainder – potentially the DSP and packet processing engines – created as soft processors within the FPGA fabric.

While Xilinx hasn't given any indication about when products based on the MPSoC architecture might appear, it has given some broad hints about what those products might look like. It lays out five target sectors for MPSoC devices: smarter data centres; smarter vision; smarter networks; smarter energy; and smarter factories. Its initial presentation highlights devices for smarter controls, smarter vision and smarter networks on the road map, appearing in that order. The smarter controls device will integrate graphics processing, real time control, enhanced security and the next generation ARM CPU.
This will be followed by a device for smarter vision applications, which will add a video processing engine. The last of the three products identified on the road map will address smarter networks. In this device, the video and graphics processing engines will be replaced by packet processing and waveform processing engines.

What is known is that MPSoC devices are targeted at TSMC's 16nm finFET process, which Xilinx says will offer a 60% improvement in performance per watt compared to devices made on TSMC's 28nm process.

Creating MPSoC devices has required a rework of Xilinx's Vivado Design Suite. "It's the umbrella for the MPSoC architecture," Glaser commented. Increasing complexity means multiple abstraction layers are needed and these are provided not only through Vivado, but also through SystemC, OpenCL, Matlab and LabVIEW. Together, Xilinx says, these create the most comprehensive mixed abstraction system design environment. Included are tailored specification environments for C, C++ and OpenCL, along with system level design capabilities.

Backwards compatibility with existing software targeted at Zynq-7000 devices has been included, and Xilinx says it will also be providing an expanding ecosystem of software, middleware, OS support, debuggers, IP tools, boards and design services.

With potential applications spanning manufacturing, data centres and wireless communications, Xilinx has designed in multilevel security and safety features. Included are enhancements to anti tamper features, as well as trust assurance, safety and reliability, in order to meet key industry standards. "We've built security, safety and power management features into the MPSoC architecture," Glaser concluded, "but we have made it simple to use."