European researchers are working on a host of energy-efficient microservers


European research organisations and companies are busy targeting microservers as a potentially very large revenue stream in the medium term. At least four groups, funded largely by the EU's 7th Framework Programme, are working on a variety of often overlapping processor, system and software architecture projects, all scheduled to finish in September 2016.

Microservers target applications that individually may not require very high levels of computing performance, but that may need to run in large numbers and/or have critical latency requirements. Importantly, the individual components and subsystems being devised are also expected to find applications in other spheres, notably in the radio access networks (RAN) of mobile infrastructure equipment and some high-end embedded systems. The make-up of the collaborative projects reflects these potential use cases.

The nine-member Euroserver project, for example, links processor IP specialist ARM and chip maker STMicroelectronics with Eurotech, the Italian specialist in systems integration at the board and machine levels, and Greek hardware-software integrator FORTH. On the academic side, the world-renowned wireless infrastructure group at Technical University Dresden is working on how databases are handled in embedded communications systems. UK group OnApp, which specialises in Infrastructure as a Service (IaaS) cloud platforms used mainly by web-hosting providers, is the end-user partner.

One of the key themes is 'easy scalability', Yves Durand, the project co-ordinator from French group CEA-LETI in Grenoble, told New Electronics.

"With Intel's processor architecture, which currently dominates the server and high-end computing sectors, you are de facto running at full power. But with the scale-out architecture outlined in a forerunner project dubbed Eurocloud, we know well how to best use many smaller processor cores, and how to share memory more flexibly, so as to reduce significantly the power dissipation. I/O virtualisation will also help in this regard through hardware and software support," said Durand.

"In the first stage of the project," he added, "we analysed, tested and revisited some of the throughputs and latencies of the 'chiplet' and memory links that we considered would be needed. Now, we are starting to build prototypes – and Eurotech will have a key role. In essence, the project from now will be to exploit the capabilities of the chip."

The group expects to demonstrate that the proposed architecture can deliver a tenfold improvement in data centre energy efficiency compared with traditional servers and other potential microserver designs. Durand also claimed significant cost benefits and gains in software efficiency.

'Chiplet' is the Euroserver term for the bare processor die being developed by ARM, STMicroelectronics and CEA-LETI. Each chiplet incorporates eight ARM Cortex-A53 cores implemented in ST's 28nm FD-SOI process. In the physical implementation, the chiplet will sit on top of an interposer that carries peripheral circuitry. The whole package relies on advances in 3D silicon integration being readied by researchers at CEA-LETI.

Typically, four chiplets are integrated on a silicon interposer that includes SoC interfaces and on-chip interconnects. The approach is said to lead to very high yields, the best possible power-performance trade-off, and, most importantly, lower costs of application specialisation.
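
The chiplet-on-interposer hierarchy can be pictured as a simple data model. The sketch below is illustrative only: the class names `Chiplet` and `InterposerPackage` are hypothetical, and the model captures nothing beyond core counts. It shows how the typical four-chiplet package yields 32 Cortex-A53 cores, and how the same die design could be reused in packages with more or fewer chiplets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chiplet:
    """Hypothetical model of one bare processor die."""
    cores: int = 8                  # eight Cortex-A53 cores per chiplet
    process: str = "28nm FD-SOI"    # ST process named in the article


@dataclass(frozen=True)
class InterposerPackage:
    """Hypothetical model of chiplets mounted on one silicon interposer.

    The interposer's SoC interfaces and interconnect are not modelled;
    only the chiplets it carries are.
    """
    chiplets: tuple[Chiplet, ...]

    @property
    def total_cores(self) -> int:
        return sum(c.cores for c in self.chiplets)


# The typical configuration: four chiplets on one interposer
package = InterposerPackage(chiplets=tuple(Chiplet() for _ in range(4)))
print(package.total_cores)  # 32
```

A hypothetical two-chiplet variant built from the same die, `InterposerPackage(tuple(Chiplet() for _ in range(2)))`, would report 16 cores.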

John Goodacre, director of technology and systems at ARM and technical director of the Euroserver project, stresses: "Device acquisition, and thus the ability to spread costs among several variants, is a vital consideration in our approach."

He added that the scalable architecture should be completed this year, with a hardware-software prototype implementation expected 'before the end of this year'. Goodacre told New Electronics that FPGA-based prototypes have already yielded important data on the key characteristics of the full-scale prototype. Two versions of the prototype are planned: one targeting embedded servers, the other enterprise servers. The embedded version will hold one or two microserver boards in a small, sealed, passively cooled form factor. The enterprise variant will integrate up to 64 microserver boards, together with I/O, storage and power supply, into a unit compatible with standard 42U racks (see fig 1).

Elaborating on Durand's comments, Goodacre said that one early focus of the architecture development was to minimise the use of long-distance interconnects. Another key innovation, in the software architecture, is the ability to manage shared resources and processors efficiently and to assign workloads dynamically to the appropriate group of resources, reducing interference between workloads.
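
The allocation policy itself is not described, so the following is only a sketch of one plausible approach, assuming resources are partitioned into groups and each workload has a known demand in abstract capacity units. `ResourceGroup` and `assign` are hypothetical names. The policy places the largest workloads first, each on whichever group has the most spare capacity, and leaves a workload unplaced rather than oversubscribe a group.

```python
from dataclasses import dataclass, field


@dataclass
class ResourceGroup:
    """A hypothetical pool of cores and memory that can host workloads."""
    name: str
    capacity: float                                   # abstract capacity units
    assigned: list[str] = field(default_factory=list)
    load: float = 0.0

    @property
    def free(self) -> float:
        return self.capacity - self.load


def assign(workloads: dict[str, float], groups: list[ResourceGroup]) -> dict[str, str]:
    """Place each workload on the group with the most spare capacity.

    Workloads are placed largest first. One that would push even the
    emptiest group past its capacity is skipped, since oversubscribing a
    group is one source of interference between co-located jobs.
    """
    placement: dict[str, str] = {}
    for name, demand in sorted(workloads.items(), key=lambda kv: kv[1], reverse=True):
        best = max(groups, key=lambda g: g.free)  # first group wins on ties
        if best.free < demand:
            continue
        best.assigned.append(name)
        best.load += demand
        placement[name] = best.name
    return placement


groups = [ResourceGroup("g0", 8), ResourceGroup("g1", 8)]
placement = assign({"web": 3, "db": 5, "cache": 2, "batch": 4}, groups)
print(placement)
```

Spreading by spare capacity is just one simple heuristic; a real scheduler would also weigh memory locality and the interconnect distances the project is trying to minimise.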

"To achieve all this, the group has had to follow a holistic approach. It is very much a mixture of software and hardware," he noted.

On a more corporate level, Lakshmi Mandayam, ARM's director of software systems, stressed that, while projects like this are exploiting emerging technologies, 'they are pointing the way for us into different markets, not just servers, but telecoms infrastructure and networking – all sectors that are growing rapidly'.

There will, of course, be big challenges ahead, Mandayam told New Electronics. "Not least how the industry handles software compatibility issues between, for instance, x86 or Atom based servers and those based on our 64bit v8 architecture in the same chassis."

This is why ARM is working with OEMs and IT groups such as HP, Dell, Citrix, Red Hat and TI under its Server Base System Architecture initiative announced last year, which aims to 'enable portability between ARM-based platforms'.

At the silicon level, Qualcomm last November became the sixth company to announce plans to build SoCs and boards based on 64-bit ARM cores, joining AMD, Applied Micro, Broadcom, Cavium and Huawei. AMD's Opteron A1100 device started shipping late last year.

Reaching for the sky

Stepping up a level in processing power and targeting both servers and high-performance computing (HPC), the Mont-Blanc project is reaching for the skies, aiming at 'a new type of computer architecture built from energy efficient solutions used in embedded and mobile devices'.

In short, the project seeks higher performance, lower energy consumption and a more scalable architecture.

Co-ordinated by the Barcelona Supercomputing Centre (BSC), the effort started in 2011 and was extended in 2013, with additional members, to run until September 2016. Industrial partners in the 14-member consortium include ARM, STMicroelectronics, server maker Bull and UK software tools specialist Allinea. These are joined by European university departments focusing on HPC, as well as other supercomputing centres.

Last year, the group unveiled a prototype blade server that is seen as a stepping stone towards a full Exascale system. The compute cards, which measure 85 x 56mm, integrate a Samsung Exynos 5 SoC, 4Gbyte of DDR3-1600 DRAM, a microSD slot for local storage and a 1Gbit/s Ethernet network interface. The card offers 6.5GFLOPS of compute for around 10W of power, with an efficiency quoted at about 3.2GFLOPS/W at peak theoretical performance.

A single Mont-Blanc blade integrates 15 of these cards and a gigabit Ethernet crossbar switch. Nine of these blades then fit into a standard Bull X chassis. A complete rack hosts up to six chassis, delivering 26TFLOPS at peak performance.
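
The packaging hierarchy lends itself to a quick consistency check. The short calculation below uses only the figures quoted in this section (15 cards per blade, nine blades per chassis, six chassis per rack, 26TFLOPS per rack, about 10W per card) and assumes every card contributes equally to the rack peak. Multiplying out gives 810 cards per rack, or roughly 32GFLOPS of peak per card; at about 10W that matches the quoted 3.2GFLOPS/W. By contrast, 6.5GFLOPS at 10W corresponds to about 0.65GFLOPS/W, so the rack total and the efficiency figure together imply a per-card peak of around 32GFLOPS.

```python
# Figures quoted for the Mont-Blanc prototype
cards_per_blade = 15
blades_per_chassis = 9
chassis_per_rack = 6
rack_peak_tflops = 26.0
card_power_w = 10.0
card_peak_gflops_quoted = 6.5

cards_per_rack = cards_per_blade * blades_per_chassis * chassis_per_rack  # 810

# Per-card peak implied by the rack-level figure (assumes equal contribution)
implied_card_gflops = rack_peak_tflops * 1000 / cards_per_rack  # ≈ 32.1

# Efficiency at ~10 W per card under each reading
implied_gflops_per_w = implied_card_gflops / card_power_w       # ≈ 3.2
quoted_gflops_per_w = card_peak_gflops_quoted / card_power_w    # 0.65

print(f"cards per rack: {cards_per_rack}")
print(f"implied per-card peak: {implied_card_gflops:.1f} GFLOPS")
print(f"efficiency from rack figure: {implied_gflops_per_w:.2f} GFLOPS/W")
print(f"efficiency from 6.5 GFLOPS figure: {quoted_gflops_per_w:.2f} GFLOPS/W")
```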

A more recent setup, demonstrated at the BSC last December, comprised 135 prototype nodes, with a final 'installation prototype' of more than 1000 computational nodes scheduled for completion this month.

In a recent presentation, consortium executives said the aim now is to extend support to 64-bit ARMv8 processors, to develop the OmpSs parallel programming model further so that it can exploit multiple cluster nodes, and to finalise the Exascale architecture.

The NanoStreams consortium is following a different approach, a necessity since its work is aimed at different end users, such as financial institutions and medical diagnostics providers.

It brings together European expertise in embedded system design and high performance computing software to address the challenge of real time analytics on very fast data streams.

Participants include Queen's University Belfast, where Professor Dimitrios Nikolopoulos co-ordinates the effort; computing systems technology companies Analytical Engines, from the UK, and Dutch group Associated Compiler Experts (ACE); Greek research institute FORTH; and, as end-user drivers, IBM (Zurich) and Credit Suisse.

The aim is to devise and implement a microserver based on an application-specific, heterogeneous analytics-on-chip (AoC) engine that integrates a small number of latency-optimised RISC cores from ARM and a large number of throughput-optimised, application-specific nanocores. The cores can be programmed in C, like regular processors, and the aim is to build several more, each customised for a specific use case.
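
How work is divided between the two kinds of core is not described, so the sketch below only illustrates the general idea of a heterogeneous engine under stated assumptions: latency-critical tasks run whole on one of a few RISC cores, while throughput-oriented tasks are split into near-equal shares across many nanocores. `Task`, `dispatch`, the task names and the core counts (two RISC cores, 16 nanocores) are hypothetical, chosen to keep the example small.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass
class Task:
    name: str
    latency_critical: bool
    work_items: int = 1   # e.g. number of records in a stream chunk


def split_evenly(items: int, n: int) -> list[int]:
    """Divide `items` into `n` shares that differ by at most one."""
    base, extra = divmod(items, n)
    return [base + (1 if i < extra else 0) for i in range(n)]


def dispatch(tasks: list[Task], n_risc: int = 2, n_nano: int = 16) -> dict[str, list[tuple[str, int]]]:
    """Hypothetical routing policy for a heterogeneous analytics-on-chip engine.

    Latency-critical tasks go whole, round-robin, to the few RISC cores;
    throughput tasks have their work items spread over all nanocores.
    Returns {core_id: [(task_name, items), ...]}.
    """
    plan: dict[str, list[tuple[str, int]]] = {}
    risc_cores = cycle(f"risc{i}" for i in range(n_risc))
    for task in tasks:
        if task.latency_critical:
            core = next(risc_cores)
            plan.setdefault(core, []).append((task.name, task.work_items))
        else:
            for i, share in enumerate(split_evenly(task.work_items, n_nano)):
                if share:
                    plan.setdefault(f"nano{i}", []).append((task.name, share))
    return plan


tasks = [Task("price-alert", True), Task("risk-agg", False, 100), Task("order-check", True)]
schedule = dispatch(tasks, n_risc=2, n_nano=16)
print(schedule["risc0"], schedule["nano0"], schedule["nano15"])
```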

The system also uses a hybrid of DRAM and phase-change RAM (PCRAM) and, according to Prof Nikolopoulos, the use of NVRAM is also being explored. "We have built prototypes of the accelerator chip, based on nanocores, on the Xilinx Zynq platform," he told New Electronics.

"We will be applying a hierarchical scale-out approach and expect to deliver microservers that will have significant energy efficiency advantages over the traditional x86-based offerings."

He adds that one of the project's main advantages is that it is driven partly by the needs of real financial workloads, and that the architecture and software stacks will be evaluated using real financial data.

The group suggests that, with 20nm technology, NanoStreams will be able to integrate up to 1000 nanocores in reconfigurable logic.

Meanwhile, the Greek GreenCenter project is focusing on the design of efficient hardware accelerators to augment the multiprocessor SoCs (MPSoCs) used in data centre servers.

"We have developed a hardware accelerator that is used to accelerate the processing of applications based on the MapReduce framework," Dr Christoforos Kachris, a senior researcher at the University of Thrace, told New Electronics.

MapReduce is a programming model for processing large data sets using a large number of nodes.

Working with other electronic engineering departments in Greece, the group has developed a MapReduce accelerator that can be added to multicore SoCs and that has been shown to accelerate the indexing and processing of key-value pairs, significantly increasing system performance. "The use of the MapReduce accelerator (also) leads to lower energy consumption, since the processing and the indexing of the key value pairs of MapReduce are executed in customised digital logic and not in the processors," noted Dr Kachris.
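
For readers unfamiliar with MapReduce, the sketch below shows its stages in plain Python using a word count: map each record to key-value pairs, group (index) the pairs by key, and reduce each group to a single value. The function names are illustrative. The grouping and combining of key-value pairs are the steps Dr Kachris says the GreenCenter accelerator executes in customised logic; here they run in software on the processor.

```python
from collections import defaultdict
from typing import Callable, Iterable


def map_phase(records: Iterable[str]) -> list[tuple[str, int]]:
    """Map: emit a (word, 1) key-value pair for every word in every record."""
    return [(word, 1) for record in records for word in record.lower().split()]


def index_pairs(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group values by key in a hash table (the 'indexing' step)."""
    table: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        table[key].append(value)
    return table


def reduce_phase(table: dict[str, list[int]],
                 reducer: Callable[[list[int]], int]) -> dict[str, int]:
    """Reduce: combine all values that share a key (the 'processing' step)."""
    return {key: reducer(values) for key, values in table.items()}


records = ["the data centre", "the cloud and the data"]
counts = reduce_phase(index_pairs(map_phase(records)), sum)
print(counts)
```

In a real MapReduce framework the map and reduce calls run in parallel across many cores or nodes; the hash-table grouping in the middle is where a hardware accelerator can take work off the processors.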

The design (see fig 2) has been implemented and evaluated on a Xilinx Zynq platform. "The performance evaluation shows the proposed scheme can achieve up to 1.8x overall performance improvement in MapReduce applications," he said.

Asked about the potential commercial impact of the work, Kachris said the accelerator 'could be used efficiently in FPGAs (as an IP core) with embedded processors'. "The proposed scheme could be also incorporated (as hardwired accelerators) to future multicore processors used in data centres, accelerating significantly cloud computing and server applications."

Taken together, the projects are funded to the tune of €40 million. Both Goodacre and Prof Nikolopoulos believe this level of support shows the high priority the EC gives to research in this promising sector.