As the server market changes, can ARM challenge the incumbents?

A confluence of events is enabling the ARM microprocessor core to gain a foothold in the server market, until now the preserve of AMD and Intel.

Data centre operators want to reduce the power consumption and cost of servers. This is especially true of the largest operators, such as Google, Yahoo and Facebook, where tasks are split not across a traditional server's two or four processors, but across hundreds or thousands of processors (see NE, 9 August 2011).

These trends have led to an opening for ARM, said Andrew Feldman, ceo of server start up SeaMicro. "It is also the part of the market that is growing like crazy." That explains the entrance of companies such as Marvell, Calxeda and AppliedMicro with ARM based server chips.

"In the compute world, everyone has always been beaten from below," said Feldman. "ARM has looked at the history of small beating big and sees this as a rare opportunity."

AMD and Intel are fully aware of this trend. "The server market is extraordinarily important to them," he continued. "They see this coming; this is the new battleground."

Marvell is shipping the Armada XP, while Calxeda has announced EnergyCore, which is being used by HP. AppliedMicro is developing the first 64bit ARM design, the X-Gene. The three designs address different server cost-performance points.

Marvell's Armada XP processor family is aimed at cloud computing. The flagship device uses four 32bit ARM cores, clocked at 1.6GHz. Also on chip are four gigabit Ethernet interfaces, four PCI-E 2.0 ports and two SATA 2.0 storage ports (see NE, 22 Feb 2011).

Calxeda's EnergyCore and AppliedMicro's X-Gene, meanwhile, are aimed at cluster server designs. Larger data centres use multiple server racks, linked using expensive networking. A server cluster condenses multiple processors and associated interconnect into one box.

"Instead of a customer cobbling together 1U and 2U [rack unit] servers with a couple of Ethernet ports and a top of rack switch, you will see that entire infrastructure condensed within a single box – a 2U or 4U chassis, with the cluster being a platform," said Karl Freund, Calxeda's vp of marketing.

SeaMicro's SM10000 server is an example of a cluster architecture; the company claims it consumes a quarter of the power and takes a sixth of the space of traditional servers. Its SM10000-64HD combines up to 384 64bit Intel Atom N570 dual core processors. SeaMicro adopted the Atom for its lower power, but has just launched an SM10000 product that uses Intel's Xeon. "Our approach to date is with an x86 processor, but our architecture was designed to support any processor," said Feldman.

Calxeda's EnergyCore is a 32bit ARM Cortex-A9 quad core processor, clocked between 1.1 and 1.4GHz. "You have to go to an A15 (ARM's higher performance Cortex core) to get more than four cores [per chip], and an A15 was not sufficiently mature during our design cycle," said Freund.

The chip implements a node, running one operating system. It has six ports – five Ethernet interfaces that can operate at 1 or 10Gbit/s, and a separate 1Gbit/s Ethernet interface. "If someone needs just a 1Gbit Ethernet connection, they can turn off all the 10GbE connectors," said Freund. The Ethernet ports are typically used for internode communications.

The first EnergyCore also includes a memory controller, an 80Tbit switch fabric and a management processor. All traffic – between cores, between cores and I/O and to external networking – goes through the switch fabric. "You go out through the switch to talk to SATA, PCI-Express, external Ethernet and to an adjacent node," said Freund.

The management controller oversees booting the quad cores and interfaces to the server management software. It also optimises routing over the switch fabric, for example if congestion occurs in one part of the fabric or a link in the cluster fails. The controller is implemented using a simpler ARM core, clocked at a lower frequency. The chip consumes some 5W when active and 500mW when idle.

AppliedMicro has an architectural licence from ARM, enabling its designers to develop their own 64bit cpu.
The cpu is compliant with ARM's v8 architecture and, by developing its own design, the company aims to be first to market with a 64bit ARM. The result is a node design of up to 128 ARM cores, spread over four ics clocked at up to 3GHz. Only the X-Gene processor architecture has been detailed; specific device members have yet to be announced.

The processor includes the ARM cores, a Level 3 cache, a non blocking terabyte fabric, memory controllers, 1 and 10Gbit/s interfaces, storage ports and PCI Express lanes. "Each chip can connect to three companion chips to create a cache-coherent system," said Jim Johnston, product marketing director at AppliedMicro. This 'socket' looks, from a software perspective, like a 128 core processor. "This implies that a 32 core chip is in the roadmap," said Johnston.

The cpu core is superscalar – it has four ALUs, so it can execute up to four instructions, out of order, on any given clock cycle. In a four chip meshed server node, each chip is linked to the other three. Since each link is 100Gbit/s, each chip has 300Gbit/s of dedicated interchip interface capacity.

The device also features another cpu on chip for system management and offload. Dubbed multi-Slim, this comprises four simpler 32bit ARM cores and can perform a variety of tasks, including power management and secure boot. Flexible power management is important because, in web server applications, the device can be idle for up to 80% of the time, said Johnston. In power down mode, with all cpus powered off except multi-Slim, the chip consumes less than 300mW.

As for products, Calxeda is working with HP on the server vendor's Redstone design. HP uses a 72 node fabric in a butterfly fat tree configuration, with the fabric then connected to a top of rack switch using a 10Gbit/s Ethernet link. Each HP system has four trays, each with 18 slots, and each slot holds a four node EnergyCard. Local storage cards can also be added.
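
The node and chassis figures quoted above follow from simple arithmetic, sketched below in Python. The numbers are those given in the article; the function and variable names are illustrative only, and the Redstone total assumes that every slot holds an EnergyCard rather than a storage card.

```python
# Figures are those quoted in the article; all names here are illustrative.

def full_mesh(chips, link_gbit):
    """Return (links per chip, total links, per-chip bandwidth in Gbit/s)
    for a mesh in which every chip is linked directly to every other chip."""
    links_per_chip = chips - 1
    total_links = chips * links_per_chip // 2  # each link is shared by two chips
    return links_per_chip, total_links, links_per_chip * link_gbit

# X-Gene node: four chips, 100Gbit/s per link, up to 128 cores in total.
XGENE_CHIPS = 4
links_per_chip, total_links, chip_bw_gbit = full_mesh(XGENE_CHIPS, 100)
cores_per_chip = 128 // XGENE_CHIPS

# HP Redstone: four trays, 18 slots per tray, one four node EnergyCard per slot.
# Assumption: no slot is given over to a local storage card.
TRAYS, SLOTS_PER_TRAY, NODES_PER_CARD = 4, 18, 4
nodes_per_tray = SLOTS_PER_TRAY * NODES_PER_CARD
nodes_per_system = TRAYS * nodes_per_tray

print(f"X-Gene: {links_per_chip} links per chip, {total_links} links in the mesh, "
      f"{chip_bw_gbit}Gbit/s per chip, {cores_per_chip} cores per chip")
print(f"Redstone: {nodes_per_tray} nodes per tray, {nodes_per_system} nodes per system")
```

A fully populated tray works out at 72 nodes, the same as the fabric size HP quotes, which suggests one fabric per tray, although the article does not say so explicitly.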

Freund said the architecture allows up to 4096 nodes per tree. HP's system will be available in the second quarter of 2012, with volume production of the chips in the second half of the year. Meanwhile, Calxeda expects to announce six more customers in the coming months. The device will be generally available in the second half of the year.

Calxeda says it will follow ARM's own roadmap. "Clearly for us, our focus is to get to 64bit," said Freund.

AppliedMicro has yet to detail design wins, but says that first silicon will be implemented in a 40nm cmos process, followed quickly by 28nm. Its design has yet to be taped out, but the first X-Gene devices are expected by year end. The company has a Linux development card that implements a dual core X-Gene design with the full fabric. Developers can use the card to verify a design and for software development. The system card will be available to early customers this quarter.