Serving a purpose: Why the ARM architecture is attracting attention in the server market

Power consumption in data centres is becoming an increasingly important issue – no surprise when these centres can house tens of thousands of servers. So there is a push towards the development of processors which offer higher performance with lower power consumption.

This emerging market is attracting the attention of a number of processor developers, including Marvell. "Marvell is targeting data centres that support cloud computing and which provide web services," said Linley Gwennap, principal analyst at The Linley Group. "In those cases, there would be some significant interest in what Marvell is doing."

The latest processor family from Marvell, the Armada XP (Extreme Performance), uses the low power ARM architecture to deliver a 1.6GHz quad core variant that has the processing performance needed for the enterprise market. According to Viren Shah, Marvell's senior director of marketing for embedded SoCs, the device family is aimed at networking, network attached storage, laser printers and the server market. "The server market is dominated by the x86 [architecture], but ARM is making forays into that segment – and the reason is mainly its low power," said Shah.

Power is the all important metric. "With the quad core design, our goal is to be sub 10W," said Shah. This is a noteworthy figure; according to Gwennap, Intel's Xeon processor consumes around 40W. "Even at that power level, Xeon is not designed as an SoC." In addition to Xeon, a South Bridge chip and Ethernet controllers would also be needed.

Marvell has Sheeva, an ARM based core developed after gaining an architectural licence when it acquired Intel's XScale business in 2006. Sheeva, which is ARM v6 and v7 instruction set compatible, is a two issue design: either two integer instructions or one integer and one floating point instruction are issued each clock cycle. The core also has a limited ability for instruction look ahead, boosting code throughput by reordering the sequence in which instructions are processed.

There are five devices in the Armada XP family: two single core; two dual core; and a quad core, the MV78460 (see fig 1). All are pin compatible, but vary in the on chip peripherals, cache size and the width of the memory interface.
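
The two issue rule described for Sheeva can be illustrated with a toy model. The sketch below is only an approximation under stated assumptions: it issues strictly in program order, ignores data dependencies and the core's limited look ahead, and assumes an integer and floating point pair may appear in either order. The names `can_pair` and `issue_cycles` are hypothetical and do not come from Marvell.

```python
from typing import List, Tuple

INT, FP = "int", "fp"

# Pairings allowed in one clock cycle, per the article's description.
# Assumption: an integer/floating point pair is accepted in either order.
ALLOWED_PAIRS = {(INT, INT), (INT, FP), (FP, INT)}


def can_pair(first: str, second: str) -> bool:
    """Return True if the two instruction types may issue in the same cycle."""
    return (first, second) in ALLOWED_PAIRS


def issue_cycles(stream: List[str]) -> List[Tuple[str, ...]]:
    """Greedy in-order issue: pair adjacent instructions when the rule allows.

    Assumption: no dependency checks and no reordering, unlike the real core.
    """
    cycles: List[Tuple[str, ...]] = []
    i = 0
    while i < len(stream):
        if i + 1 < len(stream) and can_pair(stream[i], stream[i + 1]):
            cycles.append((stream[i], stream[i + 1]))
            i += 2
        else:
            cycles.append((stream[i],))
            i += 1
    return cycles


cycles = issue_cycles([INT, INT, FP, FP, INT])
print(len(cycles), cycles)
```

In this toy stream, the two adjacent floating point instructions cannot share a cycle, so five instructions take three cycles rather than the ideal two and a half.
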
Each ARM CPU on the MV78460 has a 32kbyte instruction cache and 32kbyte data cache, while the four cores share a 2Mbyte L2 cache. The other Armada XP SoCs have a 1Mbyte L2 cache. The L2 cache is doubled in size in the MV78460 to maintain processing performance. Sheeva cores access external memory through a controller, with the processor supporting DDR3 memory clocked at 800MHz. The device has 40bit physical addressing that supports up to 1Tbyte of DRAM. Three Armada XP SoCs, including the MV78460, support a 32 or 64bit memory data interface, while the rest have a 32bit bus.

The MV78460 includes two serial ATA (SATA), four PCI Express and four Gigabit Ethernet (GbE) media access controllers (MACs). These 10 controllers share 16 6GHz serdes, so the PCI Express controllers could be configured as three x4 ports and one x1 port, while the chip could also support two SATA interfaces and a GbE interface. The Ethernet MAC supports the QSGMII interface such that all four GbE ports can be put onto a single serial link.

One design challenge with the MV78460, according to Marvell, was cramming four cores and the I/O peripherals onto an SoC. "We have multiple fast I/O that must coexist in the system," said Erez Alfiya, an application manager at Marvell. "Contention on one affects the whole system performance." To this end, an on chip crossbar switch connects the cores and the L2 cache, as well as the peripherals as they access DDR3 memory. The interface between each core and the L2 cache is 128bit wide and includes a coherency unit, which ensures cache coherency by updating the cache whenever data is written to external memory.

The crossbar switch also supports the various on chip blocks. "That is a lot of bandwidth we need to supply to the different I/Os," said Alfiya. As an example, he cites the case of a GbE interface being used alongside two PCI Express ports. "You have traffic coming from the Ethernet port and from the two PCI Express ports.
You need to balance the traffic and allow DDR access to the three interfaces," said Alfiya. "We have arbitration between the units because only one unit can access the DDR at any time."

Other on chip peripherals include a security engine and support for VoIP via a time division multiplexing (TDM) interface. The security engine can encrypt 2Gbit/s data streams using such algorithms as AES and 3DES. With the TDM interface, the SoC supports up to 32 channels of VoIP.

Marvell uses several power saving techniques to limit the MV78460's power consumption to 10W. The device can power down unused CPUs and vary the clock frequency dynamically to adapt power consumption to processing load. In sleep mode, the CPUs can be turned off while the L2 cache remains powered. In deep sleep mode, the contents of the L2 cache are saved in DRAM before the cache is powered down. The I/O ports then wake the CPUs when data arrives. The GbE MACs are Energy Efficient Ethernet compliant (see NE, 25 January 2011) and the device supports DDR3L. Because DDR3L operates at 1.35V, instead of the 1.5V of standard DDR3, it can reduce power consumption by up to 20%.

The device can run one operating system in symmetric multiprocessing mode, or run asymmetrically. The latter is less common for servers, but features more widely in embedded applications, where the cores can run separate operating systems.

"By integrating everything onto one chip, Marvell has designed a single chip quad core server," said Gwennap. This is different to Intel's approach, where two Xeon multicore chips can be put side by side – a so called two socket server configuration. "You cannot do that with the Marvell chip," said Gwennap. "Marvell has boiled the whole server down to a chip; if you want to scale it, you have to add a whole new, separate, server."

The Armada XP is currently implemented on TSMC's 40nm G CMOS process, although the roadmap includes an eight core design at the 28nm node.
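
The "up to 20%" figure quoted for DDR3L is consistent with a rough first order estimate. The sketch below assumes that dynamic power scales with the square of the supply voltage at a fixed clock frequency and switched capacitance; that scaling rule is a textbook approximation introduced here, not a method attributed to Marvell, and the function name is hypothetical.

```python
def dynamic_power_ratio(v_new: float, v_old: float) -> float:
    """Ratio of dynamic power at v_new relative to v_old.

    Assumption: P is proportional to V squared, with frequency and
    switched capacitance held constant. Static power is ignored.
    """
    return (v_new / v_old) ** 2


ratio = dynamic_power_ratio(1.35, 1.5)      # DDR3L versus standard DDR3
saving_percent = (1.0 - ratio) * 100.0
print(f"Estimated saving: {saving_percent:.0f}%")  # prints "Estimated saving: 19%"
```

Under this assumption the estimate comes out at about 19%, close to the quoted upper bound; real savings depend on the memory devices and workload.
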
In the planned eight core design, the Sheeva cores will be clocked at 3GHz or more, while the SoC will support 10Gbit Ethernet and the PCI Express 3.0 specification.

Marvell isn't the only company looking to bring ARM cores to the server market, however. ARM is seeding the process with a quad core reference design for the Cortex-A9 architecture, while the basic design for the Cortex-A15 is also quad core (see fig 2). One contender is Calxeda, in which ARM is an investor. It is using a quad core implementation of the Cortex-A9 but, because it is limited to four cores per die, it will probably need to use multiple chips to match the performance of Xeon processors. The startup is not, however, providing details on the interconnect or the blocks it plans to integrate.

"We are going to see a lot of quad core Cortex-A15 designs coming out in a year or so," Gwennap concluded.