The nature of the data centre is changing rapidly. As it changes, so too do the technologies being applied to it.
Data centre operators have two major concerns. One is handling the vast amount of data flowing into and out of their server farms. The other is doing so in a more environmentally friendly way. The two concerns are, to a certain extent, opposed: one needs computing horsepower; the other needs to minimise the amount of power consumed. One of the reasons why ARM developed its 64bit ARMv8 architecture was to meet these needs, providing the muscle needed to handle data whilst minimising power consumption.

Cavium was one of the first companies to announce its intention to build devices based on the ARMv8 architecture and signed an architecture licence in 2012. ARMv8 cores, it said at the time, would be used in Project Thunder, in which multicore SoCs would be designed for next generation cloud and data centre applications.

Gopal Hegde is general manager of Cavium's data centre processor group, a new business unit within the company. "Cavium has been building processors since its launch about 14 years ago, but these have been aimed mainly at communications infrastructure applications." He said Cavium's technology has been applied in data centres for some time, but mainly as coprocessors. "Thunder will be a mainstream processor," he asserted.

So why is Cavium interested in the mainstream and what does the ARMv8 architecture offer? And what have been the challenges in developing SoCs which feature up to 48 ARM cores?

"Traditionally," Hegde pointed out, "the data centre was only about compute. It then became compute and memory. But we're now at a point of inflexion where a range of processors is being deployed and it's set to change everything."

The company says data centres are currently built using 'one size fits all' processors based on a legacy control plane architecture in which many of the networking, storage, virtualisation and security functions are either implemented in software or with additional chips.
This, it claims, brings performance, latency, footprint, power and cost inefficiencies. What is needed, it contends, is an innovative SoC microarchitecture that enables optimised solutions.

"The cloud is not just about compute and memory; it's also about I/O. Companies are looking for a high degree of networking flexibility and a high degree of processor scaling," said Hegde, who noted traditional data centre apps were used only by 'thousands'. "Now, apps are being used by tens of millions. Because those early apps were only used by a relative few, they could be done in software. That's not true now; with apps being used by millions, it makes sense to accelerate them."

In his opinion, data centre computing is becoming more like data comms, and this is playing to Cavium's strengths. "You need good compute, low power, good bandwidth and specific acceleration technologies. All these map to Cavium's core competences."

Project Thunder has developed the innovative SoCs alluded to by Cavium and, after almost two years of development, the devices are getting closer to being released under the ThunderX brand.

Two families of ThunderX devices are being launched, both designed for a 28nm process. The CN87xx range, which samples early in 2015, will feature devices with between 8 and 16 cores in a single socket configuration. The CN88xx range, which samples at the end of 2014, will offer parts with between 24 and 48 cores in single and dual socket variants. In the dual socket approach, the two chips can be linked using Cavium's proprietary coherent processor interface (CCPI, see fig 1).

Hegde said: "Thunder offers up to 48 64bit cores, each running at up to 2.5GHz. It provides a tremendous amount of compute power." Each core has a 78kbyte instruction cache and a 32kbyte data cache, and the cores can access 16Mbyte of set associative L2 cache.

"Compared with the competition's eight core chips and single socket approach," Hegde said, "these are high end processors with lots of I/O, lots of connectivity."

So what were the design challenges involved in creating an SoC with up to 48 complex cores? "The level of integration is unbelievable," Hegde admitted, "and this was a big design challenge. We had to make sure we did the integration in a power efficient way. In addition, both ranges feature four variants and each device has to be pin compatible so companies can design one board for use in a range of applications. We had to think through how the pins were laid out up front so that we didn't have to redesign the chips."

Thunder chips aren't the first multicore devices produced by Cavium; its Octeon III range features parts with up to 48 MIPS cores. "We've taken a lot of IP from Octeon III," Hegde said, "including all the I/O, the coprocessor and memory controllers. A lot of technology has been leveraged."

As one of the first companies to adopt the ARMv8 architecture, Cavium has needed to do some development work, something allowed by its architecture licence. Hegde explained: "One of the challenges of the ARMv8 architecture is routing interrupts and traffic across more than eight cores. Version 2 of ARM's Generic Interrupt Controller (GIC) supports eight cores; we've worked with ARM to develop GICv3, as well as other enhancements, such as virtualisation. We've broken new ground by developing features that aren't in the standard ARM spec."

Hegde said ThunderX chips, which will be manufactured by GlobalFoundries on its 28HP process, have been designed using a mix of custom and commercially available tools. "Mainly, we used standard design tools and a standard flow," he noted. "We have also used standard IP, but have also used IP developed in house, which allows us to differentiate the products."
Each ThunderX family features four variants: CP, aimed at compute applications; ST, for storage; NT, for networking; and SC, for security. "The variants are optimised for different workloads," Hegde pointed out. "For example, the CP variant needs lots of cores, a lot of memory and so on, while the ST parts need compute power, but also lots of SATA interfaces and a fabric which allows you to interconnect multiple storage nodes."

Similarly, NT parts offer a range of connectivity options, including up to 100G Ethernet, whilst the SC devices feature hardware accelerators for tasks such as deep packet inspection. "With SC parts," Hegde said, "the processor cores can still be used for the workload, but security tasks are offloaded to specific hardware."

Looking to meet the increasingly stringent power requirements of the data centre, ThunderX parts will consume as little as 20W, with the power hungriest device drawing 95W. "This is 40% less than a Xeon based CPU," Hegde pointed out, "which could draw 130W. When you include the platform controller hub, that could rise to 160W. We are providing parts with comparable performance, based on traditional benchmarks, but are bringing significant performance per watt and performance per dollar advantages," he concluded.
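The power figures quoted above can be checked with simple arithmetic. The sketch below uses only the wattages given in the article (95W, 130W and 160W); the function and constant names are made up for the example. On these numbers, the 40% saving lines up with the 160W figure that includes the platform controller hub, while the saving against the 130W CPU alone is closer to 27%. That is a reading of the numbers, not something Hegde spells out.

```python
# Worked arithmetic using only the wattages quoted in the article.

THUNDERX_MAX_W = 95     # power hungriest ThunderX part
XEON_CPU_W = 130        # Xeon based CPU, as quoted
XEON_WITH_PCH_W = 160   # Xeon plus platform controller hub, as quoted


def percent_saving(part_watts, baseline_watts):
    """Return how much lower part_watts is than baseline_watts, in percent."""
    return 100.0 * (baseline_watts - part_watts) / baseline_watts


if __name__ == "__main__":
    for label, baseline in [("Xeon CPU alone", XEON_CPU_W),
                            ("Xeon with PCH", XEON_WITH_PCH_W)]:
        saving = percent_saving(THUNDERX_MAX_W, baseline)
        print(f"95W versus {label} ({baseline}W): {saving:.1f}% lower")
```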