Many-core processor range set to cut data centre power consumption


The burgeoning popularity of cloud computing, bolstered by that of social networking, is posing challenges for data centre operators. The reason is, of course, the phenomenal increase in data traffic and a similar increase in processing load.

Figures provided by US company AT&T suggest the amount of data being routed around communications networks increased by a factor of 30 from the end of 2007 to the end of 2010. Similarly, Cisco Systems has predicted that wireless data traffic will quadruple between now and 2015, by which time some 20 exabytes (20 × 10^18 bytes) of data will be flowing per month. Much of this data is associated with uploads to YouTube, Facebook and similar sites. For instance, 35 hours of video are uploaded to YouTube every minute and more than 30 billion pieces of content are shared each month on Facebook.

These applications are hosted 'in the cloud' – at huge data centres placed strategically around the world. The data centres house thousands of servers, each of which consumes large amounts of power and generates significant amounts of heat. Removing the heat requires air conditioning, which consumes almost as much power as the servers themselves. Understandably, data centre operators are looking to cut their costs by increasing the ratio of computing performance to the power supplied.

The solution traditionally adopted by data centre operators has been the multicore processor, with devices such as Intel's Xeon offering four, six or eight cores. But there is now a move beyond that to the so-called 'many-core' processor. One company looking to exploit this move is Tilera, which recently unveiled the TILE-Gx 3000 series. This comprises devices with 36, 64 and 100 cores.

Ihab Bishara, Tilera's director of server solutions, said the processors are not aimed at every server application. "These devices will be targeted at applications that run on thousands of servers. This represents a big chunk of the market and we believe our technology can bring an order of magnitude improvement in the performance-per-watt ratio."

The challenge, said Bishara, is how to put so many cores on the same chip.

"We have experience of two product generations," he noted, "and are now working on our fourth-generation product." He believes Tilera is 'ahead of the competition'.

Tilera's technology has its roots at MIT, where the idea of a 32-core cache-coherent multiprocessor was first developed in 1994. "The first system was built in 2002 to demonstrate the technology and Tilera was founded in 2004. The first commercial device, a 64-core chip, shipped in 2007," said Bishara. New families have since arrived every two years.

Originally, Tilera focused on the networking and multimedia sectors, developing processors for use in routers and video streaming. "Two years ago," Bishara continued, "we started an initiative in the server market and have worked with the top server users to develop this new family."

Bishara pointed to two important elements of the TILE-Gx range. "The first is its power-efficient core. This consumes less than 0.5W, but performs as well as an Atom or a Cortex-A9. But the most important element, which differentiates TILE-Gx devices, is the ability to put so many cores on the same chip. Anyone can put cores on a chip," he continued, "but to get the degree of scalability that we have achieved requires our 'secret sauce'."

Bishara claimed a scalability of 90%, referring to how closely the device's performance tracks a linear increase as more cores are added. "Other technologies may only get 20%," he added.

The TilePro 64, introduced in 2009, is currently being used in a range of data centres. "The Gx 3000 series is based on what we've learned from TilePro, plus our collaboration with data centre owners," said Bishara.

At the heart of the TILE-Gx 3000 series is a 64-bit very long instruction word (VLIW) core with a 40-bit address space. This, says the company, suits the device to cloud computing. Each core runs at up to 1.5GHz and consumes less than 0.5W, so the top-of-the-range Gx-3100, with 100 cores, is said to consume 48W.
Each core has a three-way pipeline, executing up to three instructions per cycle, and features 32kbyte L1 instruction and data caches, as well as a 256kbyte L2 cache. While TilePro was a 32-bit device, TILE-Gx is a 64-bit processor. "This is important," Bishara asserted. "Most customers want 64-bit processors and, by offering this, we are probably two years ahead of ARM." The 64-bit capability means Gx parts can address more than 4Gbyte of physical memory. "Many new applications assume 64-bit," Bishara continued, "and some won't boot on a 32-bit processor because the image is too large."

Each core on the TILE-Gx chip communicates with all the others over a mesh network, which Tilera terms the iMesh. "This is one of the biggest innovations," Bishara said. "There are five networks on the chip: some handle memory accesses, others handle I/O and cache operations. The iMesh network enables many of the bottlenecks found in other approaches to be eliminated, including cores sending data to cache while trying to receive packets from Ethernet MACs."

The architecture is built from tiles, each combining a processor core with a non-blocking, cut-through switch. Each switch connects into the iMesh, which provides each tile with more than 1Tbit/s of interconnect bandwidth; Tilera says this creates a more efficient distributed architecture and eliminates data congestion. Multiple parallel meshes separate different transaction types, providing more deterministic interconnect throughput.

The scalability claimed for the TILE-Gx range is partly enabled by a scalable caching system. Tilera's Dynamic Distributed Cache technology provides a fully coherent shared cache across an arbitrarily sized array of tiles. Rather than relying on large, power-hungry centralised L2 or L3 caches, each tile's L2 cache can be coherently shared with the other tiles, distributing the load, so that together the L2 caches effectively act as one large L3 cache.

Bishara said Gx chips can directly replace Intel Sandy Bridge devices.
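The tiled layout and distributed cache can be pictured with a small model. This is not Tilera's implementation: it assumes a 10 × 10 grid for a 100-tile part, models just one of the five on-chip networks, routes between switches X-first then Y (dimension-ordered routing), and picks each cache line's 'home' tile with a simple modulo hash over 64-byte lines. The constants and the functions `coords`, `xy_route` and `home_tile` are all hypothetical.

```python
# Toy model of tiles on a 2D mesh plus a distributed "home" L2 for each line.
# Assumptions (not from Tilera): 10x10 grid, X-then-Y routing, 64-byte lines,
# and a modulo hash that interleaves consecutive lines across all tiles.

from typing import List, Tuple

WIDTH = 10       # tiles per row (assumed layout for 100 tiles)
HEIGHT = 10      # tiles per column
LINE_BYTES = 64  # assumed cache-line size


def coords(tile: int) -> Tuple[int, int]:
    """Map a tile index to (x, y) mesh coordinates."""
    return tile % WIDTH, tile // WIDTH


def xy_route(src: int, dst: int) -> List[Tuple[int, int]]:
    """Switch positions visited travelling along X first, then along Y."""
    x, y = coords(src)
    dx, dy = coords(dst)
    path = [(x, y)]
    while x != dx:
        x += 1 if dx > x else -1
        path.append((x, y))
    while y != dy:
        y += 1 if dy > y else -1
        path.append((x, y))
    return path


def home_tile(address: int) -> int:
    """Hypothetical home tile: spread consecutive cache lines over every tile."""
    return (address // LINE_BYTES) % (WIDTH * HEIGHT)


if __name__ == "__main__":
    requester = 0
    for addr in (0x0000, 0x0040, 0x1F80):
        home = home_tile(addr)
        hops = len(xy_route(requester, home)) - 1
        print(f"address {addr:#06x}: home tile {home}, {hops} hops from tile {requester}")
```

Interleaving lines across every tile's L2 spreads accesses so that no single central cache becomes a hot spot; the trade-off in this model is that a line homed on a distant tile costs several mesh hops to reach.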
"The Gx-3036 will replace an eight core SandyBridge," said Bishara, "while the Gx-3064 can replace a dual eight core SandyBridge. We expect the 100 core chip to be used in high end clusters." Bishara pointed out that the Gx-3036 consumes 20W. "This can replace a chip which consumes more than 120W. It's a huge power saving, and that's just from the chip; it doesn't take into account the other factors. When you add them in, you get an x10 improvement in power efficiency," he noted. Infrastructure cost is a major part of cloud computing investment. "It's a big issue," Bishara concluded. "Companies are having to build new data centres to host servers. But, instead of more data centres, they can increase the performance of existing facilities by using new technology. That is what Tilera is providing."