12 July 2010

How multiple cores are improving data processing efficiency

Putting multiple cores on a chip is no longer deemed a novel way to boost computational performance. Chip firms recognise that using multiple cpus in parallel gives a better return than complex architectural tweaks to the cpu itself. Each elemental core can then be tailored in terms of chip area and power consumption, especially now that the core's clocking frequency is no longer the overriding concern.

But multicore devices bring challenges: how best to program a multicore architecture, and how to avoid data pinch points between cores and between the cores and external devices.

Cavium Networks has long adopted multicore architectures for its communication processors. Its first generation Octeon Plus family includes a 16 cpu device. Now, with its Octeon II family, it has hiked the number of MIPS64 v2 risc cores it uses per chip to 32. According to market research firm The Linley Group, this should provide a 400% processing performance increase over the 16 core Octeon Plus.

Octeon II was first detailed a year ago with the CN63xx family. Two additional families have now been announced: the CN67xx, with 8 to 16 cores; and the CN68xx, with 16 to 32. "The CN67xx has two memory controllers, whereas the CN68xx has four," said Venkat Sundaresan, senior product line manager at Cavium Networks. "If you need 40Gbit/s [line rate], you take the CN68xx. If you need 20Gbit/s and you don't want to pay for the extra performance, you go for the CN67xx."

The CN67xx and CN68xx are pin for pin compatible families which target common applications, including cloud computing, packet networks and enterprise.

Security and wide area network (WAN) optimisation are examples of cloud computing tasks. For mirroring applications, where data is stored in more than one data centre, Octeon II can run lossless compression algorithms. Such techniques reduce the amount of data sent and ensure better use of the WAN's capacity.
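As a rough illustration of why lossless compression frees up WAN capacity, the sketch below compresses a repetitive block of data and checks that the receiver gets back exactly the same bytes. Python's zlib module (DEFLATE) and the sample records are assumptions chosen for illustration; the article does not say which algorithms Octeon II's hardware runs.

```python
import zlib


def wan_saving(payload: bytes, level: int = 6):
    """Compress a non-empty payload losslessly.

    Returns (compressed bytes, fraction of bytes saved on the link).
    """
    compressed = zlib.compress(payload, level)
    saved = 1.0 - len(compressed) / len(payload)
    return compressed, saved


# Hypothetical mirrored data: repetitive records compress well.
block = b"customer-record;status=active;region=EU\n" * 1000
compressed, saved = wan_saving(block)

# Lossless: the second data centre recovers the exact original bytes.
restored = zlib.decompress(compressed)
print(f"{len(block)} bytes -> {len(compressed)} bytes ({saved:.0%} saved)")
```

How much is saved depends entirely on the data; already compressed or encrypted traffic gains little or nothing.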

The devices' packet processing role includes uses within 3G and emerging LTE cellular networks. Here, the CN63xx is used in basestations, whereas the CN68xx would be used by platforms deeper within the network, such as the 3G Radio Network Controller gateway. Tasks include packet aggregation, applying IP Security (IPSec) encryption and protecting against malicious software.

The processor is also aimed at the 40Gbit/s and 100Gbit/s line cards used in high end switches and routers. Here, the CN68xx would perform Layer 3 and higher packet processing tasks. "Typically, the [CN68xx] processor is tied to an fpga or an asic," said Sundaresan. "The asic does the switching, while all the exception packets, IPSec or any application level processing, is offloaded to this processor."

Besides enhanced MIPS cores, the latest Octeon II families have dedicated hardware blocks, more memory and a variety of new interfaces.

The MIPS64 v2 core is clocked at up to 1.5GHz, has a 32kbyte Level 1 cache and a 4Mbyte Level 2 cache. Cache associativity (how data is stored) has also been enhanced, as has the core's micro architecture in terms of predicting which branch the program code will take and making associated decisions ahead of time.

A new functional block, dubbed the HFA, has been added for deep packet inspection (DPI). Cavium already has devices tailored for DPI and has used its 'know how' to enhance Octeon II. With DPI, not only is the packet header examined, but also the footer, source, destination and payload, with the payload used to identify applications and protocols. Such information is useful to fixed and mobile operators, both for managing their networks and for the services that run over them.
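To make the idea of payload based identification concrete, here is a minimal software sketch that classifies a packet by matching the start of its payload against a few protocol signatures. The signature table and the `classify` function are hypothetical and purely illustrative; they say nothing about how the HFA block is built, and a production DPI engine would handle far larger rule sets than a linear scan over regular expressions can.

```python
import re

# Hypothetical signature table: protocol name -> pattern on the payload start.
SIGNATURES = {
    "http": re.compile(rb"^(GET|POST|HEAD|PUT) \S+ HTTP/1\.[01]\r\n"),
    "ssh": re.compile(rb"^SSH-2\.0-"),
    "bittorrent": re.compile(rb"^\x13BitTorrent protocol"),
}


def classify(payload: bytes) -> str:
    """Return the first protocol whose signature matches, else 'unknown'."""
    for name, pattern in SIGNATURES.items():
        if pattern.search(payload):
            return name
    return "unknown"


print(classify(b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n"))
print(classify(b"SSH-2.0-OpenSSH_5.5\r\n"))
```

Looking at the payload rather than only the header's port number is what lets an operator recognise an application even when it runs on a non-standard port.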

A key functional block is the application acceleration manager. The on chip packet processing engine – packet input v2 – processes incoming packets, queues and tags them. The application acceleration manager inspects these processed packets, schedules a task to a free core and determines what happens to the result.
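The flow described above (tag and queue incoming packets, then hand each one to a free core) can be sketched in software as follows. This is a simplified model under stated assumptions, not Cavium's design: the `Work` and `Scheduler` names, the CRC based tag and the use of the first five bytes as a flow key are all hypothetical.

```python
import zlib
from collections import deque
from dataclasses import dataclass


@dataclass
class Work:
    tag: int       # flow tag attached by the packet input stage
    packet: bytes


class Scheduler:
    """Control block only: it hands out work items and tracks free cores."""

    def __init__(self, num_cores: int):
        self.free_cores = deque(range(num_cores))
        self.pending = deque()
        self.running = {}  # core id -> Work currently assigned to it

    def submit(self, packet: bytes) -> None:
        # Stand-in for packet input: tag each packet by a hash of its
        # (hypothetical) flow key, then queue it.
        tag = zlib.crc32(packet[:5]) & 0xFFFF
        self.pending.append(Work(tag, packet))

    def dispatch(self):
        """Assign queued work to free cores until one or the other runs out."""
        assigned = []
        while self.pending and self.free_cores:
            core = self.free_cores.popleft()
            work = self.pending.popleft()
            self.running[core] = work
            assigned.append((core, work))
        return assigned

    def complete(self, core: int) -> Work:
        """Mark a core as free again and return the work it finished."""
        work = self.running.pop(core)
        self.free_cores.append(core)
        return work


sched = Scheduler(num_cores=2)
for pkt in (b"flowA-1", b"flowB-1", b"flowA-2"):
    sched.submit(pkt)

first = sched.dispatch()       # two cores are free, so two packets are assigned
sched.complete(first[0][0])    # the first core finishes its packet
second = sched.dispatch()      # the waiting packet goes to the freed core
```

Packets from the same flow receive the same tag, which is what would let a real scheduler keep them in order; this sketch records the tag but does not enforce ordering.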

"The manager acts as a control block, not a data block," said Sundaresan. For example, the manager could set up a direct memory access transfer, specifying the core to which the data stream is sent, before the core executes Layer 3 and higher packet processing tasks. "We have a [MIPS] architectural license and we can add specialist instruction sets to accelerate packet processing," said Sundaresan.

The Octeon II family also features an 8Tbit/s crossbar switch. "When you have 32 cores, each running at 1.5GHz, there is a lot of I/O traffic. Traditional bus architectures will not handle that," said Sundaresan. "We have a [on chip] switched network that enables the traffic to run in a non blocking way."
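A quick back of the envelope calculation puts the crossbar figure in context. It assumes, purely for illustration, that the 8Tbit/s is shared evenly between 32 cores; in practice the switch also carries memory and I/O traffic, and the article gives no breakdown.

```python
CROSSBAR_BITS_PER_S = 8e12   # 8Tbit/s crossbar, from the article
CORES = 32                   # top end CN68xx
CLOCK_HZ = 1.5e9             # maximum core clock, from the article

# Assumption: bandwidth divided evenly between the cores.
per_core_bits_per_s = CROSSBAR_BITS_PER_S / CORES
per_core_bits_per_cycle = per_core_bits_per_s / CLOCK_HZ

print(f"{per_core_bits_per_s / 1e9:.0f}Gbit/s per core")
print(f"{per_core_bits_per_cycle:.1f} bits per core clock cycle")
```

On these assumptions each core has roughly 250Gbit/s of switch bandwidth available, or about 167 bits per clock cycle.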

To tackle device power consumption, Cavium switches off unused parts of the cores. It also employs dynamic power management, whereby the chip's voltage is altered based on processing load. In this way, higher voltages are used only in critical parts of the chip. "If the applications load is low, it [the chip] throttles down the speed," said Sundaresan.
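The benefit of lowering voltage along with speed follows from the standard approximation that dynamic cmos power scales with capacitance × voltage² × frequency. The sketch below picks the slowest operating point that still covers the load and reports its relative power. The operating points are invented for illustration and are not Cavium figures.

```python
def relative_dynamic_power(voltage_scale: float, frequency_scale: float) -> float:
    """Dynamic power relative to nominal, using P proportional to V^2 * f."""
    return voltage_scale ** 2 * frequency_scale


def pick_operating_point(load: float, points):
    """Return the slowest (frequency, voltage) point that covers the load.

    `load` is the fraction of full speed the application needs (0 to 1).
    Falls back to the fastest point if the load exceeds every option.
    """
    for freq_scale, volt_scale in sorted(points):
        if freq_scale >= load:
            return freq_scale, volt_scale
    return max(points)


# Hypothetical operating points: (frequency scale, voltage scale).
POINTS = [(0.5, 0.8), (0.75, 0.9), (1.0, 1.0)]

freq, volt = pick_operating_point(0.4, POINTS)
power = relative_dynamic_power(volt, freq)
print(f"load 0.4 -> f={freq}, V={volt}, relative power {power:.2f}")
```

With these example numbers, halving the clock and dropping the voltage by 20% cuts dynamic power to about a third of nominal, which is why voltage scaling saves more than slowing the clock alone.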

The chip's interfaces have also been enhanced in terms of the I/O performance and protocols supported. The device includes the high speed PCI Express Gen 2, which can be used to link to another Octeon II processor or to an asic. It also has 10 gigabit Ethernet (10GbE) ports and Interlaken interfaces that can link either two CN68xx processors, or the processor to a ternary content addressable memory. Interface options include up to eight Interlaken lanes at 6.25Gbit/s each, up to five 10GbE ports, up to two PCIe Gen 2 controllers and up to 12 GbE ports.

The resulting processing and interface architecture means the CN68xx is capable of 40Gbit/s duplex line rate processing, can compress data at 20Gbit/s and perform DPI at 15Gbit/s.
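The quoted interface options can be checked against the 40Gbit/s line rate with simple arithmetic. The figures below are raw rates taken from the article; they ignore encoding and protocol overhead, so usable throughput will be somewhat lower.

```python
LINE_RATE_GBIT = 40.0  # line rate quoted for the CN68xx, per direction

# Raw aggregate rate of each interface option, in Gbit/s.
interfaces = {
    "Interlaken (8 lanes x 6.25Gbit/s)": 8 * 6.25,
    "10GbE (5 ports)": 5 * 10.0,
    "GbE (12 ports)": 12 * 1.0,
}


def covers_line_rate(raw_gbit: float, line_rate: float = LINE_RATE_GBIT) -> bool:
    """True if the raw interface rate is at least the target line rate."""
    return raw_gbit >= line_rate


for name, raw in interfaces.items():
    verdict = "covers" if covers_line_rate(raw) else "falls short of"
    print(f"{name}: {raw:g}Gbit/s raw, {verdict} {LINE_RATE_GBIT:g}Gbit/s")
```

Both the full Interlaken configuration and the five 10GbE ports offer 50Gbit/s raw, leaving headroom above the 40Gbit/s line rate, while the 12 GbE ports on their own do not reach it.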

According to The Linley Group, Cavium's chip will deliver several times the peak instructions per second of NetLogic's forthcoming XLP832, although the NetLogic device will offer similar memory and I/O bandwidth and should have better single thread performance. Cavium's Octeon designs have also proven more power efficient than those of RMI (now part of NetLogic), claims The Linley Group. It expects the same to be true of the CN68xx, even though, at 65nm cmos, it lags the XLP832 by one process node.

As for software, the Octeon families share a compiler. "Anything that runs on the two core device should run on the eight core and on a 32 core," said Sundaresan. "The compiler and software development kit abstract away the number of cores."
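What it means for software to be independent of the core count can be shown with a loose analogy in ordinary Python. This is not the Cavium software development kit; `process_packet` and `process_all` are hypothetical names, and a thread pool merely stands in for a set of cores.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def process_packet(packet: bytes) -> int:
    """Stand-in for per-packet work: a simple 16 bit byte checksum."""
    return sum(packet) & 0xFFFF


def process_all(packets, workers=None):
    """Process packets with however many workers the platform offers.

    The calling code never mentions a core count unless it chooses to.
    """
    workers = workers or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in input order whatever the worker count.
        return list(pool.map(process_packet, packets))


packets = [bytes([i % 256]) * 64 for i in range(100)]
results_2 = process_all(packets, workers=2)
results_32 = process_all(packets, workers=32)
```

The same application code produces identical results with two workers or 32; only the throughput would differ on real hardware.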

Both Octeon II families will sample in the final quarter of 2010. Sundaresan is confident Cavium will meet the deadline, despite the six core CN63xx having slipped two quarters. "The complication going from Octeon Plus to Octeon II was primarily the new core," said Sundaresan. "We introduced [on the CN63xx] a lot of new high speed interfaces – PCI Express Gen 2, DDR3 [memory interface], compression and new blocks; these are the same as found on the CN68xx."

Adding more cores to boost packet processing performance will continue. "We have ensured that the packet and control planes can scale with the number of cores," said Sundaresan. "Adding more cores is still possible."

Author
Roy Rubenstein

Supporting Information

Downloads
26285\P30-31.pdf

Websites
http://www.caviumnetworks.com

Companies
Cavium Networks Inc

This material is protected by Findlay Media copyright
See Terms and Conditions.
One-off usage is permitted but bulk copying is not.
For multiple copies contact the sales team.
