07 June 2010
A switch in focus: Communications hardware
Data centres, the modern-day cathedrals of IT, are in the midst of radical change.
Ethernet is taking over the data centre, carrying storage and high performance computing traffic that has, until now, required dedicated networking equipment. Data centre managers also want to simplify how their Ethernet switches are linked, moving from a tiered design that aggregates traffic to a single layer network that is simpler to manage and has lower latency. The wiring in data centres is also transitioning to optical as the speed-distance product of copper interfaces runs out of steam.
These trends are reflected in the features of the latest data centre ics from Broadcom and Chelsio Communications.
Broadcom has announced its latest StrataXGS Ethernet switch range, dubbed BCM56840. Implemented in 40nm cmos, the three announced families in the range support switching capacities of up to 640Gbit/s.
Meanwhile, Chelsio Communications has announced its T4 Terminator asic. The T4 will be used on motherboards and on interface adaptor cards in servers and storage equipment that link to high speed Ethernet switches; the same platforms Broadcom is targeting with the BCM56840.
"You can envisage the T4 as a network interface controller paired with a switch," said Kianoosh Naghshineh, ceo of Chelsio.
According to Broadcom, the BCM56840 was a significant design undertaking. "This chip wasn't an evolution," said John Mui, senior product line manager for network switching products. "It more than doubles switch capacity, has more than 2 billion transistors and includes memory, logic and mixed signal."
The BCM56840 series features data centre bridging (DCB) – the Ethernet protocol enhancement that enables lossless transmission of storage traffic. The series also supports the Fibre Channel over Ethernet (FCoE) protocol and the devices' 10Gbit Ethernet ports can be combined to form 40Gbit Ethernet ports.
Data centres use a hierarchy – typically three layers – of networked switches: access, aggregation and core. While Broadcom's latest switch ics are aimed at access switches, the devices are also suited to mid tier aggregation.
An access switch connects servers in a data centre and typically sits on top of the server rack, which explains why it is also known as a top of rack (TOR) switch. Servers are now moving to 10Gbit/s interfaces, says Mui, with a TOR switch typically connecting up to 40 servers.
The three BCM56840 series ics share a common feature set, but differ in their switching capacities – 320, 480 and 640Gbit/s. The largest device, the BCM56845, has 64 x 10Gbit/s ports and will be used to link 40 servers with 10Gbit/s interfaces to an aggregation switch, the uplink using four high capacity 40Gbit/s connections.
On the aggregation switch, a BCM56840 series ic would sit on a line card, says Mui. Each aggregation switch is likely to have six to 12 40Gbit/s ports per line card and between eight and 16 cards per chassis. In turn, two chassis are used for redundancy. "You can foresee one line card connecting up to 12 TORs," said Mui. This will allow huge numbers of servers – thousands, even tens of thousands – to be linked in the data centre.
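A back-of-the-envelope calculation, using only the figures quoted above, shows how the numbers reach into the thousands (illustrative arithmetic, not a Broadcom reference design):

```python
# Fabric scale from the article's quoted figures:
# 40 servers per top-of-rack (TOR) switch, up to 12 TOR uplinks per
# aggregation line card, and up to 16 line cards per chassis.
servers_per_tor = 40
tors_per_line_card = 12
line_cards_per_chassis = 16

servers_per_chassis = servers_per_tor * tors_per_line_card * line_cards_per_chassis
print(servers_per_chassis)  # 7680 servers behind one aggregation chassis
```

With multiple chassis, the total reaches the tens of thousands of servers mentioned by Mui.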
Each port has an integrated 10.3Gbit/s serdes. The serdes, which enables interfacing to SFP+ modules, also supports the 10GBaseKR backplane interface, but an extra ic is needed for 10GBaseT. Combining four serdes enables interfacing to a qsfp module supporting the various 40GbE optical and copper standards.
Besides DCB Ethernet, the switches also support Layer 3 packet processing and routing. There is also a multistage content aware engine that allows higher layer, more complex packet inspection (layer 4 to 7 of the OSI model) and policy management. The content aware functional block can also be used for packet cut through, a technique that reduces switch latency by inspecting header information and forwarding while the packet's payload is arriving. Mui says the switch's latency is less than 1µs.
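The cut through idea can be reduced to a few lines: the switch needs only the frame header – the destination MAC sits in the first six bytes – to pick an egress port, so forwarding can begin before the payload has finished arriving. A minimal sketch (hypothetical table and function names, not Broadcom's implementation):

```python
# Cut-through forwarding sketch: look up the egress port from the
# destination MAC in the first 6 bytes of the frame, without waiting
# for the rest of the frame.
FWD_TABLE = {"aa:bb:cc:00:00:01": 3}  # hypothetical MAC -> egress port map

def cut_through_forward(header_bytes: bytes) -> int:
    # The destination MAC occupies bytes 0-5 of an Ethernet frame.
    dst_mac = ":".join(f"{b:02x}" for b in header_bytes[:6])
    return FWD_TABLE.get(dst_mac, -1)  # -1: unknown, flood/drop elsewhere

port = cut_through_forward(bytes.fromhex("aabbcc000001") + b"\x00" * 8)
print(port)  # 3
```

A store-and-forward switch, by contrast, buffers the entire frame before the lookup, which is where the extra latency comes from.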
Broadcom's ics address the data centre trend to a flatter switching architecture by supporting the emerging Transparent Interconnection of Lots of Links (TRILL) protocol being developed by the Internet Engineering Task Force. Until now, a spanning tree technique has been used to avoid the creation of loops within an Ethernet network. However, the spanning tree becomes unwieldy as the network grows and works only at the expense of halving the available networking bandwidth. TRILL is designed to allow much larger Ethernet networks while using all available bandwidth.
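The "halving" point is easy to see with a simplified two-link example (a sketch of the general principle, not of either protocol's actual path computation): spanning tree must block one of a pair of redundant links to break the loop, while TRILL-style equal-cost multipathing can forward over both.

```python
# Two equal 10Gbit/s links between a pair of switches.
links = [10, 10]

stp_usable = max(links)    # spanning tree: one link forwards, one is blocked
trill_usable = sum(links)  # TRILL-style multipath: traffic spread over both

print(stp_usable, trill_usable)  # 10 20
```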
In turn, Broadcom has its own protocol – called HiGig – that adds tags to packets. Using HiGig, a very large logical switch, made of multiple interconnected switches, can be created and managed.
Chelsio's T4 asic, which sits on a server or storage platform, is based on a data flow VLIW architecture. The device supports iSCSI storage and iWARP clustering protocols and, by adding internal ternary content addressable memory (TCAM) and internal memory, the T4 stands out as the first 10GbE controller to fully offload all types of storage and cluster traffic, according to the Linley Group.
The T4 also supports FCoE and virtualisation, while doubling the port capacity to four 10GbE interfaces. The four 10GbE ports also support link aggregation to create 20Gbit/s – even 40Gbit/s – pipes. For the server interface, PCI Express v2.0 ×8 is used, delivering 32Gbit/s.
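The 32Gbit/s figure follows directly from the PCI Express v2.0 numbers: each lane signals at 5GT/s, and the generation's 8b/10b line coding leaves 80% of that as usable data.

```python
# PCIe v2.0 x8 bandwidth arithmetic behind the quoted 32Gbit/s figure.
lanes = 8
raw_gt_per_lane = 5.0       # GT/s per lane for PCIe gen 2
coding_efficiency = 8 / 10  # 8b/10b line coding overhead

usable_gbit = lanes * raw_gt_per_lane * coding_efficiency
print(usable_gbit)  # 32.0 Gbit/s per direction
```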
Virtualisation allows multiple operating systems and applications to share the same server, making better use of processing capacity that typically ran at around 10% utilisation. The T4 supports single root I/O virtualisation, allowing up to 128 virtual machines running on the server to share the single PCI interface. The device also includes an embedded switch – the 'virtualisation engine' – that switches the server's virtual machines between physical ports. Because the device is programmable, various switching approaches, proprietary and emerging standards, are supported.
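A toy model of the arrangement (hypothetical names and a deliberately trivial placement policy, not Chelsio's API): each virtual machine is given an SR-IOV virtual function on the shared PCIe interface, and the on-chip switch decides which of the four physical 10GbE ports carries its traffic.

```python
# Sketch of SR-IOV fan-out through an embedded switch: up to 128
# virtual functions (the limit quoted for the T4) mapped onto four
# physical 10GbE ports.
MAX_VFS = 128

def assign_port(vf_id: int, physical_ports: int = 4) -> int:
    if not 0 <= vf_id < MAX_VFS:
        raise ValueError("vf_id out of range")
    return vf_id % physical_ports  # trivial round-robin placement

print(assign_port(5))  # 1
```

A real implementation would place virtual machines by policy – load, VLAN, tenant – rather than round robin, which is exactly where the device's programmability comes in.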
The use of VLIW enabled the T4's designers to trade parallel processing against clock speed and, hence, power consumption. By using instruction words more than 1000bits wide, overall power consumption is kept relatively low compared to multicore ic designs. "The T4 is specified as 1Gbit/s full duplex for every 16MHz of clock," said Naghshineh.
Given the speed of the interfaces, the line rate processing is demanding. "When you go to 10, 40 and even 100Gbit/s, things that used to be done in software become very expensive, so you need some amount of hardware," said Naghshineh. Yet the architecture must be programmable, given the protocols that need to be supported and which continue to evolve. "Very little of this chip is hardwired; it is unsafe to hardwire," he said.
Broadcom and Chelsio point to higher capacity designs with faster interconnects as an important part of their roadmaps, while Naghshineh points out that version 2.0s are coming for all the various protocols.
First samples of Chelsio's T4 are on schedule for this quarter and will be in production before year end. Chelsio will also launch its own adaptor card in the second half of 2010. Broadcom's ics, available as samples for several months, will also be in production by year end.