Advance boosts efficiency of flash storage in data centres


MIT researchers have developed a flash-storage system that they hope will one day replace power-hungry servers in data centres.

Most storage servers today use solid-state drives (SSDs), which rely on flash storage to handle high-throughput data requests quickly. MIT’s new system, LightStore, modifies SSDs so that they connect directly to a data centre’s network, without needing any other components, and support computationally simpler and more efficient data-storage operations. Additional software and hardware innovations integrate the system into existing data-centre infrastructure.

In experiments, the researchers found that a cluster of four LightStore units, called storage nodes, ran twice as efficiently as traditional storage servers, measured by the power needed to field data requests. The cluster also occupied less than half the physical space of existing servers.

To better capture the system’s full energy savings, the researchers also broke them down by individual data-storage operation. For “random writes,” for instance, the most computationally intensive operation in flash memory, LightStore was nearly eight times more efficient than traditional servers.

According to the MIT team, a major source of inefficiency in today’s data centres is that their architecture hasn’t changed to accommodate flash storage. Years ago, data-storage servers consisted of relatively slow hard disks, along with lots of dynamic random-access memory (DRAM) and central processing units (CPUs) that helped quickly process all the data pouring in from the app servers.

Today, however, hard disks have mostly been replaced by much faster flash drives. “People just plugged flash into where the hard disks used to be, without changing anything else,” said Chanwoo Chung, first author on the project. “If you can just connect flash drives directly to a network, you won’t need these expensive storage servers at all.”

For LightStore, the researchers first modified SSDs to be accessed in terms of “key-value pairs,” a simple scheme for storing and retrieving data. Each user request carries a key, such as a string of numbers. The key is sent to a server, which returns the data (the value) associated with that key.
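To make the key-value idea concrete, here is a minimal Python sketch of the request semantics described above. The class and method names (`KeyValueDrive`, `put`, `get`) are hypothetical, and a dictionary stands in for the flash media; it shows only what a key-value request looks like, not how LightStore stores data.

```python
from typing import Dict, Optional


class KeyValueDrive:
    """Toy model of a key-value interface (hypothetical class): a request
    carries a key, and the drive returns the value paired with it."""

    def __init__(self) -> None:
        # A plain dict stands in for the flash media in this sketch.
        self._store: Dict[bytes, bytes] = {}

    def put(self, key: bytes, value: bytes) -> None:
        # Insert or overwrite the value paired with this key.
        self._store[key] = value

    def get(self, key: bytes) -> Optional[bytes]:
        # Return the value paired with the key, or None if it is absent.
        return self._store.get(key)


drive = KeyValueDrive()
drive.put(b"user:1042", b"profile-data")
print(drive.get(b"user:1042"))  # b'profile-data'
print(drive.get(b"user:9999"))  # None
```

Compared with a conventional block interface, the caller names data by key rather than by a numbered storage address, which leaves the drive free to decide where the bytes physically live.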

The concept is simple, but keys can be extremely large, so processing them (searching for and inserting keys) entirely within an SSD demands substantial computing power. On a conventional drive, much of that power is already consumed by the “flash translation layer” (FTL), fairly complex software that runs on a separate module on the flash drive to manage and move data around. The researchers used certain data-structuring techniques to run this flash-management software with only a fraction of the computing power, which let them offload it entirely onto a tiny circuit in the flash drive that runs far more efficiently.
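For readers unfamiliar with flash translation layers, the sketch below shows the core job of a generic page-mapping FTL: flash pages cannot be overwritten in place, so every write goes to a fresh page and a logical-to-physical table is updated. This is a textbook-style illustration with hypothetical names (`PageMappingFTL`, `l2p`), not the researchers’ reduced FTL, and it leaves out garbage collection and wear levelling.

```python
from typing import Dict, List, Optional


class PageMappingFTL:
    """Minimal page-level flash translation layer (illustrative only).

    Each write goes to the next free physical page and the
    logical-to-physical map is updated; the superseded page is marked
    invalid so a garbage collector (not modelled here) could reclaim it.
    """

    def __init__(self, num_pages: int) -> None:
        self.flash: List[Optional[bytes]] = [None] * num_pages
        self.valid: List[bool] = [False] * num_pages
        self.l2p: Dict[int, int] = {}  # logical page -> physical page
        self.next_free = 0             # append point in the log

    def write(self, lpn: int, data: bytes) -> int:
        if self.next_free >= len(self.flash):
            raise RuntimeError("out of free pages; garbage collection needed")
        old = self.l2p.get(lpn)
        if old is not None:
            self.valid[old] = False    # old copy becomes garbage
        ppn = self.next_free
        self.flash[ppn] = data
        self.valid[ppn] = True
        self.l2p[lpn] = ppn
        self.next_free += 1
        return ppn

    def read(self, lpn: int) -> Optional[bytes]:
        ppn = self.l2p.get(lpn)
        return None if ppn is None else self.flash[ppn]


ftl = PageMappingFTL(num_pages=8)
ftl.write(0, b"v1")
ftl.write(0, b"v2")    # the rewrite lands on a new physical page
print(ftl.read(0))     # b'v2'
print(ftl.valid[:2])   # [False, True]
```

Even this toy version shows where an FTL spends effort: every update touches a mapping table and leaves behind a stale page that must eventually be cleaned up.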

That offloading frees up the separate CPUs already on the drive, which are designed for simple, fast computation, to run custom LightStore software. This software uses data-structuring techniques to efficiently process key-value requests. Essentially, without changing the drive’s architecture, the researchers converted a traditional flash drive into a key-value drive.
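The article does not say which data structures LightStore’s key-value software uses. As an assumption, the sketch below uses a log-structured merge (LSM) design, a common choice for key-value stores on flash because it turns updates into sequential writes of sorted, immutable runs. The class `TinyLSM` and its `memtable_limit` parameter are hypothetical, and real LSM stores also compact runs and keep indexes and filters that are omitted here.

```python
import bisect
from typing import Dict, List, Optional, Tuple


class TinyLSM:
    """Toy log-structured merge (LSM) key-value store (hypothetical).

    Writes go to an in-memory table; when it fills, it is flushed as an
    immutable sorted run. Runs are only ever appended, never updated in
    place, which matches how flash prefers to be written.
    """

    def __init__(self, memtable_limit: int = 4) -> None:
        self.memtable: Dict[str, str] = {}
        self.runs: List[List[Tuple[str, str]]] = []  # newest run last
        self.memtable_limit = memtable_limit

    def put(self, key: str, value: str) -> None:
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self) -> None:
        # Write the memtable out as one sorted, immutable run.
        self.runs.append(sorted(self.memtable.items()))
        self.memtable = {}

    def get(self, key: str) -> Optional[str]:
        if key in self.memtable:
            return self.memtable[key]
        # Search runs from newest to oldest so later writes win.
        for run in reversed(self.runs):
            # (key,) sorts just before any (key, value) pair.
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None


db = TinyLSM(memtable_limit=2)
db.put("a", "1")
db.put("b", "2")   # memtable full -> flushed to a sorted run
db.put("a", "3")   # newer value stays in the memtable
print(db.get("a"), db.get("b"), db.get("z"))  # 3 2 None
```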

The next challenge was ensuring that app servers could access data in LightStore nodes. In data centres, apps access data through a variety of interfaces, such as file systems, databases, and other formats. Traditional storage servers run sophisticated software that gives app servers access through all of these protocols. But that software consumes considerable computing energy and isn’t suited to LightStore, which has only limited computational resources.

The researchers designed very lightweight software, called an “adapter,” which translates every request from app servers into key-value requests. Each adapter uses mathematical functions to convert information about the requested data, such as the specific protocol’s commands and the app server’s identification number, into a key. It then sends that key to the appropriate LightStore node, which finds and returns the paired data. Because this software is so computationally simple, it can be installed directly on app servers.
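The adapter’s translation step might look roughly like the Python sketch below. The article says only that mathematical functions turn request information into a key; the choice of SHA-256, the field layout, the `OPS` command table, and modulo routing across nodes are all assumptions, and the function names are hypothetical. One design point: the sketch hashes what identifies the data (protocol, app server ID, object name) and treats the command as the operation, so that a read and a later write of the same object map to the same key.

```python
import hashlib
from typing import Tuple

# Hypothetical table mapping protocol-level commands to key-value operations.
OPS = {"read": "GET", "write": "PUT", "select": "GET", "insert": "PUT"}


def make_key(protocol: str, app_server_id: int, object_id: str) -> bytes:
    """Hash what identifies the data (not the command) into a 32-byte key,
    so a read and a later write of the same object share one key."""
    material = f"{protocol}|{app_server_id}|{object_id}".encode("utf-8")
    return hashlib.sha256(material).digest()


def translate(protocol: str, command: str, app_server_id: int,
              object_id: str, num_nodes: int) -> Tuple[str, int, bytes]:
    """Turn one protocol request into (key-value op, target node, key)."""
    op = OPS[command]
    key = make_key(protocol, app_server_id, object_id)
    # Hash partitioning: the first 8 key bytes pick the storage node.
    node = int.from_bytes(key[:8], "big") % num_nodes
    return op, node, key


read_req = translate("file", "read", 7, "/logs/day1.txt#block0", num_nodes=4)
write_req = translate("file", "write", 7, "/logs/day1.txt#block0", num_nodes=4)
print(read_req[0], write_req[0])                                 # GET PUT
print(read_req[2] == write_req[2], read_req[1] == write_req[1])  # True True
```

Plain modulo routing reshuffles most keys whenever the node count changes; systems that add nodes often use consistent hashing instead, which moves only a fraction of the keys.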

One final innovation is that data throughput, the rate at which data can be processed, scales linearly as LightStore nodes are added to a cluster. Traditionally, data centres stack SSDs to handle higher throughput. But while storage capacity may grow with each drive, throughput plateaus after only a few additional drives.
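An idealized model, which is an assumption and not a figure from the article, shows why the two curves differ. Suppose each storage unit can sustain throughput $t$. LightStore nodes each connect to the network directly, so in this model nothing on the data path is shared. When $n$ SSDs instead sit behind one storage server, assume the server’s CPU, memory, and network link together cap throughput at some bound $B$:

```latex
T_{\text{LightStore}}(n) = n\,t,
\qquad
T_{\text{server}}(n) = \min\bigl(n\,t,\; B\bigr)
```

Under this model the server’s throughput stops growing once $n > B/t$, even though each extra drive still adds capacity, while the LightStore cluster keeps gaining $t$ per node.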

In experiments, the researchers found that four LightStore nodes surpassed the throughput achieved by the same number of SSDs.