Feeding the beast

In-Vehicle Infotainment (IVI) systems have a huge appetite for data storage capacity with mapping and navigation, music and entertainment occupying a growing memory footprint in cars.

Up until recently, the car maker’s choice of high-capacity data storage technology was the familiar Hard Disk Drive (HDD), but reliability and lifetime concerns have seen it fall out of favour to be replaced in modern designs by the solid-state drive (SSD) or embedded Multimedia Card (eMMC).

Certainly the solid-state solution for data storage is inherently better suited to the requirements of car makers, with their strict processes and qualification criteria for components and modules. Unlike an HDD, an SSD or eMMC contains no moving parts so it cannot suffer mechanical failure and is not vulnerable to damage from shock or vibration.

Nevertheless, the NAND Flash arrays on which the SSD and eMMC are based have inherent characteristics which can cause data corruption or data loss, if they are not properly managed. On its own, therefore, the replacement of an electro-mechanical system with a solid-state system does not guarantee reliable performance.

A preference for MLC NAND

NAND Flash is the storage medium in both SSDs and eMMCs. It is available in three main types: Single-Level Cell (SLC), Multi-Level Cell (MLC) and Triple-Level Cell (TLC). The newest version of TLC, 3D TLC, uses a stacked configuration to achieve even higher memory density. The memory density of MLC Flash is lower than that of TLC but higher than that of SLC.

Above: End-to-end data path protection products for automotive systems

In automotive SSD and eMMC applications, MLC NAND is preferred because it provides high density and high reliability at a low cost, and with a lower susceptibility to data loss and corruption than TLC NAND.

Data storage capacity of up to 64GB is typically available in MLC NAND-based eMMC products, and up to 512GB in MLC NAND-based SSDs. The use of MLC NAND Flash does, however, pose some risk to data integrity and retention.

Failure or data loss in a NAND Flash array can occur in one of three phases of its life:

  • Infant failure occurs very soon after a new device is fabricated. Inherent variability in the NAND Flash fabrication process makes the production of weak or bad blocks or cells inevitable.
  • During the device’s rated lifetime, there are various potential causes of data loss and corruption, including bit errors in transmission between the host and the NAND Flash array, sudden power-down events, thermal stress affecting data retention, and read disturbance.
  • End of life occurs when the cycle life is exhausted. NAND Flash products have a cycle life rated in terms of Program/Erase (P/E) cycles, and when this number has been exceeded in any given memory cell, the cell may be expected to fail.

Failures in the first two phases, infant failure and failure during the rated lifetime, are highly undesirable in automotive systems. In response, Silicon Motion has developed technologies and techniques for its Ferri range of data storage products that minimise or eliminate the risks of failure and data loss during the rated life of the NAND Flash array.
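The third phase, end of life, lends itself to a rough back-of-envelope estimate. The sketch below assumes ideal wear levelling, and every number in the usage example (the P/E rating, the daily host writes and the write amplification factor) is a hypothetical input chosen for illustration, not a figure for any particular product.

```python
def estimated_lifetime_years(capacity_gb, pe_cycles,
                             host_writes_gb_per_day, write_amplification):
    """Rough endurance estimate, assuming writes are spread evenly over all cells."""
    # Total data the NAND array can absorb before every cell reaches its P/E rating.
    total_nand_writes_gb = capacity_gb * pe_cycles
    # The controller writes more to NAND than the host sends (write amplification).
    host_write_budget_gb = total_nand_writes_gb / write_amplification
    days = host_write_budget_gb / host_writes_gb_per_day
    return days / 365.0


# Hypothetical example: 64GB device, 3000 P/E cycles, 20GB/day, amplification of 4.
years = estimated_lifetime_years(64, 3000, 20, 4.0)
print(f"{years:.1f} years")
```

Real devices deviate from this figure because wear is never perfectly even and write amplification varies with the workload, so the result is best read as an order-of-magnitude check.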

Features of the Ferri Family products which enhance the data integrity, longevity and performance of SSD boot loaders include:

  • 100% screening of every cell, page and block and comprehensive quality control before shipping, resulting in very low defective parts per million (dppm) rates
  • End-to-end data protection with NANDXtend ECC technology to extend operating lifetime
  • IntelligentScan & DataRefresh, a technology which pre-empts bit loss and prolongs data retention

In addition, Ferri Family products feature NAND Failure Analysis Capabilities that can be used to debug any issue that might occur, and to provide an in-depth failure analysis report with a corrective action plan.

Screening for infant failures

The weak memory blocks in a die – the blocks which are responsible for infant failures – are most likely to fail at the extremes of its specified operating-temperature range. For the Ferri Solutions this range is -40°C to +85°C. By performing a high temperature (85°C) burn-in of every cell, page and block in every NAND Flash die, it’s possible to screen out all devices that contain bad blocks, which are then scrapped.

This policy has the effect of lowering the production yield of Ferri Family devices, but this is seen as a price worth paying for the extremely low dppm rates that are achieved.

Avoiding data loss

However, even healthy NAND Flash devices are inherently prone to data loss and corruption in normal operation. There are three main ways in which this kind of failure can occur:

  • Exposure to sudden power-off events
  • Data loss in transmission
  • Imperfect data retention under thermal stress

Automotive systems can be subject to sudden power-down events, and the vehicle’s system software is not necessarily designed to trigger a proper Power Down command to an SSD or eMMC. If Sudden Power Off and Recovery (SPOR) processes are not implemented, such an event can cause data loss or a storage system crash. In order to eliminate this risk, Ferri Family products include proprietary firmware in the SSD or eMMC controller which implements a SPOR process, guaranteeing 100% data integrity.
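The firmware behind the SPOR process is proprietary and is not described here. As a generic illustration of the underlying idea only, the sketch below uses a common technique: each record in an append-only log carries a checksum, and recovery discards any record that was cut short by a power loss. The `JournaledStore` class, its record layout and the `power_fails_after` parameter are all hypothetical.

```python
import zlib


class JournaledStore:
    """Toy append-only log; a record counts only if its CRC-32 trailer matches."""

    def __init__(self):
        self.log = bytearray()

    def write(self, key: bytes, value: bytes, power_fails_after=None):
        # Record layout: 2-byte key length, 4-byte value length, key, value, CRC-32.
        body = len(key).to_bytes(2, "big") + len(value).to_bytes(4, "big") + key + value
        record = body + zlib.crc32(body).to_bytes(4, "big")
        if power_fails_after is not None:
            # Simulate power being lost part-way through the write.
            record = record[:power_fails_after]
        self.log += record

    def recover(self):
        """Replay complete records and truncate the log at the first torn one."""
        state, pos = {}, 0
        while pos + 6 <= len(self.log):
            key_len = int.from_bytes(self.log[pos:pos + 2], "big")
            value_len = int.from_bytes(self.log[pos + 2:pos + 6], "big")
            end = pos + 6 + key_len + value_len + 4
            if end > len(self.log):
                break  # record is incomplete
            body = bytes(self.log[pos:end - 4])
            if zlib.crc32(body) != int.from_bytes(self.log[end - 4:end], "big"):
                break  # record is corrupt
            state[body[6:6 + key_len]] = body[6 + key_len:]
            pos = end
        del self.log[pos:]  # drop the torn tail so later writes start cleanly
        return state


store = JournaledStore()
store.write(b"route", b"home")
store.write(b"route", b"work", power_fails_after=9)  # interrupted write
print(store.recover())
```

After recovery the store still holds the last fully written value, which is the property a SPOR process is meant to preserve.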

The ability to manage Error Correction Code (ECC) is a normal function of a NAND Flash controller. The purpose of ECC is to correct for bit errors that occur as a stream of data is written to or read from the NAND Flash array. There are various methods for implementing ECC in NAND Flash-based systems, and some methods achieve a higher level of error correction than others.
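To make the principle concrete, the sketch below uses a Hamming(7,4) code, which adds three parity bits to four data bits and corrects any single flipped bit. This is a deliberately simple stand-in chosen for illustration: the codes used in NAND Flash controllers are far stronger and operate on much larger blocks, and the function names here are hypothetical.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits into a 7-bit codeword laid out as p1 p2 d1 p3 d2 d3 d4."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4  # covers codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers codeword positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # covers codeword positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(codeword):
    """Correct at most one flipped bit and return the 4 data bits."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    # The syndrome is the 1-based position of the flipped bit, or 0 if none.
    syndrome = s1 + 2 * s2 + 4 * s3
    if syndrome:
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


codeword = hamming74_encode([1, 0, 1, 1])
codeword[4] ^= 1  # simulate a single bit error on read
print(hamming74_decode(codeword))
```

Two flipped bits in the same codeword would defeat this code, which illustrates why the achievable level of correction depends on the method chosen.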

Automotive manufacturers work to extremely high quality standards, and their aim is to achieve a zero defect rate. Silicon Motion has responded by introducing a stronger error correction capability into its Ferri Solutions products.

Above: LDPC and Page RAID error correction schemes for extended ECC operation

First, it implements end-to-end error correction across the entire data path. This corrects errors not only in Read/Write operations at the NAND Flash array, but also in the buffer memory (an SRAM or DRAM device). Further verification of the validity of a data transmission is achieved through CRC checksum tests at the NAND Flash array, at the buffer memory and at the interface between the Ferri Family device and the system host processor.
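The value of checking at each point on the data path can be shown with a small model. The sketch below uses the CRC-32 function from Python's standard `zlib` module; the stage names mirror the three check points mentioned above, while the helper functions and the choice of CRC-32 are assumptions made for illustration, not details of the Ferri implementation.

```python
import zlib


def attach_crc(payload: bytes) -> bytes:
    """Append a 4-byte CRC-32 trailer to the payload."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def crc_ok(frame: bytes) -> bool:
    """Recompute the CRC over the payload and compare it with the trailer."""
    payload, trailer = frame[:-4], frame[-4:]
    return zlib.crc32(payload) == int.from_bytes(trailer, "big")


def first_corrupting_stage(frame, stages):
    """Run the frame through each (name, function) stage, checking after each one.

    Returns the name of the first stage after which the CRC fails, or None.
    """
    for name, stage in stages:
        frame = stage(frame)
        if not crc_ok(frame):
            return name
    return None


def passthrough(frame):
    return frame


def flip_one_bit(frame):
    damaged = bytearray(frame)
    damaged[0] ^= 0x01
    return bytes(damaged)


frame = attach_crc(b"map tile 0042")
clean_path = [("host interface", passthrough),
              ("buffer memory", passthrough),
              ("NAND array", passthrough)]
faulty_path = [("host interface", passthrough),
               ("buffer memory", flip_one_bit),
               ("NAND array", passthrough)]
print(first_corrupting_stage(frame, clean_path))   # prints None
print(first_corrupting_stage(frame, faulty_path))  # prints buffer memory
```

Checking after every stage, rather than only at the end, identifies where on the path the corruption was introduced.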

Second, Silicon Motion has extended the scope of its data protection to allow for the elevated bit error rates commonly experienced when a NAND Flash block has undergone many program/erase cycles. Conventional BCH or RS techniques for ECC are capable of 100% data correction at low bit-error rates, but as a NAND Flash array ages the bit error rate rises. Conventional consumer SSDs and eMMCs leave uncorrected those errors that go beyond the capability of the BCH or RS algorithms.

But for automotive applications, the Ferri Family products implement additional error correction. Low Density Parity Check (LDPC) algorithms are applied to recover corrupted words (1kB blocks). Silicon Motion also implements Page RAID algorithms capable of recovering a complete 16kB page that contains corrupt data. Together, these technologies ensure the integrity of Read/Write operations, free of bit errors, across the entire rated cycle life of the NAND Flash array.
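The details of the Page RAID algorithms are not given here. A minimal sketch of the general idea, assuming a RAID-5-style scheme in which one XOR parity page protects a group of data pages, shows how a whole 16kB page can be rebuilt from the others. The function names and the group size of four are hypothetical.

```python
from functools import reduce

PAGE_SIZE = 16 * 1024  # 16kB page, as in the text above


def xor_pages(pages):
    """Bytewise XOR of a list of equally sized pages."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*pages))


def rebuild_page(surviving_pages, parity_page):
    """Recover the single missing page of a group from the survivors and the parity."""
    return xor_pages(surviving_pages + [parity_page])


# Four data pages with distinct, deterministic contents.
pages = [bytes((i * k) % 256 for i in range(PAGE_SIZE)) for k in (1, 3, 5, 7)]
parity = xor_pages(pages)

# Suppose page 2 is unreadable even after ECC; rebuild it from the rest.
survivors = pages[:2] + pages[3:]
recovered = rebuild_page(survivors, parity)
print(recovered == pages[2])  # prints True
```

A single parity page of this kind can rebuild only one lost page per group, and it costs one page of capacity for each group it protects.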

Mitigating effects of thermal stress

Data retention is a critical performance parameter for automotive manufacturers: it measures the period over which a bit of data will be retained after being written to a cell. This period is strongly temperature dependent and data retention in the MLC NAND type is markedly shorter than in SLC NAND.

Technology implemented in Ferri Family products protects against data retention failure by intelligently scanning blocks and cells, and refreshing those which are at risk of data loss. This IntelligentScan & DataRefresh function draws on data about the bit error rate per block derived from ECC operation: at a user-selectable threshold for the bit error rate, a Data Refresh is performed. At elevated operating temperatures the data retention duration shortens dramatically. Silicon Motion’s IntelligentScan & DataRefresh function automatically increases the frequency of scanning at higher operating temperatures and can also prevent data loss caused by read disturbance.
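The two behaviours described above, a refresh triggered at a bit error rate threshold and more frequent scanning at higher temperatures, can be sketched as follows. This is an illustrative model only: the `Block` class, the threshold value and the rule of halving the scan interval for every 10°C above a reference temperature are assumptions, not the policy used in Ferri firmware.

```python
from dataclasses import dataclass


@dataclass
class Block:
    block_id: int
    bit_errors: int  # bit errors corrected by ECC during the last scan
    bits_read: int   # bits read from the block during that scan

    @property
    def bit_error_rate(self) -> float:
        return self.bit_errors / self.bits_read


def blocks_to_refresh(blocks, ber_threshold):
    """Return the IDs of blocks whose bit error rate has reached the threshold."""
    return [b.block_id for b in blocks if b.bit_error_rate >= ber_threshold]


def scan_interval_hours(temperature_c, base_interval_hours=24.0,
                        reference_temp_c=40.0, halving_step_c=10.0):
    """Hypothetical policy: halve the interval for each step above the reference."""
    if temperature_c <= reference_temp_c:
        return base_interval_hours
    steps = (temperature_c - reference_temp_c) / halving_step_c
    return base_interval_hours / 2 ** steps


blocks = [Block(0, 2, 1_000_000), Block(1, 150, 1_000_000), Block(2, 40, 1_000_000)]
print(blocks_to_refresh(blocks, ber_threshold=1e-4))  # prints [1]
print(scan_interval_hours(60.0))                      # prints 6.0
```

Because the refresh decision relies on error counts that ECC already produces, a scheme of this kind can flag a weakening block before its errors exceed what ECC can correct.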

The reliability and data integrity of an SSD or eMMC can be greatly enhanced through the application of burn-in, advanced forward error correction and data refresh functions.

Storage solutions need to be well adapted to the needs of the automotive market, providing a combination of long-term reliable operation, data integrity and data retention to ensure that the solid-state memory matches the quality and reliability of any other electronics system in a vehicle.