If you were to draw up a list of properties for the ideal non-volatile memory, you probably would not start with those associated with flash. Flash is power-hungry, particularly when it comes to writes; it wears out; it's hard to integrate with logic circuitry; and you cannot erase and rewrite a single bit at a time.
Yet, despite the apparent drawbacks, flash has become the most successful type of semiconductor memory of all time. It has surpassed DRAM in terms of production volume and density and continues to be the non-volatile memory technology that few would actively bet against. Despite the promises made for alternatives – such as magnetic RAM or phase-change memory (PCM) – flash is the one that delivers year on year, while its much-vaunted competitors have yet to ship in volumes or densities that threaten flash's position, even given its shortcomings.
The big problem with flash lies in the way it stores data. Its high density comes from being a one-transistor storage technology. The storage element is a metal gate surrounded by dielectric – a floating gate that sits underneath the control gate used to switch current flow through the transistor. When the cell is unprogrammed, the transistor turns on normally; this state typically denotes a logic '1'. If you apply a strong electric field across the oxide that separates the floating gate from the transistor channel, electrons are forced through and are trapped on the floating gate. This stores a logic '0' because the threshold voltage needed on the control gate to turn the transistor on increases beyond the normal supply voltage, due to the charge on the floating gate. The shift in threshold voltage is proportional to the stored charge, making it possible to program more than two states into the memory. Intermediate charge levels can be used to represent combinations of two or more bits that are read out by detecting the threshold voltage needed to turn the transistor on.
Erasing the cell requires relatively high voltages in order to allow electrons to tunnel out of the floating gate (see figs 1 and 2). This is where the drawbacks of flash become apparent. To keep the memory cell small, the transistor cannot be wired up in such a way that a cell can be erased individually – usually the substrate is biased during erasure.
So, the operation has to be performed en masse within isolated blocks. The erase-block size can be on the order of tens of kilobytes. As a result, the contents of a block need to be cached in the system before they can be rewritten.
Over time, the oxide through which electrons are expected to tunnel degrades and charge ends up trapped within the dielectric. This increase in trapped charge tends to speed programming, but slow erasure – when a block fails to erase completely within a given time, the entire block is marked as worn out and retired. Since flash was first introduced, its erase endurance has improved to the point where each block can, on average, go through millions of cycles. However, this endurance drops dramatically with the multilevel cell memories used in cheaper mass storage devices. This trend is not being helped by scaling.
Few systems will write to memory evenly. The chances are that hotspots will develop, caused by the frequent storage of data that determines the state of the system at any time. Other blocks will contain rarely modified code or data that stays constant for much longer periods. The hotspots will tend to wear out much faster. As a result, flash file systems are commonly used to redistribute frequently accessed regions of memory to different blocks over time. This so-called wear-levelling strategy helps to minimise the number of blocks that need to be retired during the lifetime of the system.
A further problem, particularly for high-reliability systems, is that programming a bit is nowhere near as simple as sending an electrical pulse to the target cell. It takes numerous cycles to distinguish reliably between the 1 and 0 state, even in a binary cell. There is also the danger that programming one cell can disturb the state of adjacent cells – they wind up losing some of their charge through tunnelling.
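The wear-levelling idea described above can be illustrated with a toy model. The sketch below is not how any particular flash file system works: the class name, block counts and selection policy are all assumptions made for illustration. It keeps a logical-to-physical block map and, on every rewrite, moves the data to the least-worn free block before erasing the old one.

```python
class WearLevelledFlash:
    """Toy block-remapping layer (hypothetical, for illustration only)."""

    def __init__(self, physical_blocks, logical_blocks):
        if logical_blocks >= physical_blocks:
            raise ValueError("need at least one spare physical block")
        self.erase_counts = [0] * physical_blocks
        self.data = [None] * physical_blocks
        # Start with logical block i stored in physical block i.
        self.mapping = {i: i for i in range(logical_blocks)}
        # The remaining physical blocks form the pool of spares.
        self.free = set(range(logical_blocks, physical_blocks))

    def write(self, logical, payload):
        # Pick the least-worn spare (ties broken by lowest block number).
        target = min(self.free, key=lambda b: (self.erase_counts[b], b))
        self.free.remove(target)
        self.data[target] = payload
        # Erase the block holding the stale copy and return it to the pool.
        old = self.mapping[logical]
        self.data[old] = None
        self.erase_counts[old] += 1
        self.free.add(old)
        self.mapping[logical] = target

    def read(self, logical):
        return self.data[self.mapping[logical]]


flash = WearLevelledFlash(physical_blocks=8, logical_blocks=4)
for i in range(1000):
    flash.write(0, f"state-{i}")  # logical block 0 is the hotspot

print(flash.erase_counts)
```

In this run the 1,000 rewrites of the hot logical block are shared by five physical blocks (its original home plus four spares), so no block sees more than 200 erases. The three blocks holding static data are never touched, which is why practical schemes also relocate rarely modified data from time to time; that step is omitted here.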
Electrons can leak out simply from defects in the tunnel oxide so that, over time, the cell loses a percentage of its programmed charge even if the memory is not being used. Although materials science has improved, newer memories have a smaller charge-storage volume, making it easier to lose enough electrons to cause a misread. In a binary cell, this need only be about a quarter of the programmed charge. Naturally, multilevel-cell memories are even more sensitive to these effects.
Kinam Kim of Samsung, the world's largest maker of memory chips, estimated at the International Electron Devices Meeting (IEDM) in late 2010 that flash would reach its ultimate limit by the 10nm generation. While that is still almost a decade away, the tricks needed to get flash to perform well beyond 20nm could become prohibitively expensive.
Manufacturers are reacting to the gradual loss of predictability in cell programmability and storage. For example, companies such as Micron Technology and Toshiba have included error correction code (ECC) circuitry in some of their memories. Historically, volatile memories such as static (SRAM) and dynamic RAM (DRAM) have used ECC to guard against soft errors caused by radiation – an alpha particle smashing into a cell can flip a bit completely, potentially triggering an error. Flash does not have anywhere near the same susceptibility to radiation-induced errors. In flash, ECC protects the stored contents from progressive degradation. Using ECC makes it possible to hide borderline cells without having to mark them as unusable. As correcting errors in ECC logic is faster than allowing more time for read lines to settle and for writes to be made – which generally require multiple passes – it can be used to squeeze faster access times out of the memory array.
Access time can be an issue when working with flash. During the 1990s, the difference between flash and DRAM access times was not that great.
But a gap has opened in the past decade, a factor that is, somewhat ironically, favouring the use of slower, rather than faster, flash architectures.
Originally, flash memories were designed to act as near drop-in replacements for UV-erasable read-only memories. These NOR-architecture flash memories have a conventional, byte-addressable memory structure. Until the end of the 1990s, NOR was by far the most prevalent flash memory technology. However, NOR arrays require one contact between metal and the diffusion areas that contain the source and drain every two cells. This contact area consumes almost as much space as the memory transistor itself.
Manufacturers came up with the NAND architecture as a denser alternative. This packs transistors together in a string – usually 16 of them between the metal contacts – and the strings are then organised into larger blocks (see fig 3). The names NOR and NAND come from the similarity of the memory structures at a circuit level to multiple-input NOR and NAND gates in NMOS logic circuits. Because programmed cells in a NAND string will block current under normal operation, unselected transistors in the string are generally driven with a higher gate voltage than that used to select the target memory cell – this allows them to act as pass transistors. The consequence of this design is that the read signal is much weaker because it has to pass through multiple transistors.
NAND memories are optimised for reading data in large blocks. Successive word lines can be selected in a string and the contents of those bits from adjacent strings will be presented in parallel to the sense amplifiers connected to the block. Typically, the contents are passed to an on-chip SRAM buffer to make access from the host processor easier. As a result, the initial access time is long – microseconds, rather than nanoseconds – but successive bytes can be read from the buffer quickly.
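The effect of that buffered read scheme on access time can be shown with some rough arithmetic. The figures below (page size, array-to-buffer load time and per-byte buffer read time) are assumed round numbers chosen only to match the "microseconds, rather than nanoseconds" description; they are not taken from any datasheet, and the function names are hypothetical.

```python
PAGE_SIZE = 2048       # bytes per page (assumed)
PAGE_LOAD_NS = 25_000  # time to move one page from the array into the buffer (assumed)
BYTE_READ_NS = 25      # time to read one byte out of the buffer (assumed)


def sequential_read_ns(n_bytes):
    """Time to read n_bytes stored contiguously from the start of a page."""
    pages = -(-n_bytes // PAGE_SIZE)  # ceiling division: pages that must be loaded
    return pages * PAGE_LOAD_NS + n_bytes * BYTE_READ_NS


def random_read_ns(n_bytes):
    """Worst case: every byte lives in a different page, so each read pays a page load."""
    return n_bytes * (PAGE_LOAD_NS + BYTE_READ_NS)


seq = sequential_read_ns(PAGE_SIZE)
rnd = random_read_ns(PAGE_SIZE)
print(f"sequential: {seq} ns, random: {rnd} ns, ratio: {rnd / seq:.0f}x")
```

Under these assumptions a full page read sequentially costs 76,200ns, or roughly 37ns per byte, while the same number of scattered single-byte reads costs more than 51ms. The exact ratio depends entirely on the assumed numbers, but the shape of the result is why NAND suits block transfers rather than random access.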
The long random access times of NAND flash tend to force designers to copy large blocks of data from the non-volatile array into much faster DRAM or on-chip SRAM. The same is increasingly true of program code stored in NOR flash, because 32bit microcontrollers have improved cycle times while the non-volatile memory's random access time has remained more or less static. As NAND is denser and cheaper, it can make more sense to use that instead of NOR if program code can fit into a volatile-memory buffer.
NOR remains the main choice for embedded flash, however. It generally demands the fewest process modifications when implemented on a logic-oriented CMOS technology. And copying the contents of on-chip flash into local SRAM or embedded DRAM is a costly option. So some microcontroller suppliers have developed ways to hide the access time penalty incurred by flash storage. An example is the read buffer developed by NXP and now supported by ARM in its microcontroller-oriented processor cores. The memory accelerator module (see fig 5) splits the program-storage flash into two banks, switching between them every few cycles. Instructions are pulled into a volatile prefetch buffer on the basis that the processor will often execute them in sequence. Because tight loops are common in most code, a second bank of volatile memory holds the last few instructions executed. If the processor hits the bottom of a loop, there is a reasonable chance that the instruction at the beginning of the loop is sitting in the branch-tail buffer.
If you add a second transistor to a NOR-style flash memory cell, you can implement bytewise erasability to produce the modern form of EEPROM. Older EEPROMs tended to use hot electron injection for programming – the same technique as that used in most modern flash devices. However, more recent EEPROMs tend to use Fowler-Nordheim tunnelling for both programming and erasing cells.
The tunnelling scheme takes longer, but causes less overall damage to the oxide. As most EEPROMs used today are serial devices that store small pieces of infrequently updated data, the slower programming speed is rarely an issue. Very often, the contents of serial EEPROM are cached in volatile memory in the system, so the read-out interface speed is more important. Vendors have implemented fast versions of the venerable SPI port to improve boot times in these devices.
The same serial interfaces are being applied to a potential replacement for both EEPROM and flash: PCM, which also has the benefit of being bytewise-rewriteable. As PCM makers have had difficulty in increasing the density of the arrays, it should be easier to replace serial EEPROM in systems than bulk flash. However, a small number of mobile phones have shipped with higher-density parallel-interface PCM devices in place of flash.
Endurance issues also plague PCM. So it is not like working with a non-volatile SRAM – wear-levelling algorithms are still required to ensure that hotspots do not form in the memory array after successive writes. Memory controllers for PCM can improve the lifetime of the cells by writing only the cells that need to be changed: comparisons with the previous contents are performed to avoid rewriting the same value back to a cell and reducing its lifetime unnecessarily.
Ferroelectric memory (FRAM) has the opposite problem. Reads are more problematic than writes when it comes to calculating endurance. In microcontroller-based systems that need very low-power operation and do not need high density, ferroelectric memory is emerging as an alternative to flash. FRAM has a massive advantage over flash in terms of the energy and time needed to write each bit. The core of the FRAM cell is a capacitor containing a ferroelectric material that can be flipped from one state to another and that remains set even when power is removed from the circuit.
FRAM is not as dense as flash because each memory cell needs both a capacitor and a transistor. But the energy needed to flip the state of the material is at least two orders of magnitude lower than that needed to write a bit to flash. The write, because it is a simple bit flip rather than a laborious process of pulsing and checking for charge storage, is also much quicker.
Unfortunately, as with all non-volatile memories, there is a catch. Reading the state of the ferroelectric capacitor is destructive, as it is with a DRAM. So, each bit needs to be rewritten into the cell if it previously stored a '1'. Unlike DRAM, ferroelectric cells suffer wear on each write. The endurance is six or more orders of magnitude better than that of flash. The problem is that, in practically all electronic systems, reads from memory are far more common than writes. This makes endurance a key issue for FRAM in any product expected to last more than a few years.
As a result, products – such as the FRAM-based microcontrollers launched recently by Texas Instruments – are intended for low duty-cycle environments where the processor spends much of its working life asleep, only waking up to take and store sensor readings. As the read/write ratio is more balanced and low energy consumption is vital in these systems, FRAM keeps its advantage over flash.
Twenty years ago, FRAM looked set to be the memory of the future. But problems with manufacture made it tough to scale densities and its endurance issues will limit its use outside applications such as energy meters. However, in those systems, FRAM looks to fulfil its promise as a unified memory: suitable for code and data storage. Elsewhere, we are stuck for the foreseeable future with combining flash with volatile memory technologies and dealing with flash's many shortcomings. But flash remains the undisputed leader.