As is typical for most new technologies, enterprises have been slow to adopt the latest versions of the NVMe specs, with hyperscale cloud platforms responsible for much of the adoption to date. But enterprises are starting to catch on.
NVMe 1.4, released in July, includes hyperscale features for isolation, predictable latency, and controlling write amplification. But the higher performance and lower system requirements of NVMe as the interconnect for flash drives and arrays are accelerating the spec’s use in the enterprise, with NVMe shipments (measured in gigabytes) predicted to exceed combined SAS and SATA shipments this year.
The latest spec for NVMe over Fabrics (rather than directly attached in servers), NVMe-oF 1.1, also released in July, adds support for TCP/IP to the existing Fibre Channel and RDMA options and includes enterprise-focused features for QoS and management. Mainstream storage, network, and management providers like Cisco, Dell, Intel, and Mellanox are ready to support NVMe-oF with standard drivers and even hardware acceleration for speeds up to 200Gb/s, making it a mature standard for deploying disaggregated software-defined scale-out storage in existing TCP/IP environments.
NVMe over TCP is the most significant feature in NVMe-oF 1.1, Henry He, director of product management for Virtana (formerly Virtual Instruments), told us. “NVMe can now be extended across your entire intranet, where previously it was localized to data centers or required specialized hardware. If you want Fibre Channel, you can use that, but if you just want plain old vanilla TCP, you have that too.”
Detangling Your Flash Mess
Flash storage has major advantages for workloads, from VMs to databases to big data and machine learning, and NVMe prices are starting to reach parity with SATA SSDs. Meanwhile, SCSI SSDs can’t match the speed and latency of NVMe-oF.
The continuing increase in SSD capacity – which doubles about every six quarters – has been accompanied by a confusing range of flash storage technologies with different characteristics: from lower-priced, higher-capacity but lower-endurance QLC to persistent memory like Intel Optane and Samsung Z-NAND, which falls somewhere between DRAM and flash. There’s even battery-backed DRAM. This has increased the complexity of managing flash storage in the data center, Amber Huffman, an Intel fellow and president of the NVM Express standards organization, told Data Center Knowledge.
In NVMe 1.4, “NVM Sets” group flash devices by latency and endurance, even if a device includes multiple types of storage. “NVMe has always treated flash as just a logical set of blocks, and you don't know what that attaches to on the backend,” Huffman pointed out. “Am I doing reads and writes to the same NAND location and creating a bottleneck? We’re getting towards the concept of breaking up storage devices in a logical fashion but still getting the benefit of having an abstraction by offering finer-grained capabilities for QoS.”
Exposing the different flash characteristics to the host means you can place workloads intelligently. You can make performance more predictable by marking which IO needs higher priority, assign heavy write activity to flash with higher endurance, and use lower-endurance, higher-capacity flash for read-heavy workloads.
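To make the placement idea concrete, here is a minimal sketch of a host-side policy that picks a group of flash for a workload based on endurance and latency. It is an illustration only: the `FlashSet` and `Workload` types, the `place` function, the threshold, and every figure in the sample data are assumptions made for this example, not part of the NVMe specification or of any driver API.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class FlashSet:
    """Hypothetical description of one group of flash exposed to the host."""
    name: str
    endurance_dwpd: float    # rated drive writes per day (illustrative)
    read_latency_us: float   # typical read latency in microseconds (illustrative)


@dataclass(frozen=True)
class Workload:
    name: str
    write_fraction: float    # share of IO that is writes, 0.0 to 1.0
    latency_sensitive: bool


def place(workload: Workload, sets: Sequence[FlashSet],
          write_heavy_threshold: float = 0.5) -> FlashSet:
    """Choose a flash set for a workload.

    Write-heavy work goes to the highest-endurance set, latency-sensitive
    read work goes to the fastest set, and everything else goes to the
    lowest-endurance set so the durable flash stays free.
    """
    if workload.write_fraction >= write_heavy_threshold:
        return max(sets, key=lambda s: s.endurance_dwpd)
    if workload.latency_sensitive:
        return min(sets, key=lambda s: s.read_latency_us)
    return min(sets, key=lambda s: s.endurance_dwpd)


# Made-up sample data, for illustration only.
sets = [
    FlashSet("qlc-capacity", endurance_dwpd=0.3, read_latency_us=120.0),
    FlashSet("tlc-mixed", endurance_dwpd=1.0, read_latency_us=90.0),
    FlashSet("pmem-fast", endurance_dwpd=30.0, read_latency_us=10.0),
]

for wl in (Workload("db-log", 0.9, True), Workload("archive", 0.05, False)):
    print(wl.name, "->", place(wl, sets).name)
```

A real policy would likely weigh capacity and cost as well; the point is only that once the device reports these characteristics, the host can make the decision with a few comparisons.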
“IO Determinism” makes performance and latency more predictable at scale. Flash read times can vary dramatically (from microseconds to seconds in some cases), so you can now ask a device if it can deliver within a specific time window and, if not, send the request to another device on the fabric.
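The "ask first, then reroute" pattern can be sketched in a few lines. This is a simplified model rather than the actual NVMe command set: the `Device` type, its `deterministic` flag, the `max_read_latency_us` bound, and the `route_read` function are all hypothetical names chosen for the example, and it assumes the data being read is available on more than one device.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Device:
    """Hypothetical view of a device holding a copy of the data."""
    name: str
    deterministic: bool        # assumed flag: device currently promises bounded latency
    max_read_latency_us: int   # assumed bound that applies while deterministic


def route_read(devices: Sequence[Device], deadline_us: int) -> Optional[Device]:
    """Return the first device that promises to answer within the deadline.

    Devices are tried in the order given; None means no device can
    currently make the promise and the caller has to decide what to do.
    """
    for device in devices:
        if device.deterministic and device.max_read_latency_us <= deadline_us:
            return device
    return None


# Made-up sample data: the first replica is busy with background work.
replicas = [
    Device("ssd-a", deterministic=False, max_read_latency_us=200),
    Device("ssd-b", deterministic=True, max_read_latency_us=500),
    Device("ssd-c", deterministic=True, max_read_latency_us=150),
]

target = route_read(replicas, deadline_us=200)
print(target.name if target else "no device can meet the deadline")
```

With a 200-microsecond deadline the sketch skips `ssd-a` (not currently deterministic) and `ssd-b` (bound too loose) and sends the read to `ssd-c`.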
Automated Discovery, Intelligent Routing
The fabric can also take on more discovery and intelligent routing work, making your infrastructure smarter and adding more dynamic IO queue resource management – but without adding overhead for users who just attach storage directly, via PCIe.
Adding extra ports and components or disconnecting devices doesn’t drastically change your storage infrastructure. Previously, the storage host wouldn't know about the changes unless you restarted the whole discovery process. Now, capabilities can be discovered automatically and dynamically. “In storage, what you want is fewer outages and less impact on applications, and this makes the whole system more manageable and more efficient,” Virtana’s He said.
That becomes more important as the size of NVMe storage networks goes from dozens to thousands of devices, Huffman noted. “As we move into pooled storage scenarios, you start to have many ways that you might have gotten to a device, and you need to understand which is the best path to this device, because they might not be created equal: I might be going through in a much slower, longer path.”
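As a rough illustration of what picking the best path might look like on the host, the sketch below ranks candidate routes to the same device by measured latency and breaks ties by hop count. The `Path` type, its fields, and the `best_path` function are assumptions for this example; the specification does not define them, and real multipath implementations use their own policies.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Path:
    """Hypothetical route from a host to one storage device."""
    via: str                    # label for the switch or port used
    hops: int                   # number of fabric hops
    measured_latency_us: float  # recent round-trip latency in microseconds


def best_path(paths: Sequence[Path]) -> Path:
    """Pick the lowest-latency path, preferring fewer hops on a tie.

    Raises ValueError if no paths are given.
    """
    return min(paths, key=lambda p: (p.measured_latency_us, p.hops))


# Made-up sample data: three routes to the same device.
paths = [
    Path("switch-1", hops=1, measured_latency_us=35.0),
    Path("switch-2/switch-5", hops=2, measured_latency_us=35.0),
    Path("switch-3/switch-4/switch-5", hops=3, measured_latency_us=80.0),
]

print(best_path(paths).via)
```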
Resiliency and Recovery
There are also new resiliency features and more options for when devices fail.
Reliable as flash storage is, sometimes data is lost during writes. The new verify feature reads data immediately after it’s written, as an extra check on data integrity.
Standardised, persistent logs deliver more information about internal error states, which could feed into monitoring software to help distinguish critical and non-critical errors or allow you to recover data from a drive that failed for writes but can still be read. It will also help enterprises discover whether they need to sanitise failed drives before disposal (in case data can still be read from them), as well as making it easier for vendors to tell the difference between media failure and firmware bugs.
The new rebuild-assist option should reduce data loss in cases of partial failure. Drives will detect media failure and the drive controller will notify the host, which will attempt to rebuild the data from other copies. You can also use this to get a clearer picture of drive life and replace drives pre-emptively before they fail.
“You are making the entire infrastructure a bit smarter and more resilient and more tolerant to potential failures,” He suggested.
Better for Security
Many of these capabilities have been available in storage management software or even as proprietary features on some drives. Having them in the standard and available with the standard networking stack doesn’t just make them ubiquitous – and basic rather than premium features – it helps with the security lifecycle.
“One of the key things I'm hearing from data center customers is that they really want standard drivers,” Huffman said. “They don’t want boutique drivers. They want the inbox Linux driver, for example, and maybe they add a few capabilities, but when somebody discovers a security vulnerability, they want the standard driver, so they can update quickly.”
Put it all together and NVMe enables you to create a high-capacity storage fabric that can sustain high throughput and IOPS at a competitive cost by mixing different types of flash and distributing the right parts of the workload to each. It also enables you to build a flexible storage architecture that’s not a fabric but can still be ready for future changes and devices you haven’t yet planned for, Huffman suggested.
“What we’re hearing from enterprise customers is that IO connectivity is really expensive,” she explained. “When they think about the IO bandwidth speed and the number of channels they need to provide, if they’re not doing a fabric, they’d rather be on PCIe, where they’re not dedicating the connectivity to storage.”
NVMe now gives them the flexibility to attach an accelerator to train a machine learning model, for example, or computational storage, when they need it. “When they’re building a system now, they don’t know how quickly the world is going to change for them, or what they’ll want to attach in two years’ time. With NVMe, they get all that flexibility.”