NVMe revolutionised flash storage. Previously, flash SSDs could only use existing storage transport protocols such as SATA and SAS, which were designed for the much lower throughput and input/output (I/O) rates of magnetic spinning disk media.
NVMe brought the ability to handle far more bandwidth and many more queues, which resulted in a performance boost of several tens of times.
NVMe 2.0 does not offer the kind of earth-shattering step up that NVMe delivered over SAS and SATA, but is so designated because of the number of enhancements it brings. These include:
- Support for rotational media, ie HDDs;
- Zoned namespaces that will further optimise use of high-capacity QLC flash;
- Use of a key: value command set to cut the numerous layers of translation required to map to physical drive addressing; and
- Customer ability to configure NVMe endurance groups, which can allot capacity groupings to different storage consumers by type.
So, NVMe 2.0 ratifies support for spinning disk media (HDDs). The obvious question is: why?
The idea is that NVMe can become a common transport layer for storage I/O across all types of media, so customers can incorporate HDDs into the same infrastructure, with a common architecture across all drives.
And while HDDs have largely been superseded for performance work in the datacentre, there’s no way hard drives are going to disappear for some time, especially because they offer high capacity (up to the 20TB region) and, in the case of Seagate’s Mach.2, throughput of more than half a GBps, even if they can’t match flash for random IOPS.
NVMe zoned namespaces
ZNS will allow for more optimal use of QLC flash, which offers the highest capacity of the flash generations but falls short on lifespan. Zoned namespaces will cut the wear suffered by NVMe-connected drives, thanks to lower write amplification, and so lengthen the life of QLC and allow it to be used where longer-lasting flash was previously needed.
ZNS also means drives need less over-provisioning. DRAM usage in the system is cut too, because the flash translation layer, which handles translation to block addressing, has less work to do when whole zones are managed instead of 4KB blocks.
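To get a feel for why managing whole zones trims DRAM usage, here is a back-of-envelope sketch in Python. The 16TiB drive, 1GiB zone size and 4 bytes per mapping entry are illustrative assumptions, not figures from the NVMe specification or any vendor.

```python
# Rough comparison of mapping-table sizes: per-4KiB-block vs per-zone.
# All figures below are assumptions chosen for illustration only.

KIB = 1024
GIB = 1024 ** 3
TIB = 1024 ** 4

CAPACITY_BYTES = 16 * TIB   # hypothetical drive capacity
BLOCK_BYTES = 4 * KIB       # conventional mapping granularity
ZONE_BYTES = 1 * GIB        # hypothetical zone size
ENTRY_BYTES = 4             # assumed size of one mapping-table entry


def mapping_entries(capacity_bytes: int, unit_bytes: int) -> int:
    """Number of table entries needed to map the drive at a given granularity."""
    return capacity_bytes // unit_bytes


def table_bytes(entries: int, entry_bytes: int = ENTRY_BYTES) -> int:
    """Memory needed to hold the mapping table."""
    return entries * entry_bytes


block_entries = mapping_entries(CAPACITY_BYTES, BLOCK_BYTES)
zone_entries = mapping_entries(CAPACITY_BYTES, ZONE_BYTES)

print(f"4KiB-block mapping: {block_entries} entries, "
      f"{table_bytes(block_entries) / GIB:.0f} GiB of table")
print(f"Zone mapping:       {zone_entries} entries, "
      f"{table_bytes(zone_entries) / KIB:.0f} KiB of table")
```

Under these assumptions the per-block table needs about 16GiB, while the per-zone table needs about 64KiB. Real drives keep other metadata as well, so treat this as an order-of-magnitude comparison only.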
NVMe is a way of deploying the access methods best suited to flash storage media. Previously, SSDs had pretty much adopted SCSI and ATA as ways of addressing drives, both inherited from the era of spinning disk HDDs.
Zoned namespaces (ZNS) is one of a number of further steps NVMe is taking away from that history. Having said that, it is actually derived from a technique used in shingled magnetic recording (SMR), a method employed in some hard drives in which tracks overlap on the HDD platters.
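The zone idea can be sketched as a toy model. In a zoned device, writes within a zone land sequentially at a write pointer, and space is reclaimed by resetting the whole zone rather than overwriting individual blocks. The `Zone` class and its method names below are hypothetical, chosen for illustration, and are not NVMe commands.

```python
class Zone:
    """Toy model of a single zone (illustrative only, not a real device API)."""

    def __init__(self, capacity_blocks: int):
        self.capacity = capacity_blocks
        self.write_pointer = 0      # index of the next block that may be written
        self.blocks = []

    def append(self, data: bytes) -> int:
        """Write at the write pointer and return the block index used."""
        if self.write_pointer >= self.capacity:
            raise RuntimeError("zone full: reset it before writing again")
        self.blocks.append(data)
        self.write_pointer += 1
        return self.write_pointer - 1

    def write_at(self, offset: int, data: bytes) -> int:
        """Reject any write that does not land exactly on the write pointer."""
        if offset != self.write_pointer:
            raise ValueError("writes within a zone must be sequential")
        return self.append(data)

    def reset(self) -> None:
        """Discard the whole zone at once; there is no in-place overwrite."""
        self.blocks.clear()
        self.write_pointer = 0


zone = Zone(capacity_blocks=2)
zone.append(b"first")
zone.write_at(1, b"second")   # allowed: offset 1 is the current write pointer
zone.reset()                  # whole zone is reclaimed in one step
```

Because the host only ever appends and resets, the drive has far less reshuffling of data to do behind the scenes, which is where the lower write amplification comes from.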
NVMe key: value command set
Key: value seems to be everywhere, from JavaScript data structures to NoSQL databases, and NVMe 2.0 will use that method of storing and recalling data to supersede the use of block addressing. It’s as simple as it sounds: data is stored as unstructured values of between 1 byte and 1MB, each mapped to a key of between 1 and 32 bytes.
The NVMe key: value command set does away with two layers of mapping between application call and physical media.
In block storage, three mappings occur: from the application to the file system, then to the logical block address (LBA), and from the LBA to the physical address. Key: value uses a single mapping table.
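The difference between the two paths can be sketched in a few lines of Python. Every table, name and address below is made up for illustration, and the key size check simply reuses the 1 to 32 byte range quoted above; none of this represents the actual NVMe command set.

```python
# Toy comparison of lookups: block path (three mappings) vs key: value (one).
# All tables and addresses are hypothetical.

MEDIA = {0x1A2B: b"alice"}                 # physical address -> stored bytes

# Block path
APP_INDEX = {b"user42": ("users.db", 0)}   # application key -> (file, offset)
FILE_SYSTEM = {("users.db", 0): 7}         # (file, offset) -> LBA
FTL = {7: 0x1A2B}                          # LBA -> physical address

# Key: value path
KV_TABLE = {b"user42": 0x1A2B}             # key -> physical address


def read_via_block(key: bytes):
    """Return (data, number of mapping lookups) for the block path."""
    file_extent = APP_INDEX[key]           # mapping 1: into the file system
    lba = FILE_SYSTEM[file_extent]         # mapping 2: to the logical block address
    physical = FTL[lba]                    # mapping 3: to the physical address
    return MEDIA[physical], 3


def read_via_key_value(key: bytes):
    """Return (data, number of mapping lookups) for the key: value path."""
    if not 1 <= len(key) <= 32:
        raise ValueError("key must be between 1 and 32 bytes")
    physical = KV_TABLE[key]               # the single mapping table
    return MEDIA[physical], 1


print(read_via_block(b"user42"))
print(read_via_key_value(b"user42"))
```

Both calls return the same data; the block path gets there after three table lookups, the key: value path after one.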
NVMe key: value is claimed to bring more transactions per second, lower write amplification and lower latency.
NVMe endurance group management
Endurance Groups and NVM Sets first came along in NVMe 1.4 in 2019, but there were limits on what customers could do to configure them: they had to be hard-coded in drive firmware or set using vendor-specific commands.
NVMe 2.0 allows customers to configure Endurance Groups and NVM Sets with parameters that provide some flexibility to isolate the I/O performance and wear-levelling effects of different users on shared drives or arrays.