Microsoft Azure hardware on display at the OCP Summit 2019 (photo: Yevgeniy Sverdlik)

How the New Azure Ultra Disk Storage Delivers On-Prem Latency in the Cloud

Designed for mission-critical applications like SAP Hana, it may suck more workloads out of enterprise data centers and into Azure.

At last year’s Microsoft Ignite, where the company first demoed Ultra Disk Storage, Azure CTO Mark Russinovich showed an unannounced drive running at 250,000 IOPS and 1 millisecond latency. The high-performance cloud storage service the company “launched” last week doesn’t hit quite those numbers, but its performance is enough to suck more core applications out of enterprise data centers and into Microsoft’s hyperscale cloud platform.

The historical reluctance to move tier-one workloads, such as SAP or the mission-critical databases behind key business applications and systems of record, to the cloud has been as much about performance as it has been about security.

The new Azure Ultra Disk Storage managed disks use NVMe to offer sub-millisecond latency for I/O-intensive workloads like SAP Hana, NoSQL, OLTP databases, and other kinds of guaranteed-write sequential workloads and transaction-heavy systems that have been moving into the cloud, but slowly.

These are 4K-native data disks that initially only work with the premium storage DSv3 and memory-optimized ESv3 VM instances. ESv3 is what you’d pick if you were building relational database servers, large caches, or in-memory analytics systems with tools like Hadoop, Spark, Hive, Kafka, or HBase. (Azure’s storage-optimized instances already have directly mapped local NVMe storage.)

“Azure Ultra Disk Storage was designed to be one thing and one thing only: high-performance block storage,” Michael Myrah, Azure principal program manager, told Data Center Knowledge. “It leverages a direct channel to the underlying block storage and bypasses any unnecessary software layers. It offers sub-millisecond latency at extremely high IOPS, matching the performance characteristics seen in on-premises flash arrays.”

According to last week’s confusingly titled blog post by Microsoft “announcing general availability” of the service, it is now available by request in the East US 2, North Europe, and Southeast Asia Azure cloud regions. But “the general availability price” doesn’t kick in until October, which appears to be when Microsoft expects to launch it as a typical cloud service.

Some customers have already been using the service for workloads like PostgreSQL. Financial trading education platform Online Training Academy is using it to scale applications to the cloud, and Tokyo-based gaming company Sega said in the announcement that the service allowed it to “seamlessly migrate from our on-premises data center to Azure.” (That may explain why Asia is one of the initial regions.) Microsoft has yet to say how much it costs.

“Azure Ultra Disk Storage's high throughput and high I/O capabilities were designed for supporting data-intensive workloads like SAP Hana in the cloud,” Pund-IT principal analyst Charles King told Data Center Knowledge. “That should make the new service particularly interesting to Microsoft's enterprise clients and could also increase the momentum of mission-critical application migrations to Microsoft Azure.”

You can pick capacity from 4GB up to 64TB and configure bandwidth and IOPS separately from capacity: throughput from 256KB/s up to 2GB/s, and IOPS at up to 300 IOPS per GB, to a maximum of 160,000 IOPS per disk. That means you can achieve the maximum IOPS for a virtual machine with a single disk rather than having to stripe multiple disks (but you do need to keep the disk IOPS below the VM’s IOPS limit). You can also dynamically tune performance without detaching the disk or restarting the VM (although it may take up to an hour for new performance settings to take effect).
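To make those sizing rules concrete, here’s a minimal Python sketch that computes the provisioning envelope for a given disk size and VM IOPS cap. It is not part of any Azure SDK; the limits are simply the ones quoted above and may change over time.

```python
from typing import Optional

# Illustrative sizing helper based on the limits quoted in this article;
# it is not part of any Azure SDK, and the real limits may change over time.
MAX_IOPS_PER_GB = 300                      # provisionable IOPS scale with capacity
MAX_IOPS_PER_DISK = 160_000                # per-disk ceiling
THROUGHPUT_RANGE_KBPS = (256, 2_000_000)   # roughly 256KB/s up to 2GB/s


def provisioning_envelope(size_gb: int, vm_iops_limit: Optional[int] = None) -> dict:
    """Return the IOPS ceiling and throughput range you could provision for a disk."""
    max_iops = min(size_gb * MAX_IOPS_PER_GB, MAX_IOPS_PER_DISK)
    # As noted above, the disk's IOPS setting should stay below the VM's own limit.
    if vm_iops_limit is not None:
        max_iops = min(max_iops, vm_iops_limit)
    return {
        "size_gb": size_gb,
        "max_iops": max_iops,
        "throughput_kbps_range": THROUGHPUT_RANGE_KBPS,
    }


# Example: a 1TB disk attached to a VM capped at 80,000 IOPS.
print(provisioning_envelope(1024, vm_iops_limit=80_000))
```

By the per-GB rule alone a 1TB disk would come out above 300,000 IOPS, so in practice the 160,000-per-disk ceiling and the VM’s own limit are what actually bind it.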

That also comes with high durability, because it’s based on Azure’s Locally Redundant Storage technology, which saves three copies of the data in three different racks in the same availability zone. When an app writes to an Ultra disk, it doesn’t get confirmation that the data has been saved until it’s been replicated into the LRS system.
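That write-acknowledgment behavior is easy to picture with a toy sketch. The Python below illustrates only the general pattern described above (confirm the write once every replica has it); it is not Azure’s actual LRS implementation.

```python
# Toy illustration of the "ack only after all copies are written" pattern;
# this is not Azure's code, just the general idea described above.

class Replica:
    """Stands in for one copy of the data in one rack."""
    def __init__(self):
        self.blocks = {}

    def write(self, block_id, data):
        self.blocks[block_id] = data
        return True  # pretend the rack confirmed a durable write


class TripleReplicatedDisk:
    """Confirms a write only once all three replicas have stored it."""
    def __init__(self):
        self.replicas = [Replica() for _ in range(3)]

    def write(self, block_id, data):
        # The app's write latency therefore includes replication time.
        return all(replica.write(block_id, data) for replica in self.replicas)


disk = TripleReplicatedDisk()
assert disk.write(0, b"\x00" * 4096)  # acknowledged only after three copies exist
```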

A Flash Array in the Cloud

The Ultra disks get such low latency by using a new native kernel-mode block storage service called Direct Drive. Even with a Premium SSD managed disk in Azure, reads and writes go through the user-mode REST Azure Blob system before they hit physical storage devices, passing through an on-server SSD storage cache and multiple layers of servers along the way. That causes performance and latency issues for databases and other data-intensive workloads running in VMs. With Direct Drive, storage operations go directly from the kernel-mode virtual disk client to storage servers in the Ultra Disk Storage cluster without using the Azure Blob storage cache. (Snapshots, availability sets, virtual machine scale sets, Azure disk encryption, Azure Backup, and Azure Site Recovery aren’t currently enabled, but they may arrive later.)

As Russinovich explained at last year’s Ignite, the hosts these disks are mounted on know which servers in the Direct Drive cluster have the relevant pieces of the files. “So [Ultra Disk] can go talk directly to them to write and read data rather than going through the load balancers, and the front end, and the partition table servers that even Premium SSD traffic has to go through.” With those extra steps out of the storage path, Ultra Disk can deliver the kind of performance you’d expect from an all-flash enterprise storage array with non-volatile memory caches, four-copy durability, and RDMA – but in the cloud.
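Those latency claims are easy to sanity-check from inside a VM. Below is a minimal Python sketch that times synchronous 4KB writes against a file on an attached data disk on Linux; the mount path is hypothetical, and timing O_DSYNC writes is just one rough way to observe per-I/O latency, not an official Azure benchmark.

```python
# Rough per-write latency check for an attached data disk (assumes Linux).
# The path below is hypothetical; point it at a file on the disk you want to test.
import os
import statistics
import time

PATH = "/mnt/ultradisk/latency_test.bin"   # assumed mount point, adjust as needed
BLOCK = os.urandom(4096)                   # 4K writes, matching the disk's native size
SAMPLES = 1000

# O_DSYNC makes each write wait for the device, so the timing includes the
# full round trip to storage rather than just a copy into the page cache.
fd = os.open(PATH, os.O_WRONLY | os.O_CREAT | os.O_DSYNC, 0o644)
latencies_ms = []
try:
    for _ in range(SAMPLES):
        start = time.perf_counter()
        os.write(fd, BLOCK)
        latencies_ms.append((time.perf_counter() - start) * 1000)
finally:
    os.close(fd)

latencies_ms.sort()
print(f"median: {statistics.median(latencies_ms):.3f} ms")
print(f"p99:    {latencies_ms[int(len(latencies_ms) * 0.99)]:.3f} ms")
```

A purpose-built tool like fio would give more rigorous numbers, but even this rough loop shows whether per-write latency is landing in sub-millisecond territory.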
