How Exchange's new ever-expanding archive mailbox works

Microsoft’s announcement of its intention to enable a “truly bottomless archive” for Office 365 enterprise tenants might have made you curious about how Microsoft intends to deliver the feature. Apart from throwing a massive amount of disk space at the problem, that is, because we all know that Office 365 wants to provide users with massive repositories.

What’s more interesting is how the simple archive mailbox has taken on a new shape to keep pace with user demand. The idea is simple: as users store more and more items in their archive, Exchange Online automatically expands it to accommodate everything thrown at it, whether through user activity, PST imports, or automated archiving via retention policies. The trick is not that a single archive mailbox grows to 1 TB or beyond. As we’ll see, the solution is a new take on existing technology that might just surprise you.

Consider how modern public folders store their information in public folder mailboxes. Each mailbox has a copy of the public folder hierarchy, but only one copy, the primary hierarchy, is writeable and keeps track of where each public folder is located. From a user perspective, you just see public folders and don’t have to worry about which mailboxes must be accessed to reach the public folders you want to use during a session. It just happens.

The hierarchy knows which public folder mailbox holds a specific folder, so when a user request to access a public folder comes in, the request can be redirected to the correct mailbox. If something changes, such as a public folder being moved from one mailbox to another, the hierarchy records that fact and everything continues to function.
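To make the routing idea concrete, here is a toy Python model in which the hierarchy is nothing more than a dictionary mapping folder paths to mailbox names. The class, method, folder, and mailbox names are all hypothetical, and Exchange's real hierarchy holds far more than this, but the sketch captures the two operations described above: look up where a folder lives, and update that record when the folder moves.

```python
from dataclasses import dataclass, field


@dataclass
class PublicFolderHierarchy:
    """Toy stand-in for the writeable primary hierarchy: folder path -> mailbox."""
    locations: dict[str, str] = field(default_factory=dict)

    def route(self, folder: str) -> str:
        """Return the public folder mailbox that a request for `folder` is sent to."""
        return self.locations[folder]

    def move(self, folder: str, target_mailbox: str) -> None:
        """Record that `folder` now lives in `target_mailbox`."""
        if folder not in self.locations:
            raise KeyError(folder)
        self.locations[folder] = target_mailbox


hierarchy = PublicFolderHierarchy({"/Sales": "PFMailbox1", "/Support": "PFMailbox1"})
print(hierarchy.route("/Support"))  # PFMailbox1
hierarchy.move("/Support", "PFMailbox2")
print(hierarchy.route("/Support"))  # PFMailbox2
```

Because clients only ever ask the hierarchy, a move changes one entry and every later request follows it automatically.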

As it happens, Office 365 includes an auto-split feature that automatically creates a new public folder mailbox when one of the existing set grows too large (more than 50 GB). After the new mailbox is created, the public folders held in the original mailbox are split across the old and the new mailbox. The same thing happens if an administrator creates a new public folder mailbox and decides to move some public folders to it.
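The auto-split can be sketched in the same toy style. Microsoft doesn't document how folders are chosen for the new mailbox, so the greedy rule below (move the largest folders that still fit until the new mailbox holds at most half the content) is purely an assumption for illustration, and `auto_split` is a hypothetical name. Only the 50 GB threshold comes from the text.

```python
QUOTA_GB = 50  # auto-split threshold quoted in the article


def auto_split(folders: dict[str, int], limit_gb: int = QUOTA_GB) -> tuple[dict[str, int], dict[str, int]]:
    """Split one public folder mailbox's folders across the old and a new mailbox.

    `folders` maps folder path -> size in GB. Returns (old_mailbox, new_mailbox).
    Assumed policy (not Microsoft's documented one): move the largest folders
    that still fit into the new mailbox until it holds at most half the content.
    """
    total = sum(folders.values())
    old = dict(folders)
    new: dict[str, int] = {}
    if total <= limit_gb:
        return old, new  # under the threshold: no split is needed
    moved = 0
    for name, size in sorted(folders.items(), key=lambda kv: kv[1], reverse=True):
        if moved + size <= total / 2:
            new[name] = old.pop(name)
            moved += size
    # A single folder bigger than half the content never moves in this sketch.
    return old, new


old, new = auto_split({"/Sales": 20, "/Support": 15, "/HR": 10, "/Legal": 8})
print(old)  # {'/Support': 15, '/HR': 10, '/Legal': 8}
print(new)  # {'/Sales': 20}
```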

Now imagine that exactly the same thing happens for a user’s archive. It starts off as a single archive mailbox, and when that mailbox grows to 50 GB, Exchange automatically adds a new mailbox to form a logical set, or chain, of two mailboxes. If another 50 GB is added, a third mailbox joins the set, and so on. The mailboxes in the archive set don't need to be in the same database. Performance is maintained because each individual mailbox stays well under the 100 GB limit supported by Exchange, and the user sees nothing except a single seamless archive.
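The growth rule lends itself to a quick calculation. Under the simplifying assumption that every mailbox in the chain holds at most 50 GB (in practice the redistribution of content means mailboxes usually hold less), the minimum number of linked archive mailboxes is the archive size divided by 50 GB, rounded up, with the main archive always present. The function name is hypothetical.

```python
import math

MAILBOX_CAP_GB = 50  # size at which the article says another archive mailbox is added


def archive_mailboxes_needed(archive_gb: float) -> int:
    """Minimum number of linked archive mailboxes for an archive of a given size.

    Simplified model: each mailbox in the chain holds at most MAILBOX_CAP_GB,
    and the chain always contains at least the main archive.
    """
    return max(1, math.ceil(archive_gb / MAILBOX_CAP_GB))


for size in (30, 50, 120, 1000):
    print(size, "GB ->", archive_mailboxes_needed(size), "mailbox(es)")
```

For example, a 120 GB archive needs at least three mailboxes and a 1,000 GB archive at least 20, none of which comes close to the 100 GB per-mailbox limit.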

When a new mailbox is added to the archive set, Exchange automatically moves some of the content from the original archive into the new mailbox to spread the load across the set. The Mailbox Replication Service (MRS) performs this redistribution of folders, using the same kind of incremental synchronization to transfer content to the new location that it uses when mailboxes or public folders are moved.
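Incremental synchronization in general works by copying everything once and then repeatedly copying only what changed since the previous pass, tracked by a watermark. The sketch below shows that generic technique with a hypothetical per-item change number; it is not MRS's actual implementation, and it ignores deletions and conflicts for brevity.

```python
from dataclasses import dataclass


@dataclass
class Item:
    item_id: str
    change_number: int  # hypothetical counter that increases whenever the item changes


def incremental_sync(source: dict[str, Item], target: dict[str, Item], watermark: int) -> int:
    """Copy every item changed after `watermark` from source to target.

    Returns the new watermark (the highest change number seen so far),
    so the next pass only picks up later changes.
    """
    new_watermark = watermark
    for item in source.values():
        if item.change_number > watermark:
            target[item.item_id] = item
            new_watermark = max(new_watermark, item.change_number)
    return new_watermark


source = {"a": Item("a", 1), "b": Item("b", 2)}
target: dict[str, Item] = {}
mark = incremental_sync(source, target, 0)     # initial pass copies a and b
source["c"] = Item("c", 3)                     # the user keeps archiving meanwhile
mark = incremental_sync(source, target, mark)  # delta pass copies only c
print(sorted(target), mark)  # ['a', 'b', 'c'] 3
```

Each delta pass is small, which is why content can keep flowing to the new location while the user carries on working.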

Some magic is necessary behind the scenes. Today, an archive mailbox is linked to its primary mailbox by a GUID stored as a mailbox property, which Exchange uses to find the archive in the database where it is stored. The bottomless archive replaces that single GUID with a linked list of GUIDs, each pointing to a separate archive mailbox of up to 50 GB. Exchange treats the set of archive mailboxes as if it were a single large mailbox. Clients see the same single entity, so Outlook and Outlook Web App need no updates: as far as they are concerned, they ask the Exchange Store for some archive information and it’s up to Exchange to locate and return the requested data. All the complexity of working out which of the linked mailboxes holds the target data is handled by Exchange.
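A toy model shows how a chain of archive GUIDs can still look like one archive to a client. Here an ordered Python list stands in for the linked list of GUIDs, each archive mailbox is reduced to a set of folder names, and `locate` walks the chain to find the mailbox that holds a folder. All class, method, and folder names are hypothetical and say nothing about how Exchange actually stores the chain.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class ArchiveMailbox:
    guid: uuid.UUID
    folders: set[str] = field(default_factory=set)  # folder names stored in this mailbox


@dataclass
class UserMailbox:
    # Ordered chain standing in for the linked list of archive GUIDs:
    # index 0 is the main archive, later entries are auxiliary archives.
    archive_chain: list[ArchiveMailbox] = field(default_factory=list)

    def add_archive(self) -> ArchiveMailbox:
        """Append a new, empty archive mailbox to the end of the chain."""
        mailbox = ArchiveMailbox(uuid.uuid4())
        self.archive_chain.append(mailbox)
        return mailbox

    def locate(self, folder: str) -> uuid.UUID:
        """Walk the chain and return the GUID of the mailbox holding `folder`."""
        for mailbox in self.archive_chain:
            if folder in mailbox.folders:
                return mailbox.guid
        raise KeyError(folder)

    def all_folders(self) -> set[str]:
        """What a client sees: one archive made of every mailbox's folders."""
        return set().union(*(m.folders for m in self.archive_chain))


user = UserMailbox()
main = user.add_archive()
main.folders |= {"Inbox 2012", "Projects"}
aux = user.add_archive()
aux.folders.add("Inbox 2013")
print(user.locate("Inbox 2013") == aux.guid)  # True
print(sorted(user.all_folders()))  # ['Inbox 2012', 'Inbox 2013', 'Projects']
```

The client-facing view (`all_folders`) never mentions individual mailboxes, which mirrors why Outlook needs no changes.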

You can see the details of the GUIDs by running the Get-Mailbox cmdlet to examine a mailbox’s properties. If you look at the MailboxLocations property, you’ll see something like this:

Get-Mailbox -Identity TRedmond | Format-List MailboxLocations

MailboxLocations : {1;0370f354-2752-4437-878d-cf0e5310a8d4;Primary;eurprd04.prod.outlook.com;353ce1b5-5044-4974-93f0-7b6f4a54edf8,
                    1;afc1e472-0826-498e-b990-85de223e809d;MainArchive;eurprd04.prod.outlook.com;353ce1b5-5044-4974-93f0-7b6f4a54edf8}

The mailbox locations reported by Get-Mailbox are broken down into two sections: one for the primary mailbox and one for the archive. Only a single mailbox and a single archive appear here. If other archive mailboxes were present, they would be listed in the archive section as mailbox 2, 3, 4, and so on. The information for the two mailboxes is as follows:

Primary mailbox:

  • The ExchangeGUID (which ties the mailbox back to a user account)
  • “Primary” to indicate that this data refers to the user's primary mailbox.
  • If Exchange Online is used (as in this example), the name of the forest in which the mailbox database is located. This value is blank for on-premises deployments.
  • The GUID of the database where the mailbox is located

Archive:

  • The ArchiveGUID (which is only present when a mailbox is archive-enabled)
  • "MainArchive" to indicate that this entry is the first mailbox in an archive set. If other archive mailboxes are present, they are tagged as "AuxArchive".
  • As noted above, the name of the Exchange Online forest holding the archive mailbox
  • The GUID of the database holding the archive.
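If you want to pull these fields apart in a script, a small parser is enough. This Python sketch assumes the five-field, semicolon-separated layout shown in the output above. It treats the leading "1" in each entry as opaque because its meaning isn't explained here, and it strips all whitespace first so that line-wrapped console output still parses. `MailboxLocation` and its field names are hypothetical labels, not Exchange property names.

```python
from typing import NamedTuple


class MailboxLocation(NamedTuple):
    leading_field: str   # the initial "1"; treated as opaque
    mailbox_guid: str    # ExchangeGUID for Primary, ArchiveGUID for archives
    location_type: str   # Primary, MainArchive or AuxArchive
    forest: str          # Exchange Online forest; blank on-premises
    database_guid: str   # database holding this mailbox


def parse_mailbox_locations(text: str) -> list[MailboxLocation]:
    """Parse the {...} value of MailboxLocations into one record per mailbox."""
    compact = "".join(text.split())  # GUIDs and forest names contain no whitespace
    body = compact.removeprefix("{").removesuffix("}")
    entries = [entry for entry in body.split(",") if entry]
    return [MailboxLocation(*entry.split(";")) for entry in entries]


value = """{1;0370f354-2752-4437-878d-cf0e5310a8d4;Primary;eurprd04.prod.outlook.com;353ce1b5-5044-4974-93f0-7b6f4a54edf8,
1;afc1e472-0826-498e-b990-85de223e809d;MainArchive;eurprd04.prod.outlook.com;353ce1b5-5044-4974-93f0-7b6f4a54edf8}"""
locations = parse_mailbox_locations(value)
primary = next(loc for loc in locations if loc.location_type == "Primary")
archives = [loc for loc in locations if loc.location_type in ("MainArchive", "AuxArchive")]
# Same database GUID for primary and main archive, matching the example output.
print(len(archives), primary.database_guid == archives[0].database_guid)  # 1 True
```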

In this case, the mailbox and the archive are located in the same database (you can confirm this by using Get-Mailbox to examine the Database and ArchiveDatabase properties). And, as you'd expect, they are in the same Exchange Online forest. Interestingly, both the primary mailbox and the archive are listed in the MailboxLocations property, which raises the question of whether Microsoft will use the same approach to create expandable primary mailboxes in the future.

Splitting content across mailboxes and tracking where it is actually located through a list of locations is a simple scheme that has proven itself with public folders, so it’s reasonable to expect that it will work for expandable archives.

The feature is now being tested with a small set of Office 365 tenants and is also scheduled to appear in Exchange 2016, but it remains to be seen whether on-premises administrators will welcome the extra expense of the storage that bottomless archives consume. Swapping PSTs for archives brings many advantages, especially for compliance, but it could consume a lot of storage.

Some kinks still need to be worked out. For instance, you can’t move one of these archives to an on-premises Exchange 2010 or Exchange 2013 server, because Exchange would first have to package all the archive mailboxes in the chain into one massive archive mailbox. Exchange 2016 servers understand the new format, so moving to them is much easier.

Microsoft is encouraging users to move their data to the cloud by capturing PSTs and importing them into Office 365 and it clearly makes a lot of sense to have an expandable archive ready to handle all of that inbound data. Whether the same need exists for on-premises Exchange is unclear. No doubt we shall hear in time!

Follow Tony @12Knocksinna
