Last week I discussed a new data import service that Microsoft has introduced to allow Office 365 customers to send PSTs to Microsoft datacenters for ingestion into the cloud. This week I sat in on an archiving session given by Dheepak Ramaswamy at the Ignite conference and learned some of the thinking and detail around the new capability.
There’s no doubt that Microsoft has Symantec Enterprise Vault (EV) in its cross-hairs. For years, Microsoft has muttered that the “stubs” created by EV when content was exported from mailboxes to the EV database compromised the integrity of the Information Store and prevented customers from making full use of the compliance features built into Exchange. That argument was less compelling when on-premises Exchange wasn’t so good at compliance and ran on expensive storage, which made it sensible to offload messages and documents into another repository that supported full-featured eDiscovery.
The world is different now. Exchange boasts a wide range of compliance features in both its on-premises and online versions, and the latter has massive low-cost storage available to hold as much data as you care to throw at the service. Microsoft views this as an opportunity to “bring data home”, much like a shepherd collects their lost sheep.
The procedure requires companies to collect PSTs on 3.5 inch 4 TB SATA drives (6 TB drives will be supported soon), which are then dispatched to the closest Office 365 datacenter. The drives are protected by BitLocker encryption, and the PSTs can optionally be encrypted before being placed on them. At the datacenter, the drives are connected into racks and their content is uploaded into a storage container in the Azure blob store. No Azure subscription is required because the container is created automatically to allow the transfer to occur, and you don’t have to pay for the Azure storage used to hold the PST data while it waits to be processed. The drives are returned to the sender after the PSTs have been uploaded to Azure.
Shipping and upload takes a couple of days, after which the tenant administrator can start off jobs to transfer the PST data to mailboxes. Microsoft has no access to the data while it is being processed. Everything on the drives is encrypted and the tenant retains control of the BitLocker key throughout.
The data ends up as a blob in Azure and is available for up to 90 days. During this period the tenant administrator can transfer information to Exchange Online as they wish, controlling the process by providing Office 365 with a mapping file to connect the PSTs that have been uploaded to Azure with target mailboxes.
During the transfer to Exchange, the PST data is validated and scanned to remove corrupt items (think of running the SCANPST utility several times). Badly corrupted PSTs are dropped, as there’s no point in introducing bad items into mailboxes. Azure and Office 365 are connected by a common Microsoft datacenter backbone, so transfer rates are reasonable, with an expected range of 250 GB to 1.3 TB per day. The exact rate depends on the number of target mailboxes: if you point a massive PST at one mailbox, you create a narrow pipe, and transfer will be slower than if a job processes multiple PSTs, each targeted at a different mailbox.
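To see why spreading PSTs across mailboxes matters, here is a back-of-envelope sketch in plain Python. The 250 GB/day and 1.3 TB/day figures are the range quoted above; the assumption that each target mailbox contributes roughly 250 GB/day of throughput is mine, for illustration only:

```python
# Rough estimate of ingestion time for a PST batch.
# The daily rates reflect the quoted 250 GB - 1.3 TB range;
# the per-mailbox figure is an assumption, not a Microsoft number.

PER_MAILBOX_GB_PER_DAY = 250   # assumed effective rate per target mailbox
MAX_JOB_GB_PER_DAY = 1300      # upper end of the quoted range (~1.3 TB)

def estimated_days(total_gb: float, target_mailboxes: int) -> float:
    """Days to ingest total_gb spread evenly over target_mailboxes."""
    rate = min(PER_MAILBOX_GB_PER_DAY * target_mailboxes, MAX_JOB_GB_PER_DAY)
    return total_gb / rate

# One 1 TB PST aimed at a single mailbox: a narrow pipe.
print(estimated_days(1000, 1))   # 4.0 days
# The same 1 TB split across five mailboxes: five pipes in parallel.
print(estimated_days(1000, 5))   # 0.8 days
```

The point of the sketch is simply that the job-level rate scales with the number of distinct target mailboxes, up to whatever ceiling the service imposes.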
Data can be directed to either primary or archive mailboxes. It’s best to use archive mailboxes for one simple reason: archive mailboxes are only ever accessed when clients are online, so once the data is transferred to the archive, it is immediately available. By contrast, if you transfer data to primary mailboxes, that data needs to be synchronized to local clients for people who use Outlook, which can cause a “synchronization storm”.
As a tip, if you want items imported into the root of the mailbox, specify "/" (without the quotes) in the TargetRootFolder column of the mapping file. Otherwise everything will be imported under the "Imported" folder.
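To make the mapping file concrete, here is a minimal sketch that generates one with Python’s csv module. The column names follow the sample mapping file published on TechNet at the time; the container folder, PST names, and mailbox addresses are hypothetical, so check the current documentation before relying on the exact layout:

```python
import csv

# Hypothetical PST-to-mailbox mappings. FilePath is the folder in the
# Azure blob container that holds the PST; IsArchive = TRUE targets the
# archive mailbox; TargetRootFolder "/" imports into the mailbox root.
rows = [
    {"Workload": "Exchange", "FilePath": "PSTs", "Name": "kim.pst",
     "Mailbox": "kim@contoso.com", "IsArchive": "TRUE",
     "TargetRootFolder": "/"},
    {"Workload": "Exchange", "FilePath": "PSTs", "Name": "tony.pst",
     "Mailbox": "tony@contoso.com", "IsArchive": "TRUE",
     "TargetRootFolder": "/"},
]

with open("PstImportMapping.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
```

Generating the file from a script like this beats hand-editing once you have more than a handful of PSTs, since a single typo in a mailbox address will stall that row of the import job.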
Although EV seems to be the prime target for now, Microsoft has signed up archive specialists TransVault and Nuix to provide tools to help companies generate PSTs from other third party archive systems.
For various reasons, the Office 365 import service is only supported for tenants based in the North American, European, and Asia-Pacific datacenters. Tenants in China, Australia, Brazil, and Japan will have to wait for the service to be introduced in those regions.
The Office 365 import service also supports a direct upload option to allow tenants who don’t have terabytes of PST data to upload PSTs over the network. Once uploaded, the same ingestion process is used to move data from Azure into Exchange Online.
Documentation for both the drive shipping and direct upload import options is now available on TechNet.
SharePoint migration to Office 365 will be enhanced when the Import service is upgraded (in about a month) to use a new API that creates a content file and manifest from on-premises document libraries; these can be shipped to a Microsoft datacenter and put through the same ingestion process. Of course, in this case, the target will be SharePoint Online document libraries rather than mailboxes.
Microsoft also announced that they will soon enable unlimited quotas for Exchange Online archive mailboxes. Up to now, archive mailboxes were assigned a default quota of 100 GB, which could be increased, but only by asking Microsoft to assign additional space. The difference now is that Microsoft is going to remove the quota altogether, and users will be able to stuff as much data as they want into their archive mailboxes. The race to the first 100 TB mailbox begins!
Seriously, removing the quota for archive mailboxes makes the process of data ingestion much easier all round. All of those massive PSTs that are about to be shipped to Microsoft on SATA drives will find a nice online home in Exchange Online, available for as long as the user wishes and indexed and discoverable for compliance purposes too.
Follow Tony @12Knocksinna