Using intelligent capture and analysis tools to eliminate PSTs

I’m returning to the topic of PSTs because you can never write enough about these pestilent files. After I wrote about Microsoft’s new Office 365 Import service, I had the chance to chat with some of the companies that are dedicated to tracking PSTs down. These companies build tools to find PSTs lurking in the dark corners of drives scattered around an organization and to fix the files so that they are in good enough shape to be ingested into Office 365 or on-premises Exchange.

All acknowledge that Microsoft provides customers with a free PST Capture tool. And all agree that the free tool marks the lowest common denominator for what you’d expect in terms of the ability to ferret out hidden PSTs and transform the files. In short, if you pay zero for software, it’s unfair to expect that it will be very sophisticated.

I think this is a fair position. Microsoft has updated the PST Capture tool once since acquiring it, but I get no sense that Microsoft considers the tool to be anything more than a way to tick a box when reassuring customers that it can help to eradicate PSTs. Certainly, PST Capture hasn’t received much attention since Microsoft acquired it to support the introduction of archive mailboxes in Exchange 2010. And it never lived up to the “revolutionary” label promised for the tool in 2011.

You can certainly use PST Capture to search for PSTs and there are many articles, most written in the initial flush of enthusiasm after the tool was released (like this example from msexchange.org), to tell you how to use it. But I reckon that you’ll conclude that PST Capture requires a lot of manual attention to meet your needs.

Third-party ISVs concentrate on tools that handle edge conditions, speed up detection, and add automated workflow to make sure that PSTs can be found, copied, and processed without requiring a huge amount of time from administrative staff and users.

Edge conditions include things like being able to process password-protected PSTs – and not just those that are protected with the easily cracked compressible encryption that is used today. There are still a few PSTs generated by Outlook 2003 that use the “high encryption” method that is much harder to deal with.

Detection means being able to find PSTs no matter where users have squirreled them away. Invariably, this means that you have to deploy an agent onto user PCs so that every drive is examined and all PSTs can be harvested. Speaking of which, I was fascinated by some of the data about PST collection reported by the vendors. Look at the data shown below that’s taken from a PST acquisition exercise performed in a well-known major company.

The two items that stand out for me are the 922 PSTs uncovered for one user and the 343.4 GB of data found in 314 PSTs for another. It’s fair to ask how anyone could end up managing 922 PSTs. The answer lies in personal work practices and a distrust of IT, perhaps because of restrictive mailbox quotas or poor server reliability. Some folks create PSTs to archive items for individual projects, some create PSTs to archive items on a monthly basis, while others have their own weird and wonderful logic for creating new PSTs. The point is that these things happen in the wild and companies probably don’t realize quite how much data is stored in PSTs on user-controlled storage. Scanning tens of thousands of PCs in a large company can uncover hundreds of terabytes of PSTs. All of that data is invisible for the purpose of enterprise search and compliance.
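To give a feel for the kind of per-user inventory that a detection pass produces, here is a minimal Python sketch. It is not how any of the vendors' agents work. It simply walks a folder tree, picks up every file with a .pst extension, and totals the count and size per owner. The function names and the convention that the first folder under the scanned root identifies the owner are assumptions made purely for illustration.

```python
import os
from collections import defaultdict


def find_psts(root):
    """Yield (path, size_in_bytes) for every .pst file found under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Match the extension case-insensitively (.pst, .PST, ...).
            if name.lower().endswith(".pst"):
                path = os.path.join(dirpath, name)
                try:
                    yield path, os.path.getsize(path)
                except OSError:
                    # The file vanished or cannot be read; this sketch skips it.
                    continue


def summarize_by_owner(root):
    """Total PST count and bytes per owner.

    Assumption: the first folder below root names the owner, as it would
    on a share of user home directories. Files directly in root are
    grouped under "(unknown)".
    """
    summary = defaultdict(lambda: {"count": 0, "bytes": 0})
    for path, size in find_psts(root):
        parts = os.path.relpath(path, root).split(os.sep)
        owner = parts[0] if len(parts) > 1 else "(unknown)"
        summary[owner]["count"] += 1
        summary[owner]["bytes"] += size
    return dict(summary)
```

Calling summarize_by_owner on a share of home directories returns a dictionary keyed by owner, which is enough to spot outliers like the users described above. A real tool also has to cope with locked files, local drives that are only reachable through an agent, and PSTs that have been renamed to hide them.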

Finding so many PSTs and figuring out what to do with them can take a huge amount of administrator time. Third-party tools reduce that time by scanning for PSTs, copying them to a central holding area, and preparing them for further processing by fixing item-level corruption (multiple runs of the SCANPST utility or some proprietary code might be required). The savings are greatest when you can schedule workflow tasks to scan, gather, and fix PSTs automatically. Add in optional processing such as deduplication of data across a set of PSTs, and you can see how third-party tools justify their license fees.

Taking the example of a company that has uncovered hundreds of terabytes of PST data, it’s likely that a lot of duplicated information exists in those files. Remember that a PST is a personal file, and if a message is sent to 100 users, it might result in 100 separate copies being stored in 100 PSTs. Deduplication is important if you plan to import the PSTs to Office 365 or Exchange on-premises because the last thing you want is to have to process massive chunks of duplicated information, especially if the data is going to be shipped across an Internet connection.
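The idea behind deduplication can be sketched in a few lines of Python. This is an illustration only, not any vendor's algorithm. It assumes that messages have already been extracted from the PSTs into simple dictionaries, and the field names (message_id, subject, sent, body, pst) are hypothetical. Each message is reduced to a fingerprint, and only the first message with a given fingerprint is kept.

```python
import hashlib


def message_fingerprint(msg):
    """Hash the fields that should be identical across copies of a message.

    The source PST is deliberately left out, so the same message found in
    two different PSTs produces the same fingerprint.
    """
    key = "\x1f".join([
        msg.get("message_id", "").strip().lower(),
        msg.get("subject", "").strip(),
        msg.get("sent", ""),
        msg.get("body", ""),
    ])
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def deduplicate(messages):
    """Return the messages with duplicates removed, keeping first occurrences."""
    seen = set()
    unique = []
    for msg in messages:
        fingerprint = message_fingerprint(msg)
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(msg)
    return unique
```

With this approach, the message sent to 100 users and stored in 100 PSTs collapses to a single copy. Whether one copy should be kept overall or one per mailbox is a policy decision that this sketch does not attempt to make.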

The most important thing that I learned from talking to the ISVs is that they have huge expertise and experience of dealing with the vagaries of PSTs and the many ways that people use these files. That experience ends up in their products. The advice that you can get from an ISV before starting a PST acquisition project will save time (and money) and usually results in a better outcome.

If you’re interested in using the new Office 365 import service and intend to gather user PSTs from near and far within your organization, take the time to go and talk to real experts before starting. I’m sure that the folks at QuadroTech (PST FlightDeck), Nuix (Intelligent Migration), TransVault (Migrator), Sherpa Software, and Archive360 (to name just a few of the ISVs working in this space) will be happy to talk to you.

And perhaps before focusing on the Office 365 Import service as the only way to transfer PST data to Office 365 mailboxes, have a look at what the ISVs can offer in this space. QuadroTech caught my interest when they announced results of tests that showed that their Advanced Ingestion Protocol (AIP) is able to process PST data six times faster than the Office 365 Import service. In addition, their ArchiveShuttle technology was able to do a better job of moving data into Office 365 because fewer "bad items" were dropped.

According to QuadroTech, the Office 365 Import service depends on the New-MailboxImportRequest cmdlet to import PST data, a cmdlet that is only available to on-premises Exchange 2010 and Exchange 2013 servers (and, I believe, in dedicated instances of Office 365). The cmdlet is controlled by the Mailbox Replication Service (MRS), but the Office 365 Import service does not expose the MRS logs that detail any problems found with PST items during processing. As a result, you never know if items fail to be ingested unless you compare data before and after. I've asked Microsoft to comment on these claims but have heard nothing back to date.
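A before-and-after comparison of this kind might look like the following Python sketch. It assumes that you can list a stable identifier for every item in the source PST and in the target mailbox after the import finishes. How those lists are obtained is outside the sketch, and the function and key names are hypothetical.

```python
def reconcile(source_ids, imported_ids):
    """Compare item identifiers captured before and after an import.

    Assumption: each item carries an identifier that survives the import
    unchanged, so set arithmetic is enough to find what was dropped.
    """
    source = set(source_ids)
    imported = set(imported_ids)
    return {
        "source_count": len(source),
        "imported_count": len(imported),
        # Items present in the PST that never reached the mailbox.
        "missing": sorted(source - imported),
        # Items in the mailbox that were not in the PST (for example, mail
        # that was already there before the import).
        "unexpected": sorted(imported - source),
    }
```

An empty "missing" list suggests that nothing was dropped. Anything listed there is a candidate "bad item" worth investigating by hand.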

In any case, the reported deficiencies of the Office 365 Import service resemble those of the PST Capture tool, which is free to all but has some shortcomings. If you pay extra to purchase a third-party product that specializes in an area, you get more features and functionality for that investment.

My discussions with ISVs working in this space proved once again that these companies perform an extremely valuable service to the ecosystem by filling gaps left by Microsoft. When you decide on the approach you will take to eliminating PSTs, take the time to investigate what is available before making a commitment. You know it makes sense.

Follow Tony @12Knocksinna
