It’s a source of some bewilderment to me that so many companies continue to allow so many people to store valuable, sometimes mission-critical information within files that are insecure, prone to corruption, and invisible to the rest of the organization. PSTs are acknowledged to cause real problems for organizations that have to comply with legal or regulatory regimes, yet CIOs and CEOs continue to permit their use at a time when PSTs are unnecessary and obsolete. Better methods exist to store and access information securely. So why are so many organizations that use Exchange content to allow this scourge to continue?
I suspect that the answer lies in inertia. PSTs have been around for almost 20 years and form part of the IT landscape. They exist in a non-threatening way and other tasks have higher priority. There are new systems to deploy, new applications to roll out, new mobile devices to master. And so we allow compliance to slip, inefficiency to prosper, and expose data to unnecessary risk. It’s a mystery.
Here's my challenge to the Exchange community: Let’s get rid of PSTs soon. Not immediately, because that’s impossible, but within the next two years would be nice. Let’s eliminate this vestige of old-school technology and move the data held in PSTs into online repositories to take advantage of the latest cloud and on-premises technologies. And let’s stop pretending that PSTs are necessary for general-purpose storage because they are not. PSTs should only be used to transfer information when no other reasonable method exists, such as to provide data to external legal investigators following an eDiscovery search.
History and background
PSTs use a Microsoft proprietary file format that is designed to hold email items (messages, tasks, calendar items, and so on) and attachments in a local repository. Originally called a personal storage table to differentiate it from the tables in the server-based Exchange online database, the PST is now often referred to as a personal storage file or personal folders. The PST format is similar to the OST (offline storage table or offline storage file) used by Outlook desktop clients to synchronize slave copies of online folders for local use.
PSTs have been in the spotlight recently for many reasons. On a positive note, the launch of Microsoft’s Office 365 Import Service makes it much easier for Office 365 tenants to move data into online mailboxes and archive mailboxes. On the other hand, Sony Corporation suffered considerable embarrassment in November 2014 when hackers penetrated its IT systems and recovered 179 PST files among other data before revealing some saucy details of confidential discussions. Some 73,000 messages from the Inbox of Sony Chairman Amy Pascal were leaked by the hackers and the total cost of the episode was estimated at approximately $15 million.
Anyone who browses the online repository of the information leaked from the Sony PSTs can imagine what would happen if similar data was released about their company. It’s not a happy thought.
The question that arises is why so much confidential data was held in files that are, let’s face it, insecure and prone to failure. PSTs can be password protected, but a quick network search will reveal many utilities that will crack open a PST in seconds. And once a PST is unlocked, it can be opened by any Outlook client. That’s an important fact that is unknown to many of the people who depend on PSTs to hold sensitive data.
The need for PSTs was discovered soon after Microsoft launched the first version of Exchange server in 1996. Up to that time, the average size of an email was small (around 2 KB for a single screen of ASCII text). With the growing popularity of email as a method of corporate communication, message sizes rapidly grew, especially when multi-format attachments were factored into the equation, and the relatively small mailbox quotas assigned to users (often in the 20 MB to 50 MB range) were quickly exhausted, which then required the user to delete items to free space and allow new email to be delivered.
The first PSTs were supported by the original Exchange client (viewer). However, Outlook 97, a client that connected to Exchange 5.0 and 5.5, popularized and accelerated their use, largely because it was possible for a user to control their storage rather than depend on the whim of a server administrator. At that time, it was common for email systems like Lotus cc:Mail, Microsoft Mail, and Lotus Notes to use local storage. PSTs allowed users to move items from their online mailbox into their “personal storage” and so free up space in their mailbox without having to remove items permanently. In addition, Outlook’s archive feature would detect and move old items automatically to the PST and so keep the online mailbox well under its quota. Outlook could also use the PSTs to receive new email, a facility that some used when impossibly small mailbox quotas were assigned.
The problems with PSTs
The original PST used an ANSI format and was limited to 2 GB in size. However, users were unwise to allow their PSTs to grow that large because corruption invariably resulted when a PST approached the 2 GB limit.
Problems with file structure and reliability were addressed by the introduction of the Unicode-format PST from Outlook 2003 onward. The new format was originally limited to 20 GB but Outlook 2010 and later clients increase the limit to 50 GB. The limit can be increased with a registry setting, which can lead to some very large PSTs. The largest I know of is some 62 GB and holds over 123,000 items. This monster is larger than the normal Office 365 mailbox (50 GB), so it cannot be ingested into a mailbox. However, it can be moved to an online archive, which is probably the more appropriate destination for PST data.
Although file corruption is less common with Unicode PSTs than with the ANSI format, the old adage that it is unwise to “put all your eggs in one basket” holds true with PST files. If you must use PSTs, then spread your data across multiple files to restrict the effect of any corruption.
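To make the ANSI/Unicode distinction concrete, here is a minimal Python sketch that reads the first few bytes of a file and reports which format it appears to use. The function name `pst_format` is hypothetical. The header layout it relies on (a `!BDN` signature, an `SM` client signature at offset 8, and a 16-bit version field at offset 10 where 14 or 15 indicates ANSI and 23 or higher indicates Unicode) reflects my reading of Microsoft's published [MS-PST] documentation and should be verified against that document before you depend on it. It is a triage aid, not a validation tool.

```python
import struct

PST_MAGIC = b"!BDN"      # signature shared by PST and OST files (assumed from [MS-PST])
PST_CLIENT_MAGIC = b"SM"  # client signature expected for PSTs (assumed from [MS-PST])


def pst_format(path):
    """Return 'ANSI', 'Unicode', or 'unknown' based on the file header only."""
    with open(path, "rb") as fh:
        header = fh.read(12)
    if len(header) < 12 or header[:4] != PST_MAGIC:
        return "unknown"
    if header[8:10] != PST_CLIENT_MAGIC:
        return "unknown"
    # Version field: little-endian unsigned 16-bit integer at byte offset 10.
    (version,) = struct.unpack_from("<H", header, 10)
    if version in (14, 15):
        return "ANSI"      # older format, 2 GB ceiling
    if version >= 23:
        return "Unicode"   # Outlook 2003 and later
    return "unknown"
```

Running a check like this across an inventory of PSTs gives a quick count of how many old ANSI files, the ones most at risk as they near 2 GB, are still in circulation.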
The PST file format is not secret and is fully documented by Microsoft to allow third-party developers to build tools that use PSTs. In many cases, the aim of these tools is to help organizations manage PSTs by:
- Recording their existence on hard disks in user PCs (a process that often requires a software agent to run on the PC) as well as other locations, such as network file shares (where PSTs should never be stored)
- Automatically fixing problems in PSTs (think of running the SCANPST utility automatically but more intelligently because SCANPST removes bad items rather than attempting to fix corruption by patching the MAPI properties for bad items or applying other fixes). It’s surprising how many corrupt items exist in PSTs, especially items created by older versions of Outlook or add-in software for Outlook that don’t match the requirements of today’s software. Users are often unaware that corruption exists because they never attempt to open or access the corrupt items.
- Removing encryption from PSTs to facilitate ingestion into online repositories
- De-duplicating the information held in PSTs. In any collection of PSTs recovered from corporate PCs, you can be sure that a lot of duplication exists. You can go ahead and import everything into Exchange, but it’s better to detect duplicated information and remove it first as this will speed up the ingestion process and reduce the overall cost of the project. In addition, having duplicate copies of documents and other information scattered across an organization makes it very hard to maintain a definitive version, a problem that can become a real issue when dealing with project documents, formal agreements, or other files that are commonly the result of joint authorship.
- Exporting PST data to online storage
Obviously lots of work is necessary to take control over the data held in PSTs within large organizations, but the benefits that can be gained through PST elimination include increased compliance, efficiency (better searching and access to information), and reduced costs (for data storage).
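The discovery and de-duplication steps listed above can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not a substitute for a real PST management product: it walks a single directory tree that the script can read, and it only detects PST files that are byte-for-byte identical. Commercial tools de-duplicate at the level of individual items inside the files, which this sketch does not attempt. The function names `find_psts`, `file_digest`, and `duplicate_groups` are hypothetical.

```python
import hashlib
import os
from collections import defaultdict


def find_psts(root):
    """Yield (path, size_in_bytes) for every .pst file found under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(".pst"):
                full = os.path.join(dirpath, name)
                yield full, os.path.getsize(full)


def file_digest(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def duplicate_groups(root):
    """Return lists of paths whose contents are byte-for-byte identical."""
    by_size = defaultdict(list)
    for path, size in find_psts(root):
        by_size[size].append(path)

    by_digest = defaultdict(list)
    for paths in by_size.values():
        if len(paths) > 1:  # only hash files that share a size with another file
            for path in paths:
                by_digest[file_digest(path)].append(path)

    return [sorted(group) for group in by_digest.values() if len(group) > 1]
```

Grouping by size first avoids hashing multi-gigabyte files that cannot possibly match anything else, which matters when the inventory runs to thousands of PSTs.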
Time to change
At one time, a reasonable case could be made for the use of PSTs. Mailbox quotas were constrained and network access was primitive when compared to today's always-connected model. It therefore made sense for people to keep data local. Those reasons are no longer valid. The average size of a corporate mailbox has expanded dramatically and is now often in the 5 GB to 10 GB range, with even higher amounts allowed in cloud-based services like Office 365, where the default mailbox quota is currently 50 GB. In addition, archive mailboxes were introduced in Exchange 2010 as a way to provide storage for items that needed to be retained for longer periods without necessarily being in the primary mailbox. I think of archive mailboxes as being a much smarter, more secure, online version of PSTs that users don't have to maintain. Today, the storage capacity of archive mailboxes has grown to become “practically unlimited” through the introduction of chained 50 GB chunks that form a single logical mailbox. This structure is supported in Office 365 and is coming soon to Exchange 2016, and it provides a way for users to store online everything that they have in PSTs today.
It's also true that the availability of networks around the world has improved to a point where online access has become the norm. And anyway, anecdotal evidence and personal experience both indicate that the vast majority of items moved into PSTs remain there in splendid isolation and are never accessed subsequently, so they might as well be in an online archive.
Following high-profile cases such as the Sony hack, it’s easy to understand why companies are reconsidering the use of PSTs. The need for organizations to comply with various legal and regulatory requirements has developed enormously in the last decade. PSTs were conceived at a time when the need for companies to be able to track, retain, and preserve email was not as great as it is now. If a company persists in allowing users to keep their PSTs, even to hold messages about long-gone subjects, it creates the potential for non-compliance with a regulation (that might, for instance, require any communication relating to a topic to be kept for six years). Other potential problems include the inability to answer lawsuits from disgruntled employees or to maintain control over intellectual property because the company cannot prove that an idea or concept originated at a certain time.
Apart from the desire to achieve better compliance, companies that invest in VDI technology often need to eliminate PSTs because personal files of this nature are usually incompatible with the kind of shared infrastructure used by VDI. Desktop refresh projects can also act as a catalyst for PST elimination, with the logic being that new hardware and new versions of Microsoft Office should be accompanied by new working habits, so no more PSTs!
The advantages of PST elimination
Running a project to eliminate PSTs from an organization often requires a lot of effort and time. Good project management is an absolute necessity, as is strong executive leadership and direction. But organizations that succeed in eliminating PSTs gain many advantages from the work done by Microsoft over the last five years to add compliance and high availability features to its Exchange Online and Exchange on-premises offerings. Among those advantages are:
- The data is more secure because it is protected by Exchange high availability features rather than being subject to anything that might occur to a local hard disk.
- The data is indexed and discoverable by reliable server-based searches
- The data is available to compliance functionality such as in-place and litigation holds. If necessary, it's even possible to eliminate inappropriate or sensitive information from user mailboxes by using cmdlets like Search-Mailbox. Office 365 tenants can also use compliance searches to locate and extract mailbox data.
- Instead of being limited to a single PC, the data is available to desktop applications, web clients, and mobile devices
- If a user leaves the organization, their work data can easily be transferred to a colleague. In Office 365, you make the old mailbox inactive and then restore the content to a target mailbox. In on-premises Exchange, you can make the old mailbox shared, grant ownership to a new user, and allow them to move whatever data is necessary into their mailbox.
Recovering data from PSTs that is subject to regulatory oversight and putting it under the control of compliance features available for online mailboxes is something that CIOs and CEOs need to care about. As in the case of the Sony hack, data exposed through PST weaknesses can have catastrophic effects on a company’s reputation and partnerships, a fact that should also be of concern to senior executives.
Of course, these advantages can be perceived as a benefit for the organization rather than for end users and there’s truth in that observation. However, it’s also true that users benefit because they don’t have to manage PST data any longer. Everything is online and available to them no matter what device they care to use. Education about the reasons for PST capture and replacement is an essential part of any project that aims to import PST data into the online store.
After I wrote about PST management tools in June 2015, I was contacted by a number of people and asked to recommend tools to help them bring PSTs under control. Invariably, I start with Microsoft’s free PST Capture tool and then ask questions about how long they want the project to last, how effective they want it to be, the goals, and what budget exists. A combination of PST Capture and the Office 365 Import service will get PSTs assembled and ingested into online mailboxes (or archives) and can therefore be regarded as the baseline to compare other products against.
Remember that the Office 365 Import Service is still not generally available and might therefore not be a suitable choice for some companies, especially those outside the U.S. In addition, although Microsoft does not charge to use the Import Service today, they will after the service is generally available.
Those who plan to remain on-premises can use the ability of the Mailbox Replication Service (MRS) to process mailbox import requests and move PST data into online mailboxes and archives. This feature is supported in Exchange 2010, Exchange 2013, and Exchange 2016. Behind the scenes, the Office 365 Import Service uses MRS to process PSTs when they are ingested into the service.
All PST elimination projects require some investment in time. Microsoft’s free tools score highly because of their acquisition cost but are likely to require more hands-on time from administrators to extract satisfactory results, especially when you scale up the number of mailboxes to be processed and have to deal with PSTs that have been used since the last century. Investing in specialized software that automates and streamlines PST detection, file collection, fix-up of corrupt data, de-duplication, and ingestion to online mailboxes and archives is a good route to take if you have some funding available. You should ask vendors to provide copies of their tools and run the software against a representative set of PSTs to determine which product makes most sense in your environment.
Once a decision is made to eliminate PSTs and to begin a project to search for and remove the data held in these files to a more suitable repository, it’s important to put a Group Policy Object (GPO) setting in place to stop users creating and using more PSTs.
I find that support is often overlooked when assessing the quality of software products, so try and create a situation (like a horribly corrupt PST) and see how the vendors respond. It’ll be a good guide to how they might react if problems are encountered during your project. Add in a good helping of strong project management and you’ll be well on your way to success.
A call to action
PSTs are insecure, corruptible, and outdated. No justification exists for their use as general-purpose storage for user data. Let’s get this project moving and eliminate PSTs soon. The sooner the better in my eyes!
Follow Tony @12Knocksinna