Why Exchange Online now preserves BCC and DL information in message headers

If you browse the Office 365 roadmap, you might come across “Preserving DL membership and BCC for Exchange In-place hold” as one of the features listed that Microsoft is now rolling out and wonder what this means. Investigating further brings us to the TechNet page “Preserve Bcc and expanded distribution group recipients for eDiscovery”, where we learn that this feature is all about a change that Microsoft has made in how Exchange Online retains information about addressees in message headers. BCC and expanded distribution group data is now retained so that this information is indexed and therefore searchable through eDiscovery.

All of which sounds very good because you obviously want to be able to search for messages sent to particular individuals even if they were addressed through a distribution group or as a BCC. In the past, Exchange recorded BCC information in the header of the message in the sender’s mailbox but never provided it to recipients, as they might then be able to discover whether any BCC addressees had received copies. The transport service expanded distribution groups into sets of recipients but only the group names were kept in message headers.

The eDiscovery problem that then emerges is that you can’t rely on copies of messages to be full and complete accounts of who received information. The sender’s copy of a message might have been removed and a recipient copy is no proof that someone else received a copy as a BCC. The name of a distribution group is an indication that someone might have seen a message, but no record is kept of when that person joined a group, so you can’t prove that they received anything.

Deploying message journaling is the classic solution for on-premises deployments. This feature allows you to nominate a journal recipient to receive copies of messages generated on a database-wide or specific-user basis. The journaling agent, part of the transport service, executes journal rules to create journal reports containing copies of messages and send them to the nominated recipient. Because that recipient is identified by an SMTP address, it can be an Exchange mailbox or a mailbox on a totally different system, such as a specialized archiving system like HP Consolidated Archive or Symantec Enterprise Vault. Generally speaking, a specialized external system is a better choice for journaling because these systems are designed to handle the load of inbound messages better than an Exchange mailbox.

Journal reports are an important part of the solution because they contain a complete copy of messages, including all the header information like expanded distribution groups and BCC recipients. The completeness of the journal reports allows them to be regarded as satisfactory items for the purpose of legal discovery.

With various twists along the way, journaling has been around since Exchange 2000. But the issue for Exchange Online and the reason why Microsoft has had to introduce the new feature is that Office 365 does not allow you to nominate an Exchange mailbox as a journal recipient. Yes, journaling is supported, but you have to assign an SMTP address for the journal recipient that does not belong to an Office 365 domain. Microsoft’s logic in this respect is impeccable. A busy journal recipient might be called upon to process tens of thousands of messages daily. Allowing journaling to occur on a free basis across the multi-tenant Office 365 environment might create a black hole for resources. Hence the ban.

But banning Office 365 journal recipients creates another problem. What do companies who use journaling do if they want to move from an on-premises deployment to Office 365 and want to retain the ability to conduct discovery searches? Broadly speaking, the answers are:

Deploy a hybrid configuration and keep the journaling capability on-premises. This is a reasonable and intelligent solution, especially if you’re using a third-party archiving solution as the journal recipient. The big advantage is that you retain a well-known solution that will work for on-premises and cloud mailboxes; the downside is that you have to maintain on-premises servers. However, that downside is mitigated if you regard hybrid connectivity as a flexible option to retain choice should cloud services not work out for the company.
Attempt to work around the Office 365 restriction by moving your journal archive to Office 365 inactive mailboxes. Although this solves the problem of getting previous data into Office 365 in a form in which the data is indexed and searchable, this is a complex and, in my mind, a pretty bad solution because you have to deal with multiple issues. First, you have to split up your existing journal archive (which might span many terabytes) into mailbox-size chunks (100GB). Perhaps you can split the archive on a month or quarter basis. Then you have to move the data from on-premises to the cloud, a process which isn’t usually fast. Finally, you have to come up with a solution as to what journal recipient is used on an ongoing basis. Remember, Office 365 doesn’t allow you to use an Office 365 mailbox as a journal recipient.
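
To get a feel for the first of those steps, here is a minimal Python sketch of the split, assuming you can obtain a size for each month of the archive. The function name, the sample figures, and the simple grouping of consecutive months are illustrative assumptions, not something Office 365 or any migration tool provides. Only the 100GB ceiling comes from the discussion above.

```python
from typing import Dict, List

MAILBOX_LIMIT_GB = 100  # mailbox-size chunk quoted above


def plan_chunks(monthly_gb: Dict[str, float],
                limit_gb: float = MAILBOX_LIMIT_GB) -> List[List[str]]:
    """Group consecutive months into chunks that each fit under limit_gb.

    A single month larger than the limit ends up alone in its own chunk
    and would need to be split further (for example by week); this sketch
    does not attempt that.
    """
    chunks: List[List[str]] = []
    current: List[str] = []
    current_size = 0.0
    for month in sorted(monthly_gb):  # "YYYY-MM" keys sort chronologically
        size = monthly_gb[month]
        # Close the current chunk if adding this month would exceed the limit.
        if current and current_size + size > limit_gb:
            chunks.append(current)
            current, current_size = [], 0.0
        current.append(month)
        current_size += size
    if current:
        chunks.append(current)
    return chunks


# Hypothetical sizes in GB for six months of journal data.
sample = {
    "2014-01": 40, "2014-02": 35, "2014-03": 45,
    "2014-04": 60, "2014-05": 30, "2014-06": 20,
}
plan = plan_chunks(sample)
for number, months in enumerate(plan, start=1):
    total = sum(sample[m] for m in months)
    print(f"Chunk {number}: {months[0]} to {months[-1]} ({total} GB)")
```

Even this toy plan shows how quickly the number of target mailboxes grows: six modest months already need four chunks, and a multi-terabyte archive would need dozens.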

Which brings us back to the new feature. Because Exchange Online is now going to store BCC and expanded distribution group information in a hidden part of message headers, that information now becomes available for eDiscovery searches. Message size will increase, especially to record the membership of large distribution groups, but who cares now that mailbox quotas are so large? For the record, a quick browse with MFCMAPI reveals that the expanded group information looks to be held in an attribute called GroupExpansionRecipients while the BCC data might be in an attribute called PR_DISPLAY_BCC_A.
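
To see why this matters for search, consider a deliberately simplified Python model. It is not how Exchange Online stores or indexes items: the Message class, its field names, and the sample addresses are all made up for illustration. The point is only that a search limited to the visible To line misses people reached through a group or a BCC, while a search that also covers the preserved data finds them.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Message:
    subject: str
    to: List[str]                                            # what every recipient sees
    bcc: List[str] = field(default_factory=list)             # hypothetical preserved BCC data
    expanded_group: List[str] = field(default_factory=list)  # hypothetical preserved group members


def search(messages: List[Message], address: str, include_preserved: bool) -> List[str]:
    """Return the subjects of messages whose recipient data contains address."""
    hits = []
    for msg in messages:
        recipients = set(msg.to)
        if include_preserved:
            recipients.update(msg.bcc)
            recipients.update(msg.expanded_group)
        if address in recipients:
            hits.append(msg.subject)
    return hits


mailbox = [
    Message("Q3 forecast", to=["sales-dl@contoso.com"],
            expanded_group=["kim@contoso.com", "lee@contoso.com"]),
    Message("Contract draft", to=["lee@contoso.com"], bcc=["legal@contoso.com"]),
]

# Searching only the visible To line versus including the preserved data.
print(search(mailbox, "kim@contoso.com", include_preserved=False))
print(search(mailbox, "kim@contoso.com", include_preserved=True))
```

In this model, Kim only turns up once the preserved group expansion is included, which is the gap the new feature is meant to close.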

On the surface, preserving data in message headers is a reasonable solution to the problem that addresses a known issue and might allow some companies to eliminate the need for message journaling. That is, once every recipient is moved to Office 365 – or you have transferred previously archived information to Office 365 in a satisfactory manner (a process the software vendor TransVault is attempting with their splendidly named Compliance TimeMachine initiative). Until that happy time occurs, there’s the small matter of what to do with previous journal archives that might have to be retained for many years, what to do with recipients who don’t have Exchange Online mailboxes, and what to do during the migration period.

Exchange Online now preserves BCC and group information in mail headers. Given its status as the next on-premises release, Exchange 2013 CU7 will likely do so too. I don’t think that the update will be retrofitted to Exchange 2010.

Preservation of large journal archives might be only of interest to a small percentage of those who use Office 365, but if you’re in that select group it is a critical issue that needs to be solved. Microsoft's free Office 365 onboarding service is unlikely to be able to deal with complex compliance issues like this, so if you need to preserve journal archives, it's really time to talk to an Office 365 partner – this is exactly the kind of situation where they prove their worth.

Follow Tony @12Knocksinna
