DAG & Lagged Replication


Exchange 2007 introduced a new concept: lagged replication (or lagged database copies, if you prefer). Besides “immediate” replication (Cluster Continuous Replication, or CCR), you could delay the replay of logs on target databases (Standby Continuous Replication, or SCR). This made it possible to build solutions for resilience, since targets could be several hours behind the source. Also, unlike CCR, which has a 1:1 relationship, SCR sources and targets could have 1:N or N:1 relationships (one to many, many to one).

In addition to the replay lag time, Exchange 2007 also lets you specify the truncation lag time, which determines when log files are truncated. This period starts after the log files have been replayed. Both parameters are limited to 7 days in Exchange 2007; for SCR, the default ReplayLagTime is 1 day and the default TruncationLagTime is 0.

With the Exchange 2010 DAG, the successor to CCR/SCR, the maximum for both ReplayLagTime and TruncationLagTime has been increased to 14 days. The default value for ReplayLagTime is 0, which mimics the behaviour of CCR.

Needless to say, you need sufficient disk space to hold the replication log files when you set the lag to a value greater than 0; how much depends on the number of transactions within that time frame. In Exchange 2010 these lag times can be configured using the cmdlets Add-MailboxDatabaseCopy or Set-MailboxDatabaseCopy, e.g. to set the ReplayLagTime to 7 days (format Days.Hours:Minutes:Seconds) use:

Set-MailboxDatabaseCopy -Identity DAG2\MBX1 -ReplayLagTime 7.0:0:0

Note that in Exchange 2010 these values can be configured on-the-fly; in Exchange 2007 you need to disable and re-enable SCR.
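To get a feel for the numbers, here is a minimal Python sketch that formats a lag in the Days.Hours:Minutes:Seconds notation and gives a rough estimate of the log space a lagged copy needs. It rests on assumptions that are not part of the discussion above: a log file size of 1 MB, a measured average number of log files per day, logs being kept for roughly the replay lag plus the truncation lag, and an arbitrary 20% headroom. The function names are made up for illustration.

```python
from datetime import timedelta

MAX_LAG = timedelta(days=14)  # Exchange 2010 maximum mentioned above
LOG_FILE_SIZE_MB = 1          # assumption: 1 MB per transaction log file


def lag_string(lag: timedelta) -> str:
    """Format a lag as Days.Hours:Minutes:Seconds (hypothetical helper)."""
    if lag < timedelta(0) or lag > MAX_LAG:
        raise ValueError("lag must be between 0 and 14 days")
    total = int(lag.total_seconds())
    days, rest = divmod(total, 86400)
    hours, rest = divmod(rest, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{days}.{hours:02d}:{minutes:02d}:{seconds:02d}"


def lagged_log_space_gb(logs_per_day, replay_lag,
                        truncation_lag=timedelta(0), headroom=0.2):
    """Rough log space (GB) for a lagged copy, under the stated assumptions."""
    # Assumption: logs stay on disk for replay lag plus truncation lag.
    retained_days = (replay_lag + truncation_lag) / timedelta(days=1)
    raw_mb = logs_per_day * retained_days * LOG_FILE_SIZE_MB
    return raw_mb * (1 + headroom) / 1024


if __name__ == "__main__":
    lag = timedelta(days=7)
    print(lag_string(lag))                    # 7.00:00:00
    print(lagged_log_space_gb(20000, lag))    # estimate in GB
```

With 20,000 log files per day (an invented figure) and a 7-day replay lag, this comes out at roughly 164 GB for the logs alone, so measure your own log generation rate before picking a lag.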

JBOD versus RAID


After the arrival of Exchange 2010 and its DAG feature, many people (Microsoft included) suggested running Exchange on low-cost SATA disks in a JBOD (Just a Bunch Of Disks) configuration, provided you have at least 3 database copies. As you probably know, DAGs enable you to have multiple copies of mailbox databases running on multiple servers, with a configurable lag per copy.

This suggestion to use JBOD, as well as the discussion of going backupless or not, isn’t without controversy. For many years people have learned to put their (critical) data on redundant storage. With Exchange 2010 this dogma is said to have changed, because contrary to its predecessors, Exchange 2010 can happily run fully supported on low-cost SATA storage in JBOD configuration. The argument used is that because you have at least 3 copies on 3 different physical servers you can survive a single failed mailbox server (likely) but also two failed mailbox servers (unlikely?).

The first problem is underestimating the limits of relying on 3+ copies. Nobody expects the Spanish Inquisition, or rather 3 failures, right? But what if your hardware (e.g. the disks) is from a faulty batch or contains buggy firmware, and you equipped all three of your physical servers with those parts (something burn-in tests should bring to light, but who does that these days?). I know of several occasions where drives died in pairs within hours of each other; you had better make haste recovering your mailboxes then. Is this perhaps a reason to look at different vendors and different parts? After all, this is more or less the same reason many businesses require multi-vendor products where reliability and security are concerned, e.g. anti-virus products from different vendors. The idea is to spread the risk.
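The effect of a shared faulty batch can be illustrated with a small Python sketch. It uses a simplified common-cause ("beta-factor" style) model that I am swapping in purely for illustration: a fraction of each disk's failure probability is assumed to be a shared cause that takes out every copy at once. The function name and all numbers are hypothetical, not measurements.

```python
def all_copies_lost(p_fail, copies=3, common_cause_fraction=0.0):
    """Probability that every copy is lost within some time window.

    p_fail                 probability that a single copy's disk fails in the window
    common_cause_fraction  assumed share of p_fail caused by something all copies
                           have in common (same batch, same firmware)
    """
    if not 0.0 <= p_fail <= 1.0:
        raise ValueError("p_fail must be between 0 and 1")
    if not 0.0 <= common_cause_fraction <= 1.0:
        raise ValueError("common_cause_fraction must be between 0 and 1")
    if copies < 1:
        raise ValueError("copies must be at least 1")
    # Shared cause: one event, all copies gone.
    common = common_cause_fraction * p_fail
    # Remaining failures are treated as independent per copy.
    independent = (1.0 - common_cause_fraction) * p_fail
    return common + (1.0 - common) * independent ** copies


if __name__ == "__main__":
    # Made-up 3% failure probability per disk in the window.
    print(all_copies_lost(0.03, copies=3, common_cause_fraction=0.0))
    print(all_copies_lost(0.03, copies=3, common_cause_fraction=0.1))
```

In this toy model, attributing even a tenth of the failures to a shared cause makes losing all three copies more than a hundred times as likely as in the fully independent case, which is the argument for mixing vendors or batches.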

Also, being able to use JBOD (and go backupless) looks interesting on paper, but don’t forget that, as suggested, you need to get yourself at least 3 physical servers (and no, don’t run them virtually on the same host). So, in the end, fewer servers (with RAID) may turn out to be the most cost-effective alternative when looking at the total picture, e.g. hardware, licenses (OS, Exchange, AS/AV agents, management software, etc.) and operational costs. Why run (and maintain) many servers when the benefits don’t outweigh the additional costs?

A third element of the JBOD versus RAID discussion is the time it takes to restore the original situation. When one of the servers fails, you have to rebuild the server (hope you have some spare parts lying around or a decent replacement service contract). And after rebuilding (or restoring) the server, you need to reseed the database copies. This step may take a long time, depending on the size of the databases. Replacing a hard disk and rebuilding a RAID set is much quicker and much easier (and less prone to error).
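As a back-of-the-envelope illustration of that reseed time, here is a minimal Python sketch. The sustained throughput is an assumption you would have to measure in your own environment (network, disks and load all play a part), and the function name and figures are hypothetical.

```python
def reseed_hours(database_gb, throughput_mb_per_s):
    """Rough time (hours) to reseed one database copy.

    throughput_mb_per_s is an assumed sustained end-to-end copy rate.
    """
    if database_gb < 0:
        raise ValueError("database_gb must not be negative")
    if throughput_mb_per_s <= 0:
        raise ValueError("throughput_mb_per_s must be positive")
    seconds = database_gb * 1024 / throughput_mb_per_s
    return seconds / 3600


if __name__ == "__main__":
    # Made-up example: ten 500 GB databases reseeded one after another at 50 MB/s.
    per_database = reseed_hours(500, 50)
    print(round(per_database, 1), "hours per database")
    print(round(10 * per_database, 1), "hours for all ten")
```

Under these invented figures a single 500 GB database already takes close to three hours, and a server hosting ten of them more than a day, during which you are running with one copy fewer.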

In the end, as always, the choice should be based on business requirements. Perhaps your business can do without e-mail for a few hours while IT is recovering services (can’t imagine, but you never know). In that case it’s nice to have a supported low-cost JBOD/SATA option. In my opinion, the benefits of a proper RAID setup outweigh the trouble you have to go through when repairing a JBOD-based solution. Depending on these requirements, and how deep your pockets are, I’d go for a combination of RAID and DAG, where RAID provides availability at the storage level and DAG provides availability (same data center) or resilience (multi data center, i.e. disaster recovery).

Oh, and one other thing: if you must go the SATA route, use proper “Enterprise class” SATA disks; they’re made to run 24×7.

Exchange 2010 Update Rollup 2


Today Microsoft released Update Rollup 2 for Microsoft Exchange Server 2010. RU2 comes 3 months after the release of RU1. The list of included fixes is not as long as with RU1, but RU2 does contain some important additional fixes over RU1:

  • 977633 Certain third-party IMAP4 clients cannot connect to Exchange Server 2003 mailboxes through an Exchange Server 2010 CAS server
  • 979431 The POP3 service crashes when a user connects to a mailbox through the POP3 protocol and the user is migrated from an Exchange Server 2003 server to an Exchange Server 2010 server
  • 979480 Users cannot receive new messages if they access mailboxes that are moved to another Exchange Server 2010 RU1 server by using IMAP4 clients
  • 979563 Exchange Server 2010 Push Notifications does not work
  • 979566 A 0x85010014 error is generated when linked mailbox users try to synchronize their mailboxes with mobile devices in a CAS-CAS proxying scenario in Exchange Server 2010
  • 980261 This fix introduces the supports for Exchange Server 2010 page patching when a “-1022” disk I/O error is generated
  • 980262 Event ID 2156 is logged on a computer that is running Exchange Server 2010

The related knowledgebase article (KB979611) can be found here. You can download Exchange 2010 Rollup 2 directly from here.

Microsoft Exchange Server Profile Analyzer


Today Microsoft released version 8.03.0056 of the Microsoft Exchange Server Profile Analyzer (EPA). You can use EPA to collect (statistical) information from Exchange mailbox stores or an Exchange organization. For instance, you can collect information on user activity and profile statistics for sets or subsets of the mailbox population, which can be used as input when sizing an (upgraded) Exchange infrastructure with the Exchange Mailbox Role Calculator (this is more specific than the default Light, Average, Heavy and Very Heavy user profiles). The analysis of mailbox servers or an organization can also provide input for processes like capacity planning.

Using EPA is straightforward; check out this older article on how to use EPA here.

You can download the 32-bit version here or the 64-bit version here.

Forefront Identity Manager 2010 RTM


By now you’ve probably already heard that Forefront Identity Manager 2010 went RTM on March 2nd. FIM 2010 is the successor to ILM, the Identity Lifecycle Manager. FIM is a solution for managing identities and credentials in heterogeneous environments. It contains functionality for user (de)provisioning, password synchronization, group management, self-service and workflow-like applications. So for instance, FIM can enable organizations to automatically create an Active Directory user with an Exchange mailbox with all the proper settings when a new employee has been entered into the HRM system (or to disable or remove that user when the employee leaves the organization, depending on requirements).

You can download the trial here. More information on the FIM portal here.