Exchange and potential Packet Loss on VMware

Yesterday, I noticed a VMware knowledge base article, updated on November 14th, which could be worth taking notice of when you’re running Exchange – or any other application – in a virtualized environment based on VMware technology.

VMware’s KB article 2039495 mentions that in VMware ESXi 4.x and 5.x, very high traffic bursts may cause the VMXNET3 driver to start dropping packets in the guest OS. This has been observed on Windows Server 2008 R2 running Exchange 2010 with – as VMware puts it – a high number of Exchange users. What the article fails to mention is the configuration used by customers experiencing the issue. It would for example be valuable to know if a DAG was used, if the traffic (MAPI, replication) was split over multiple NICs, or if it occurred with iSCSI storage. I won’t be surprised if the issue occurs in other high-traffic situations as well, e.g. seeding. Luckily, Exchange is capable of handling certain hiccups, so customers might not even be aware of the issue.

After some more digging I found another article, KB 1010071, which mentions a packet drop issue with VMware guests known since ESX 3. This article explains a bit more about why the issue occurs in the first place: the network driver runs out of receive buffers, causing packets to be dropped between the virtual switch and the guest OS driver.

One could argue about the impact of a few lost packets. However, as traffic increases, the (potential) number of lost packets increases. Each lost packet results in retransmission of unacknowledged packets, which impacts overall throughput and increases latency.

VMware’s temporary solution to this problem is:

  1. Open up the Windows guest;
  2. Open the properties of the VMXNET3 NIC;
  3. On the Advanced tab, increase the Small Rx Buffers or Rx Ring #1 Size;
  4. What KB1010071 mentions and KB2039495 doesn’t, is that when using jumbo frames – not uncommon, e.g. for replication traffic – you might need to adjust the Rx Ring #2 Size and Large Rx Buffers values as well (see the sketch below).
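
For those who prefer to script this, here’s a minimal sketch of the same change. It assumes a Windows Server 2012 or later guest (the NetAdapter module isn’t available on Windows Server 2008 R2, where you’d use Device Manager instead), and that the advanced property display names and maximum values match those in the VMware KB articles; verify them with Get-NetAdapterAdvancedProperty first.

```powershell
# Sketch: raise VMXNET3 receive buffers on all VMXNET3 adapters.
# Display names and maximum values follow VMware KB 2039495; they may
# differ per driver version, so check Get-NetAdapterAdvancedProperty first.
Get-NetAdapter | Where-Object { $_.InterfaceDescription -match 'vmxnet3' } | ForEach-Object {
    Set-NetAdapterAdvancedProperty -Name $_.Name -DisplayName 'Small Rx Buffers' -DisplayValue 8192
    Set-NetAdapterAdvancedProperty -Name $_.Name -DisplayName 'Rx Ring #1 Size'  -DisplayValue 4096
    # When jumbo frames are enabled, also raise the second ring and large buffers:
    Set-NetAdapterAdvancedProperty -Name $_.Name -DisplayName 'Rx Ring #2 Size'  -DisplayValue 4096
    Set-NetAdapterAdvancedProperty -Name $_.Name -DisplayName 'Large Rx Buffers' -DisplayValue 8192
}
```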

Now I say temporary, because VMware’s solution of course isn’t a real solution; it’s only meant to – in their own words – reduce packet drops. Also, the KB1010071 article states you should “determine an appropriate setting by experimenting with different buffer sizes”. That doesn’t sound like a permanent, reassuring solution for a virtualization environment running business-critical applications, now does it?

All things considered, I’d recommend configuring these parameters to their maximum setting, preferably at installation time, unless anyone knows of a reason not to. In addition, this is another argument for the best practice of splitting MAPI and replication traffic on Exchange using multiple NICs.

Finally, I already learnt of two other applications experiencing the issue. Therefore, I think the problem is not Exchange 2010 specific, as KB2039495 might imply. If you have similar experiences, or have noticed differences between GbE and 10GbE, please use the comments to share.

Microsoft Exchange Conference 2012, a Summary

After an absence of over 10 years, the most anticipated conference for Exchange-minded people took place this year in Orlando, Florida (US): the Microsoft Exchange Conference 2012 (MEC).

Despite not being able to attend MEC 2012, I’d like to summarize the news on Exchange 2013 from the event. Some of this information went public as part of the release of Exchange 2013 Preview, which was released in July (yes, almost 2 months ago – time flies). Some statements were new, like the expected release date of Exchange 2010 SP3, which is required for co-existence with Exchange 2013.

With all the social media nowadays, you can track most of the statements made at the event. Thanks to people like Jeff Guillet and Devin Ganger, and fellow members of The UC Architects like Dave Stork, Michael van Horenbeeck, Pat Richard, Serkan Varoglu and John A. Cook, who reported live from the sessions they were attending (hashtag #iammec), the community was kept up to date with information as it unfolded. At the end of each day, Tony Redmond gave a nice summary, including comments on the event as a whole.

Picture shows some of the people behind The UC Architects together
with Perry Clarke (GM Exchange), whom you might recognize from
the Ask Perry videos. The picture was taken by Tony Redmond.

The information presented here is a summary of all the information provided through social media and is additional to the information presented at the release of Exchange 2013 Preview; you can read all about that in my Changes in Exchange 2013 Preview article. It is in no way meant to be conclusive or complete.

Ok, now on to the goodness.

Co-Existence
Exchange 2010 Service Pack 3 is expected to be released in the first half of 2013. Not only is it required for co-existence with Exchange 2013, it also supports Windows Server 2012 as an operating system platform. Note that SP3 will require a schema update.

No word on the expected release date of the update required for Exchange 2007 to support co-existence with Exchange 2013. Since Exchange 2007 SP3 Rollup 8 was released in August, thus after the Exchange 2013 Preview became available, I assume we have to wait for Rollup 9 (or 10?).

Storage
Ross Smith from the Exchange Team confirmed the 99% IOPS reduction claim when comparing Exchange 2013 with Exchange 2003; compared with Exchange 2010, it’s a 50% reduction. That’s down from 1 IOPS per mailbox in Exchange 2003 to 0.125 IOPS in Exchange 2010 to 0.0625 IOPS per mailbox in Exchange 2013.


Also, passive copies see around a 50% reduction in IOPS, mainly due to the increased checkpoint depth (100MB) and less aggressive pre-reading of data to keep in line with the checkpoint depth (I’ll devote a separate article to this at a later date). This means that when mixing active and passive copies on a Mailbox server, the passive copies play more nicely from a storage perspective. Also, because of these changes, database fail-over times are down from 20 seconds in Exchange 2010 to about 10 seconds in Exchange 2013.

To validate storage for Exchange 2013, JetStress for Exchange 2013 will become available 3 months after Exchange 2013 goes RTM. When required to validate storage in the meantime, it is recommended to utilize Exchange 2010’s version of JetStress, since Exchange 2010 and Exchange 2013 will have the same I/O pattern.

Databases
In Exchange 2013, multiple databases per storage volume are allowed, which allows for active and passive copies on the same volume. Looking at the lower IOPS requirements of Exchange 2013’s ESE engine and the 50% lower IOPS factor of passive copies, this allows for some serious consolidation on large volumes. The number of databases per volume should match the number of copies per database.

Note that putting databases on SMB3 shares (Windows Server 2012) is not supported; putting a virtualized Exchange server on SMB3 shares is.

Mailboxes
Besides the recommendation to embrace 7,200 RPM disks for Exchange storage, large mailbox implementations are expected to take off (100GB+, including mailbox, archive and recoverable items) in an ongoing battle to get rid of PSTs and 3rd party solutions.

Due to database accounting changes in Exchange 2013, mailboxes may see a 30% increase in reported size when moved from Exchange 2010 to Exchange 2013. Make sure you adjust mailbox quota settings accordingly.
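
If you want to prepare quotas ahead of the moves, a minimal sketch of raising database-level quotas by roughly that 30% (the database name and values below are illustrative; scale your own existing quotas):

```powershell
# Sketch: raise database-level quotas ~30% ahead of mailbox moves to
# Exchange 2013. Database name and quota values are illustrative.
Set-MailboxDatabase 'DB01' `
    -IssueWarningQuota 2.4GB `
    -ProhibitSendQuota 2.6GB `
    -ProhibitSendReceiveQuota 3.0GB
```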

Client Access
CAS 2013 will proxy client traffic to Exchange 2010 using the CAS 2010 server’s FQDN, i.e. it won’t determine or use InternalUrl or InternalNLBBypassUrl. You can’t configure CAS-to-CAS proxying per site; it’s an all-or-nothing setting. At RTM, Exchange 2013 Client Access servers won’t contain support for SSL offloading.

Health Checking
Exchange 2013 will not only check a server’s health by looking at the Exchange services, but will also check the protocols.

CAS 2013 will determine the health of legacy Exchange servers using a simple HTTP HEAD call.
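
To illustrate what such a probe boils down to, here’s a hypothetical sketch of a lightweight HEAD request (the URL is made up; Invoke-WebRequest requires PowerShell 3.0 or later):

```powershell
# Sketch: an HTTP HEAD probe against a legacy CAS, similar in spirit to
# what CAS 2013 does for health checking. The URL is illustrative.
Invoke-WebRequest -Uri 'https://legacy.contoso.com/owa/' -Method Head -UseBasicParsing
```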

Automatic Reseeding
Besides the ability to seed databases using multiple sources, which prevents the situation where multiple remote copies are seeded over WAN links from the active copy, Exchange 2013 contains a feature called Automatic Database Reseeding or just AutoReseed.

AutoReseed can be utilized to automatically reseed databases when required, e.g. after a storage failure. AutoReseed can even allocate and initialize spare disks to restore database redundancy. AutoReseed requires configuring three new properties, which are part of the DAG:

  • AutoDagVolumesRootFolderPath refers to the mount point containing all available volumes, including spare volumes;
  • AutoDagDatabasesRootFolderPath refers to the mount point containing the databases;
  • AutoDagDatabaseCopiesPerVolume sets the number of database copies per volume.

So, for example: suppose you’ve configured a mount point C:\Volumes (AutoDagVolumesRootFolderPath) containing mount points for volumes, e.g. C:\Volumes\DB1, and a mount point C:\Databases (AutoDagDatabasesRootFolderPath) with mount points to Exchange databases, e.g. C:\Databases\DB1, where C:\Databases\DB1 maps to C:\Volumes\DB1 and contains folders for the database and log files. When DB1 fails, AutoReseed can utilize a spare mount point from C:\Volumes to automatically recreate and reseed the database.
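
Configuring these properties could look something like the sketch below (the DAG name is illustrative; the paths follow the example above):

```powershell
# Sketch: set the three AutoReseed-related DAG properties.
# DAG name is illustrative; paths follow the example above.
Set-DatabaseAvailabilityGroup 'DAG1' `
    -AutoDagVolumesRootFolderPath 'C:\Volumes' `
    -AutoDagDatabasesRootFolderPath 'C:\Databases' `
    -AutoDagDatabaseCopiesPerVolume 1
```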

Site Resilience
Exchange 2013 will feature an automatic site (datacenter) fail-over using a witness server located in a well-connected 3rd site. This enables customers to automate the process of site switchovers, from primary to secondary site. This feature is optional.

This may confuse existing Exchange customers, who perhaps learned with Exchange 2007 that a 3rd site for the cluster voter was not recommended, after which it became an option with Exchange 2010. Then, after a while, an adjusted recommendation was published not to use a 3rd site, and now it’s an option again.

Despite this, I think this certainly is a valuable feature. Normally, site outages and datacenter switchovers are stressful situations; the more of the process is preconfigured and automated, the less error-prone the switchover becomes.

Exchange fellow and colleague Jaap Wesselius, who did
2 sessions on Load Balancing Exchange, was interviewed
by F5. Click the image to watch the interview.

Exchange Online
You can use Exchange 2003 with Exchange 2013 Online (when it becomes available) by utilizing an Exchange 2010 CAS server, just like today.

Safety Net
Safety Net is the new transport dumpster in Exchange 2013 and will provide similar functionality. It will also take over the functionality of Shadow Redundancy, whose purpose in Exchange 2010 is to guarantee delivery of messages and accommodate transport failures. Lagged copy functionality is also enhanced by Safety Net: you can activate a lagged copy, after which Exchange 2013 will use Safety Net to make the database current. How long Safety Net will hold messages is a configurable setting.
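
Presumably this maps to the SafetyNetHoldTime parameter found on Set-TransportConfig in the Exchange 2013 Preview; a minimal sketch, assuming that parameter:

```powershell
# Sketch: configure how long Safety Net retains copies of delivered
# messages; the value is a timespan, e.g. 2.00:00:00 for two days.
Set-TransportConfig -SafetyNetHoldTime 2.00:00:00
```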

Compliance
Exchange 2013 will support Litigation Hold, Time-based Hold (rolling data, e.g. items aged X days) and In-place Hold (formerly known as Legal Hold).

Unified Messaging
The Exchange 2013 UM role has a limit of 100 concurrent calls. As you probably know, in Exchange 2013 Mailbox servers are used for UM as well. Because of that, this limit may have serious consequences when you’re designing an environment using a few big servers; you might be forced to distribute the workload over more, lighter servers.

Exchange 2013 and ForeFront Threat Management Gateway
Exchange 2013 will work fine in conjunction with ForeFront TMG, except for the maps feature when using TMG’s Forms-Based Authentication (FBA); the only thing you need to adjust is the logoff URL. Note that despite the ForeFront TMG 2010 End-of-Life statement from Microsoft last week, people like Greg Taylor (Program Manager Exchange) emphasized customers shouldn’t avoid using or opting for TMG while it is still available.

Public Folders
Migration of Public Folders from Exchange 2007 or Exchange 2010 is a cut-over scenario, so there will be no co-existence.

When using Exchange 2013 Public Folders next to Public Folders on Exchange 2007 or Exchange 2010, you need to manually map those to the related folders in Exchange 2013 using a CSV file.

Emphasis was put on being able to control Public Folders; having that data in the same store is considered worth losing the multi-master functionality.

Exhibitor ENow Consulting held a contest
for collecting the most autographs.

Message Hygiene
Exchange 2013 will include tools to block messages in a certain character set. This is useful in scenarios where you don’t expect messages in one of the Chinese languages and you want to block (potential) spam written in one of those languages.

In-Place Archiving
The new term for Personal Archive or Online Archive is In-place Archiving.

Message Routing
Exchange 2013 won’t use least-cost routing when routing messages, but it will use it to determine if Hub sites are defined. Exchange 2013 will honor Hub site definitions, but these are to be considered legacy.

A Delivery Group is a set of transport servers responsible for delivering messages to a certain routing destination. There are several types of Delivery Groups, depending on the destination, e.g. DAG or Site. Each transport server is used in a Round-Robin fashion when delivering messages.

A Mailbox server and a CAS server listen for incoming messages on port 25. When co-located, the Mailbox server will listen on port 2525 instead.

More background information on message routing in Exchange 2013, also in conjunction with Exchange 2010, is to be found here.

Licensing
It is no longer required to have an Enterprise license for eDiscovery; it is still required to have an Enterprise license when using Legal Hold.

Virtualization
Many statements were made to de-emphasize virtualizing Exchange, or to only use it for testing purposes. When virtualizing, the same rules apply as for Exchange 2010.

As with earlier versions of Exchange, the ESE engine will claim memory at startup based on the amount of physical RAM. Configuring Dynamic Memory is therefore not only pointless but also not recommended, as I stated in an earlier post on Exchange and Dynamic Memory.

It was also emphasized that putting VMDK files on VMware NFS disks is not a supported scenario, so I assume this is often seen in the field despite not being supported by Microsoft.

Mobile
ActiveSync in Exchange 2013 will cause 65% less RPC communication compared to Exchange 2010.

Outlook Web Access
When using OWA 2013 in offline mode, the locally generated cache file isn’t secured; use of BitLocker is recommended. Single sign-on in combination with OWA redirection on Exchange 2013 will be fixed post-RTM. Also, be advised that at RTM, OWA in Exchange 2013 won’t have support for Public Folders.

IAMMEC Portal
A portal for the Exchange community was announced, iammec.com. Here, people involved with Exchange can get information from within Microsoft or other sources. How this will differ from the Exchange-related forums on TechNet remains to be seen.

It is unknown if there will be a MEC in 2013; Microsoft’s director of PM for Exchange, Michael Atalla, said there will be a MEC when “there’s something to talk about”. It is rumored that recordings of the 1st day of the conference will be made available at a later date, except for the interactive sessions.

PS: The icon accompanying this article is the Exchange 2013 logo.

TechEd North America 2012 sessions

With the TechEd North America 2012 event still running, recordings and slide decks of finished sessions are becoming available online. Here’s an overview of the Exchange-related sessions:


Thoughts on "VMware Zimbra vs Microsoft Exchange"

Note: This blog was written together with Dave Stork after reading a Zimbra and Exchange product comparison. You can find the article on Dave’s blog here, including a personal note by Dave.

In a blog post by Christopher Wells, alias vSamurai, the author positions VMware Zimbra Collaboration Server (ZCS 7.x) as an enterprise-ready drop-in replacement for Microsoft Exchange Server 2010 environments of all sizes. He also suggests Zimbra is a better multi-tenant solution for ISPs. The author does this by comparing both products in a feature comparison.

These reviews are helpful in order for companies to make an informed decision. After all, there’s nothing wrong with a bit of competition. However, Dave Stork and I wanted to create a response, because some statements are flawed or just plain wrong. In the process, we will be following the structure of the referenced blog:

Backup and Restore
The author starts off by claiming that “the ease with which backup and restore can be performed in Zimbra outweighs the capabilities of Exchange”. While it’s interesting to note the author implicitly admits Exchange is more capable, he misses the point. The product should follow a well-designed backup and recovery strategy, based on customer demands and compliance regulations. Where Exchange has server, database, mailbox and single item recovery options, Zimbra is built on top of MySQL, meaning recovery requires brick-level restores or (partially) restoring information from MySQL dumps. Also, in Zimbra the database only contains meta information; the actual messages and attachments are stored on the file system. While this makes sense for Zimbra, as many SQL people consider storing binary data in databases a bad practice, it increases the complexity of backup and restore, because meta information and file system need to be in sync. Note that Exchange’s Extensible Storage Engine (ESE) is purpose-built for storing mailbox information, including attachments.

Scalability
Then, the author claims that Zimbra has better scaling capabilities than Exchange. First, let’s start by looking at the definition of scaling. A system is said to scale well if:

  • it can handle increased load without (serious) performance penalties, or
  • the system is able to accommodate growth by adding resources (scale up) or additional systems (scale out).

Ideally, scaling should show a linear pattern, meaning two identical systems can handle twice the load. Scaling out most of the time doesn’t, which makes sense when looking at how computers are designed around shared resources, like buses for example.

Now, scaling isn’t solely a matter of hardware; a system also requires software built to scale. The role-based model of Exchange, with its specific roles for serving mailboxes and handling replication, routing e-mail and servicing clients, is a good example of a well-thought-out concept that supports scalability. Of course, you can install all roles on a single server, which is currently the practice recommended by Microsoft, but you’re still able to design fit-for-purpose farms and clusters.

Thus, the ability to scale is determined by the whole set of components playing well together, hardware and software. With this in mind, we’d like to include an interesting table from the study “Zimbra Collaboration Server Performance on VMware vSphere 5.0” by VMware (which acquired Zimbra early 2010):

In their analysis, VMware primarily focuses on the CPU utilization figure. That figure implies that Zimbra has more headroom than Exchange using the same configuration. However, Exchange also has several background processes performing tasks like optimizing the database to reduce the number of IOPS. Yes, this takes up a certain percentage of CPU cycles, but optimizing storage for sequential access could explain the significant 240% difference in IOPS in Exchange’s favor. Lower IOPS reduce storage requirements – and costs – for Exchange. The over 60% lower latency figure for Exchange is also an indication that overall processing of messages is faster in Exchange.

Costs
As is often the case in these Open Source Software (OSS) discussions, the cost card is played. The author claims that on average, Zimbra is 50% cheaper than Exchange. However, this claim is made without any supporting references or figures, making it difficult to verify. From our experience, such claims are often primarily based on retail prices and licensing costs. What is often overlooked (or ignored) in comparisons with OSS are training costs and hidden costs like support or maintenance.

Functionality is also a potential cost saver, as companies can work more efficiently thanks to added or enhanced functionality. These savings depend on customer needs, although some features are widely used and immediately contribute to lower costs, like AutoDiscover (automatic configuration of Outlook 2007 and later clients and ActiveSync devices).

Exchange natively supports Outlook, common browsers and mobile devices; Zimbra requires an Outlook plug-in, the Zimbra Connector for Microsoft Outlook, increasing support and maintenance costs. Note that this connector is only available for Zimbra Collaboration Server Network Edition Professional users.

Regarding maintenance, Exchange requires Exchange, Active Directory and (optionally, but a big bonus) PowerShell skills. Zimbra consists of a set of 3rd-party products, requiring knowledge of each product, like Postfix, mbox e-mail storage, MySQL, Apache, OpenLDAP, SpamAssassin, ClamAV and shell scripting. Of course, more components mean more products to configure and maintain, increasing maintenance costs.

Storage Benefits
A full paragraph is dedicated to the benefits of using Zimbra with NetApp storage. However, the NetApp products and technologies mentioned are not Zimbra specific, and therefore in our opinion do not add anything to the discussion.

Feature Comparison
The author then continues with a “direct” feature comparison between Zimbra and Exchange. Let’s have a look:

1. Platform Architecture
First, the author claims ESE is over 20 years old, the .EDB file is non-modular and the ESE engine is non-tunable. Yes, ESE has existed for over 20 years, but that’s also 20 years of experience in building a fit-for-purpose database engine. With each new Exchange version, ESE was redesigned to meet evolving requirements and expectations in a changing world. Looking at the VMware IOPS comparison in the Scalability section, it’s Zimbra that should worry about storage.

Second, the author claims Database Availability Groups (DAGs), based on fail-over clustering, aren’t a proven technology for large deployments. Exchange 2010 has been on the market since October 2009. Like many Exchange fellows, we have designed or seen large Exchange deployments (i.e. thousands of mailboxes). Also, if millions of Office 365 users aren’t proof of a successful large-scale, multi-tenant, ISP-like deployment using multiple-datacenter DAGs, what is?

To be honest, is it really that important which exact technology is used and how old it is? In the end, functionality and performance are more important, as they are relevant in any business case for Exchange. What would a decision maker most likely ask: “Does it use Microsoft SQL Server?” or “What can we do with it and how much will it cost?” We think, and know from experience, that it will probably be the latter.

2. Reliability & Robustness
The author claims Microsoft is considering (moving Exchange storage to) SQL and needs to prove the robustness of the new architecture. While Microsoft has considered the SQL storage engine several times, it decided to stick with the optimized ESE engine. This was also true for Exchange 2010 back in 2009, as you can read in this blog. The main reason for sticking with ESE is performance.

When pleading for ZCS, the author states “Linux has better uptime”. While this may have been true in the Windows 98 era, from experience, managed Exchange systems can reach similar uptime figures. On the contrary, I’ve seen Linux systems crashing every few days. The only conclusion you can draw here is that reliability not only depends on hardware and software components and their quality; it also depends a lot on if and how systems are managed. Also, don’t confuse uptime with availability: planned downtime will reset my uptime statistic, but that’s all it is, a statistic.

3. Tiered Storage (was Platform Scalability)
Tiered storage, or Hierarchical Storage Management, is about classifying data in terms of things like security, performance or pricing. Exchange itself partly supports this concept, using elements like DAGs, databases, mailboxes, personal archives and retention policies. For example, you can home your mailboxes on multiple lean-and-mean servers using fast SAS storage, while personal archives – used to automatically store e-mail older than 1 year using retention policies – are served by a fat server using inexpensive SATA disks on JBOD storage.
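
To make that archiving example concrete, here’s a minimal sketch using the Exchange 2010 retention cmdlets (the tag, policy and mailbox names are illustrative):

```powershell
# Sketch: a retention tag that moves items older than a year to the
# personal archive, linked to a policy and applied to a mailbox.
# Tag, policy and mailbox names are illustrative.
New-RetentionPolicyTag 'Archive after 1 year' -Type All -AgeLimitForRetention 365 -RetentionAction MoveToArchive
New-RetentionPolicy 'Default 1yr Archive' -RetentionPolicyTagLinks 'Archive after 1 year'
Set-Mailbox 'jdoe' -RetentionPolicy 'Default 1yr Archive'
```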

ZCS utilizes a built-in HSM solution which automatically moves items from the (fast) primary volume to the (cheaper) secondary volume. The database holds information on the actual location where each item resides. Conceptually, this matches the Exchange concept of a primary mailbox and personal archive using retention policies. However, retention policies are more powerful and – when permitted – give users control over what to archive and when. When Exchange customers want a deeper level of storage tiering, they can opt for 3rd-party solutions like Symantec Enterprise Vault (item-level stubbing) or storage solutions.

Note however, there are some important factors to take into consideration with stubbing:

  • Data stored on a different tier, e.g. tape, isn’t always available online;
  • Tiered storage adds complexity, introducing the need to compare reduced costs for storage against additional costs due to increased complexity;
  • Stubbing may impact future migration or transition options, e.g. vendor support, or recovery options.

4. High Availability
The author claims DAGs do not provide Exchange infrastructure protection and have a learning curve. The first part of that claim is absolutely true: DAGs are designed to increase the availability of Exchange databases served by Exchange servers holding the Mailbox role, while providing a fail-over mechanism. The other Exchange roles cover the remaining tasks. Mail flow within an Exchange environment is automatically redundant when you have multiple Hub Transport servers, as they monitor connectivity and possible routes for delivery. For client access, multiple Client Access servers can be made redundant using load balancing technology. Exchange has these features built in, and they work independently of where Exchange is running, i.e. they also work on a non-virtualized system, and no additional high-priced product is required to make the underlying services highly available.

Regarding the learning curve claim: every new technology has a learning curve. DAGs are built on top of fail-over clustering (nothing new) and easier to manage than their predecessors, CCR and SCR. Then again, we’d prefer Exchange admins who know what they’re doing over somebody who learned an SRM trick.

Speaking of which, the whole argument that “ZCS with VMware’s Site Recovery Manager (SRM) is proven, scalable and effective” is apparently nothing more than a plug for VMware’s SRM product in conjunction with VMware licenses (vSphere required), as we see no credible arguments.

5. Platform Extensibility
The author states that Microsoft recommends using its proprietary shell. We assume he means PowerShell, which is here to stay. Other vendors, like Cisco or Quest, are adopting it and offer modules to manage their products using PowerShell. Heck, even Zimbra offers PowerShell scripts to manage Zimbra through encapsulated SOAP requests. For the record, neither of us knows of any Exchange admin complaining about some Linux product requiring bash (Bourne-again shell) or Perl for scripting, turning this into a non-argument.

The author continues by apparently mixing a few things up. The argument given for ZCS is that “SOAP API allows server access using web services framework for client access and Zimlets for integration with 3rd-party services”, while Exchange offers “limited SOAP access” and “Outlook add-ins require developer effort”. This is apples versus oranges; Outlook is a fat client and Zimlets are like web parts. If you want to make a nice dashboard, we’d suggest using something like SharePoint instead of bloating your e-mail web client.

Finally, SOAP and Exchange Web Services (EWS) are targeted at developers, PowerShell at automation. If you’re curious about the power of EWS, we’d suggest you check out the excellent blog by Glen Scales.

6. Platform Openness
While Exchange is mostly closed source, a lot has changed since the 90’s. Exchange has a developer center nowadays, where SDKs and APIs are published on how to interact with certain parts of the Exchange ecosystem, e.g.:

7. Open Standard Protocols Support
It’s true that the current Outlook version doesn’t support all available standards for exchanging calendaring or contact information. However, for most companies that isn’t an issue, and when required, solutions and workarounds are available. Also see “Mobile Support”.

8. Rebranding
The author claims Outlook Web Access (OWA) has a single theme. That might have been the case with the RTM version, but since SP1 we have over 28 themes to choose from. If that’s not enough, there’s even an Exchange Server 2010 SP1 Outlook Web App Customization SDK to take customization into your own hands. Note that the SDK also documents integrating IM (e.g. Lync).

9. Web Client Support
Regarding web client support, the author states there is “limited browser support for OWA” (Outlook Web App). Since SP1, OWA has full support for IE7+, Firefox 3.01+ (Windows, MacOS, Linux), Chrome 3.0.195.27+ (Windows) and Safari 3.1+ (MacOS). In addition, OWA Mini, targeted at simple mobile browsers, was reincarnated in Exchange 2010 SP2.

Yes, there are browsers out there that don’t get the full-featured Premium OWA (like Opera), but “limited browser support for OWA” is a bit over-simplified, especially if you take into consideration the combined market share of the fully supported browsers (without Safari, between 81-91% since December 2011).

10. Mac Support
Outlook and Outlook for Mac are produced by two different teams, which might be one of the reasons for the feature disparity between Outlook 2010 and Outlook for Mac 2011. Apart from differences caused by the underlying operating system, we agree features should be as close to parity as possible on all available platforms.

Note that the mentioned Zimbra desktop client doesn’t support Exchange’s native MAPI protocol, adding the requirement to enable the IMAP or POP protocol on the Exchange server.

11. Linux
The author proceeds by arguing there’s no Outlook client or Exchange Server for Linux. That is a moot point; there’s also no Zimbra server for Windows. Also, when somebody’s trying to convince you using arguments like “ZCS server components love the Linux platforms”, that’s not very convincing, and is often seen in discussions where emotions prevail over rational thinking.

12. Mobile Support
More and more (mobile) clients are adopting the Exchange ActiveSync (EAS) protocol for exchanging e-mail, calendar, contact and task information with Exchange. In fact, even BlackBerry announced they will adopt EAS in their upcoming BlackBerry 10 OS product. This is probably driven by Microsoft releasing the EAS protocol as part of their Open Specifications Promise, turning EAS more or less into the de-facto standard for (corporate) e-mail synchronization on mobile clients.

Zimbra partially supports EAS for e-mail, calendar and contacts, but requires the Zimbra Mobile add-on. It is a bit unclear if tasks are synced: here it seems so for Pro users, but here it is advised against, while here the screenshots tell yet another story. Confusing.

13. Multi-tenancy
The author doesn’t show how Zimbra is a better multi-tenancy solution for ISPs when compared to Exchange 2010. Since Exchange 2010 Service Pack 2, there is no need for third-party hosting software, as hosting is now fully incorporated in Exchange without extra costs.
However, the intent was possibly to prove this implicitly via the cost argument for on-premises deployments. Another way is to look at actual hosted Zimbra and Exchange solutions available commercially.

Let’s compare costs from random Zimbra providers (picked from Zimbra’s Partners list), Exchange hosting providers and Office 365 subscriptions. It is not an extensive comparison, but it should give us an indication. Some (not all) are shown here:

| Product | MrMail Professional Zimbra Mailbox | CVM Zimbra Professional Suite | PayPerCloud Hosted Exchange Professional | Office 365 Exchange Online | Office 365 Plan E1 |
|---|---|---|---|---|---|
| Storage | 8GB | 1GB | 25GB | 25GB | 25GB (mailbox; SharePoint is separate and additional) |
| Own mail domain | yes | yes | yes | yes | yes |
| Attachment size | 20MB | ? | ? | 25MB | 25MB |
| Web Access | yes | yes | yes | yes | yes |
| POP / IMAP | yes/yes | yes/yes | yes/yes | yes/yes | yes/yes |
| ActiveSync | yes | yes | yes | yes | yes |
| Antimalware | yes | yes | yes | yes | yes |
| SharePoint or similar | yes | yes | no | no | yes |
| Lync IM/Presence | no | no | no | no | yes |
| Price per user per month | $8.61* | $7* | $7.95** | $4** | $8** |

*) discounts possible with more mailboxes
**) Note that prices are per month, but only apply with an annual subscription.

This table shows that the Exchange subscriptions are comparable or provide more functionality at lower costs. We do not see the 50% cost benefit at all; in our opinion, this shows that Exchange 2010 is a very viable multi-tenancy solution for ISPs.

One very important difference we want to point out is the available storage per mailbox. This tends to be several times larger with Exchange than with Zimbra, without heavily impacting the price. This fact alone suggests that Exchange can be a very viable groupware solution for ISPs.

Final words
This concludes the author’s feature comparison, but there are still some important elements missing, like product support, directory integration, IPv6 readiness, traffic management (e.g. ethical walls) or IRM. Also, what about integration or support of Unified Communications technologies, like a single inbox – including voicemail – or voice access to the mailbox?

Now don’t get the impression we want to condemn Christopher for trying to compare both products, even though after reading just the header and counting the numerous VMware-related logos on the site, we were a bit hesitant regarding what the “conclusion” would be (we have a saying here, “We from WC Eend recommend WC Eend”, about vendors recommending their own wares).

We do appreciate good comparisons, because they can shake up our opinions of what is and what should be with Exchange, and start interesting discussions. It’s also an opportunity to learn about similar products. We believe competition is healthy and comparisons can be educational; they can help companies find a better fit for their needs and budget, or at least provide a starting point.

It is, however, crucial for a fair comparison that the facts, conclusions and opinions stated are correct and sound. Unfortunately, this is not the case with this article. There are numerous factual errors, and most opinions stated are poorly argued. To add to that, the author uses a feature list which can be found on the internet in several places, like here. This may be an indication authors are copying content without knowledge or cross-checking of facts.

Therefore, with the information provided in Christopher’s blog post, one can’t conclude that Zimbra is an adequate replacement for all environments, Enterprise or SMB. Also, we do not see any indication that Zimbra is better suited for multi-tenancy by ISPs. If anything, we think we have shown that Exchange is a more than capable, competitive and well-thought-out product.

You’re invited to comment or share your opinions in the comments below.

Update (April 10th): Apparently, on March 21st Wells posted a follow-up on his Zimbra versus Exchange viewpoint. Looking at it, Wells seems to enjoy the attention. Despite saying that discussing viewpoints keeps vendors’ focus sharp, he doesn’t come up with arguments on why our post was – in Wells’ words – flawed. While I believe Zimbra serves a purpose – and it certainly isn’t on my radar as Wells says – I feel Zimbra and other non-Exchange evangelists should be able to take feedback like a pro. When you ignore other viewpoints or remain silent when asked for arguments, it’s more of a monologue than the interaction Wells claimed to be in favour of.

Finally, our post didn’t go unnoticed, as Tony Redmond referred to it in an article on Windows IT Pro. In the article, called Dispelling myths and other half truths, Redmond addresses some of Wells’ flawed claims as well.

TechEd North America 2011 sessions

With the end of TechEd NA 2011, so ends a week of interesting sessions. Here’s a quick overview of recorded Exchange-related sessions for your enjoyment:

Virtualized Exchange 2010 SP1 UM, DAGs & Live Migration support

Today, just before TechEd North America 2011, Microsoft published a whitepaper on virtualizing Exchange 2010, “Best Practices for Virtualizing Exchange Server 2010 with Windows Server 2008 R2 Hyper-V”.

There are some interesting statements in this document which I’d like to share with you, also because the Exchange team published an article on supported virtualization scenarios shortly after this paper was published.

First, as Exchange fellow Steve Goodman blogged about, a virtualized Exchange 2010 SP1 UM server role is now supported, albeit under certain conditions. More information on this at Steve’s blog here.

The second thing is that live migration, or any form of live migration offered by products validated through the Windows Server Virtualization Validation Program (SVVP), is now supported for Exchange 2010 SP1 Database Availability Groups. Until recently, the support statement for DAGs and virtualization was:

“Microsoft does not support combining Exchange high availability (DAGs) with hypervisor-based clustering, high availability, or migration solutions that will move or automatically failover mailbox servers that are members of a DAG between clustered root servers. DAGs are supported in hardware virtualization environments provided that the virtualization environment doesn’t employ clustered root servers, or the clustered root servers have been configured to never failover or automatically move mailbox servers that are members of a DAG to another root server.”

The Microsoft document on virtualizing Exchange Server 2010 states the following on page 29:

“Exchange server virtual machines, including Exchange Mailbox virtual machines that are part of a Database Availability Group (DAG), can be combined with host-based failover clustering and migration technology as long as the virtual machines are configured such that they will not save and restore state on disk when moved or taken offline. All failover activity must result in a cold start when the virtual machine is activated on the target node. All planned migration must either result in shut down and a cold start or an online migration that utilizes a technology such as Hyper-V live migration.”

The first option, shutdown and cold start, is what Microsoft used to recommend for DAGs in VMware HA/DRS configurations, i.e. perform an “online migration” (e.g. vMotion) of a shut-down virtual machine. I blogged about this some weeks ago here, since VMware wasn’t always clear about this. Depending on your configuration, this might not be a satisfying solution when availability is a concern.

The online migration statement is new, as is the host-based fail-over clustering. In addition, though the paper is aimed at virtualization solutions based on Hyper-V R2, the Exchange Team article is clearer on supported scenarios for Exchange 2010 SP1 with regard to 3rd-party products (VMware HA): if the product is supported through the SVVP program, usage of Exchange DAGs is supported. Great news for environments running or considering virtualizing their Exchange components.

Be advised that in addition to the Exchange team article, the paper states the following additional requirements and recommendations as best practice:

  • Exchange Server 2010 SP1;
  • Use Cluster Shared Volumes (CSV) to minimize offline time;
  • The DAG node will be evicted when offline time exceeds 5 seconds. If required, increase the heartbeat timeout to a maximum of 10 seconds (see the sketch after this list);
  • Implementation of latest patches for the hypervisor;
  • For live migration network:
    - Enable jumbo frames and make sure network components support it;
    - Change receive buffers to 8192;
    - Maximize bandwidth.
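
For reference, a minimal sketch of inspecting and raising that heartbeat tolerance on the DAG’s underlying failover cluster (assumes the FailoverClusters PowerShell module; the DAG/cluster name is illustrative):

```powershell
# Sketch: raise the cluster heartbeat tolerance for a DAG's underlying
# failover cluster. DAG/cluster name is illustrative.
Import-Module FailoverClusters
$cluster = Get-Cluster 'DAG1'
$cluster.SameSubnetDelay       # heartbeat interval in ms (default 1000)
$cluster.SameSubnetThreshold   # missed heartbeats tolerated (default 5 = 5 seconds)
# Tolerate 10 seconds of offline time (10 x 1000 ms):
$cluster.SameSubnetThreshold = 10
```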

Note that on May 17th the DAG support statement for Exchange 2010 SP1 on TechNet was updated to reflect this. However, the last two sentences might restart those “are we supported” discussions again:

“Hypervisor migration of virtual machines is supported by the hypervisor vendor; therefore, you must ensure that your hypervisor vendor has tested and supports migration of Exchange virtual machines. Microsoft supports Hyper-V Live Migration of these virtual machines.”

So, if vendor A, e.g. VMware, has tested and supports vMotioning DAGs with their hypervisor X, Microsoft will support Live Migration for virtual machines on hypervisor X using Hyper-V? Now what kind of statement is that?

(Updates: May 16th – statements from EHLO blog, May 17th – mention updated TechNet article)

VMware HA/DRS and Exchange DAG support

Last year an (online) discussion took place between VMware and Microsoft on the supportability of Exchange 2010 Database Availability Groups in combination with VMware’s high availability options. The start of this discussion was the Exchange 2010 on VMware Best Practices Guide and the Availability and Recovery Options documents published by VMware. In the Options document, VMware used VMware HA with a DAG as an example, with only a small note on the support issue. In the Best Practices Guide, you have to turn to page 64 to read in a side note, “VMware does not currently support VMware VMotion or VMware DRS for Microsoft Cluster nodes; however, a cold migration is possible after the guest OS is shut down properly.” Much confusion arose; was Exchange 2010 DAG supported in combination with those VMware options or not?

In a reaction, Microsoft clarified their support stance in a post on the Exchange Team blog, which reads: “Microsoft does not support combining Exchange high availability (DAGs) with hypervisor-based clustering, high availability, or migration solutions that will move or automatically failover mailbox servers that are members of a DAG between clustered root servers.” This meant you were on your own when you performed fail-overs or switch-overs in an Exchange 2010 DAG in combination with VMware VMotion or DRS.

You might think VMware would be more careful when publishing these kinds of support statements. Well, to my surprise, VMware published support article 1037959 this week on “Microsoft Clustering on VMware vSphere: Guidelines for Supported Configurations”. The support table states a “Yes” (i.e. supported) for Exchange 2010 DAG in combination with VMware HA and DRS. There is no word on the restrictions which apply to those combinations, despite the reference to the Best Practices Guide; only a footnote for HA, which refers to the ability to group guests together on a VMware host.

I wonder how many people just look at that table, skip those guides (or overlook the small notes on the support issue) and think they will run a supported configuration.

Exchange & Dynamic Memory : Don’t

With the arrival of Service Pack 1 for Windows Server 2008 R2, Dynamic Memory was introduced. In brief, Dynamic Memory is a memory management enhancement for Hyper-V which allows running virtual machines (VMs) to allocate memory from the host and release it when possible, within a configured minimum and maximum memory boundary. The main benefit is a higher VM density, because each VM will only allocate what’s required and you don’t have to approximate memory allocations.

Now, this mechanism works well for many applications, but not for Exchange. Exchange’s goal – at least that of servers holding the Mailbox role – is to claim as much memory as possible in order to cache information. This amount depends on the amount of installed memory (more information here). This cache is used for performance reasons: more cache means fewer I/Os, and fewer I/Os result in better performance. You can guess what happens when you run Exchange with a minimal amount of memory and lots of dynamic memory configured, optionally shared with other Dynamic Memory-enabled VMs. If Exchange starts up and wants to claim memory for caching or allocate memory for other reasons (transactions), instead of the memory being available instantly, the host first needs to allocate it, or worse, have other VMs surrender their memory. That doesn’t make sense and will result in a significant performance penalty.

Besides being pointless, configuring Dynamic Memory for Exchange is also not recommended. From the Exchange 2010 System Requirements:

Many of the performance gains in recent versions of Exchange, especially those related to reduction in I/O, are based on highly efficient usage of large amounts of memory. When that memory is no longer available, the expected performance of the system can’t be achieved. For this reason, memory oversubscription or dynamic adjustment of virtual machine memory should be disabled for production Exchange servers.

Also, from the paper Implementing and Configuring Dynamic Memory, on applications that may not perform as well after Dynamic Memory is enabled:

  • Applications that perform their own memory management by taking over certain aspects of memory management from the operating system. Such applications typically grab as much memory as they possibly can in order to ensure the application’s best performance which can cause the amount of memory allocated to their virtual machine to grow until it reaches the amount specified by the Maximum RAM setting;
  • Applications where memory allocation is a one shot operation that is performed either when the application starts for the first time or each time the application starts.

Concluding: yes, you can use Dynamic Memory for your lab or testing environment and it works. But don’t use it in production for Exchange Server.
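
If you want to make sure a production Exchange VM on Hyper-V uses fixed memory, a minimal sketch (assumes the Hyper-V PowerShell module from Windows Server 2012 or later; on 2008 R2 SP1 you’d configure this in Hyper-V Manager; the VM name and size are illustrative):

```powershell
# Sketch: disable Dynamic Memory and assign a fixed amount of RAM
# to an Exchange VM. VM name and memory size are illustrative.
Set-VMMemory -VMName 'EX01' -DynamicMemoryEnabled $false -StartupBytes 32GB
```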

Credits to Jetze who blogged about this originally here (Dutch).

A Decade in High Availability

A recent post from Elden Christensen, Sr. Program Manager Lead for Clustering & High Availability, reminded me of one of my former employers. When I joined that company back in 2000 to start up a professional services practice based on Windows Server 2000 Data Center Edition, the company was already an established professional services provider in the business-critical computing niche market, e.g. Tandem/Compaq/HP NonStop systems, mostly used in the financial world, e.g. banks or stock exchanges. The Windows platform was regarded as inferior at that time by the NonStop folks, and they had good arguments back then.

Remember, those were also the early days, when no one was surprised to see an occasional blue screen (people were also using Windows 9x), and what we now know as virtualization was already happening on mainframes in the form of partitioning. At that time, Microsoft had ambitions to take its Windows Server platform into the data center environment, where the NonStop platform had been established for ages and professionals had developed best practices for those environments.

Another part of the discussion was the Fault Tolerance versus High Availability topic: NonStop was already an established Fault Tolerant solution for business-critical environments, while Windows merely had ambitions to move towards that market with the Data Center product. A logical move, looking at the state of (web) applications, SQL and, last but not least, Exchange – where it was going and what customers expected of those products regarding availability and reliability. To repeat an infamous quote of a NonStop colleague back then: “E-mail is not business critical”. But that was almost 10 years ago. Things have changed… or haven’t they?

Single Point of Failure
First, I’ll introduce the availability concept, which revolves around eliminating the single point of failure: an element in the whole system of hardware, software and organization that can cause downtime for a system, i.e. disruption of services. After identifying a single point of failure, we want to eliminate it to prevent downtime, which is, after all, the ultimate goal for a business-critical system. We can approach this task using two different strategies: Fault Tolerance (FT) or High Availability (HA). Identifying and eliminating single points of failure is an ongoing process, as most IT environments are subject to change over time.

Availability
To understand the Fault Tolerance and High Availability strategies, we need to define the term “availability”. The dictionary defines availability as the quality or state of being available, where available means present or ready for immediate use. Availability is mostly expressed as a percentage, for example when used in a service level agreement. But what does that percentage mean? To explain this, take a look at the following diagram:

Lifecycle

I assume this lifecycle speaks for itself. Using this diagram, availability is calculated as follows: MTBF / (MTBF + MTTR). The related expected downtime is calculated as (1 – availability) × 1 year. Note that the time between failure and detection isn’t used in the calculation.

I’ll use a simple example: a 500 GB Seagate Barracuda 7200.12 (ST3500412AS) with an MTBF specification of 750,000 hours. You have a 24-hour replacement contract and need about 4 hours to restore the backup. The availability would then be 750,000 / (750,000 + 28) = 0.99996267, or 99.996267%, resulting in a yearly downtime of (1 – 0.99996267) × (365 days × 24 hours × 60 minutes) ≈ 19.6 minutes.

Of course, with hardware these numbers are theoretical and to some extent a marketing thing; how else could Seagate specify an MTBF of 750,000 hours (over 85 years)? I tend to look at it as an indication of the reliability you can expect. For example, compare the MTBF of the 7200.12 drive with an enterprise-class drive from Seagate’s ES product line: the ST3500320NS has an MTBF of 1,200,000 hours.

That’s the reason you should use enterprise-class drives in your storage solution instead of desktop drives, which aren’t supposed to run in 24×7 environments. To add to that, the combined MTBF decreases when drives are used in series (RAID 0: MTBF = 1 / (1/MTBF1 + … + 1/MTBFn)) and increases when they are used in parallel (RAID 1: MTBF × (1 + 1/2 + … + 1/n)). When trying to do calculations for the whole supply chain, with all the elements and their individual specifications and support contracts, this can get very complex.
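
To make the arithmetic explicit, here’s the example above worked out in a few lines of PowerShell (the numbers come from the Barracuda example; the RAID expressions follow the series/parallel formulas just given):

```powershell
# Availability of a single disk: MTBF / (MTBF + MTTR)
$MTBF = 750000; $MTTR = 28                          # hours
$availability   = $MTBF / ($MTBF + $MTTR)           # ~0.99996267
$yearlyDowntime = (1 - $availability) * 365*24*60   # ~19.6 minutes/year

# Combined MTBF of two identical disks:
$raid0 = 1 / (2 / $MTBF)        # series: MTBF/2 = 375,000 hours
$raid1 = $MTBF * (1 + 1/2)      # parallel (n=2): 1,125,000 hours
```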

The 9’s
When talking about availability, it is often expressed as a series of 9’s, e.g. 99.9%. The more 9’s, the better (less downtime). Note that for each increased level of availability, the required effort increases significantly. By effort, don’t think of technical solutions only; it also means organizational measures, like having skilled personnel and proper procedures.

The fact is that only a small percentage of outage causes are technical; the majority of incidents are due to human error. And yes, that includes that bad driver, which was programmed by humans. This is why changes in a properly managed infrastructure should always go through test and acceptance procedures in environments representative of, or identical to, the production environment. Unfortunately, this doesn’t always happen, as not all IT departments have that luxury, mostly for financial reasons.

| Availability % | Downtime / Year | Downtime / Month |
|---|---|---|
| 99.0% | 3.65 days | 7.3 hrs |
| 99.9% | 8.76 hrs | 43.8 min |
| 99.99% | 52 min | 4.3 min |
| 99.999% | 5.2 min | 26 sec |
| 99.9999% | 31 sec | 2.6 sec |

Fault Tolerant
The goal of a Fault Tolerant solution is to maximize the Mean Time Between Failures (MTBF). This is achieved by mirroring or replicating systems: monolithic systems running software in parallel on identical hardware. This is called lockstep (which, for your information, refers to synchronized marching).

Because Fault Tolerant systems run in parallel, the results of an operation can be compared. When the results don’t match, a fault has occurred. Since the faulty system can’t be identified using two parallel systems, there’s also a variation on this architecture where one server functions as master and one as slave, the slave functioning as a hot standby. To solve the ambiguity, you could use three systems, where the majority of the systems determines the right output.

When faults are detected in a Fault Tolerant system, the failing component (or system) is disabled and the mirror takes over. This makes the experience transparent for the end-user. There is one caveat: since Fault Tolerant systems run software in parallel, software faults are also mirrored.

Examples of Fault Tolerant components are ECC RAM, multiple NICs in a Fault Tolerant configuration, multipath network software, RAID 1+ disk systems or storage with replication technology. Examples of Fault Tolerant systems are HP NonStop (proprietary), Stratus ftServer or Unisys ES7000. There are also software-based solutions, like Marathon EverRun or VMware’s FT offering.

High Availability
High Availability aims to minimize the MTTR. This can be achieved by redundant or standby (cold, hot) systems, or by non-technical measures like on-site support contracts. Systems take over the functionality of the failing system after the failure has occurred. Therefore, High Availability solutions aren’t always completely transparent to the user. The effects of a failing system and the consequences for the end user depend on the software, e.g. a seamless reconnect or a requirement to log in again. Another point of attention is the potential loss of information caused by pending transactions being lost in the failure. To make the experience more transparent for the user, applications need to be resilient, e.g. detecting failure and retrying the transaction.

Examples of High Availability technologies are load balancing – software or hardware-based – and replication, where load balancing is used for static data and replication for dynamic data.

The Present
After a decade, technology has evolved but is still founded on old concepts. Network load balancing is still here, and clustering (anyone remember Wolfpack?), although we moved from shared storage to shared-nothing replication technology, remains largely unchanged. This means either there hasn’t been much innovation, or the technologies do a decent job; after all, it’s still a matter of demand and supply. Yes, we moved from certified-configurations-only shared storage solutions to flexible Database Availability Groups (hey, this is still an Exchange blog), but most changes are in the added functionality category or take away constraints, e.g. cluster modes (majority node set, etc.), multiple replicas and configurable replication.

| Windows Server Data Center Edition | x86 | x64 |
|---|---|---|
| 2000 | Max. 32 GB, 32 CPUs, 4 nodes | N/A |
| 2003 SP2 | Max. 128 GB, 32 CPUs, 8 nodes | Max. 512 GB, 64 CPUs, 8 nodes |
| 2003 R2 SP2 | Max. 64 GB, 32 CPUs, 8 nodes | Max. 2 TB, 64 CPUs, 8 nodes |
| 2008 | Max. 64 GB, 32 CPUs, 16 nodes | Max. 2 TB, 64 CPUs, 16 nodes |
| 2008 R2 | N/A | Max. 2 TB, 64 CPUs (256 logical), 16 nodes |

What about Fault Tolerance and Windows’ Data Center Edition as the panacea for all your customers requiring “maximum uptime”? The issue with Fault Tolerance was that it came with a hefty price tag, especially in those days. Costs were an x-fold of the costs involved with High Availability solutions on decent (read: stable) hardware. So, for those extra 9’s you needed deep pockets. For example, around 2001 a Unisys ES7000 with Windows Server 2000 Data Center Edition, the joint support queue (e.g. Microsoft and OEM) and services came with a $2m price tag, for which you got the promise of 99.9% availability.

Compare that to buying a few ProLiants with Windows Server 2000 Advanced Server, some Fault Tolerant components (FT NICs, RAID), off-the-shelf High Availability technology and dedicated personnel (justifiable with that DCE price tag) for, say, $250,000. With skilled personnel, operated in a controlled environment, you could easily reach 99% availability. Is that price difference worth 3 days of downtime? Also, the simplicity of implementing those technologies made High Availability on Windows accessible to the masses, and nowadays – certainly in the Exchange world – you seldom see load balancing or forms of clustering not being utilized.

Note that in the past decade, I’ve never encountered Data Center hosting Exchange. In fact, as of Exchange 2003, support for Data Center was dropped. Nowadays, Data Center is regarded as an attractive option for large-scale virtualization based on Hyper-V, not only because Data Center costs less than back then (about $3,000 per CPU – hurray for multi-core, but with a 2-CPU minimum) and runs certified on more hardware, but also because it comes with unlimited virtualization rights, meaning you may run Windows Server 2008 R2 (or previous versions) Standard, Enterprise and Datacenter in the virtual instances without the need to purchase additional licenses for those.

With all the large-scale virtualization and consolidation projects going on, virtualizing Exchange or other parts of your IT infrastructure, it’s good to know that there are other options when required by the business.

W2008 R2 Hyper-V architecture poster

Since most of us working with Exchange deal – directly or indirectly – with virtualization technologies, we would like to inform you that Microsoft published a Windows Server 2008 R2 Hyper-V architecture poster, much like they did for Exchange 2007 (no sign of an Exchange 2010 version yet).

This poster – in PDF format – is a nice reference for key Windows Server 2008 R2 Hyper-V technologies. It focuses on subjects like networking, storage, live migration, snapshotting and management.

You can download the PDF here.