Exchange and VMWare Guest Introspection

In this long overdue article, I would like to share an experience with a customer who was upgrading from Exchange 2010 to Exchange 2013. Note that this could also apply to customers migrating from Exchange 2007, or migrating to Exchange 2016. The Exchange 2013 servers were hosted on VMWare vSphere 5.5U2; the Exchange 2010 servers on a previous product level.

The customer saw a negative impact on the end user experience of Outlook 2010 users, especially those working in Online Mode. Other web-based services like Exchange Web Services (EWS) were affected as well. The OWA experience was good.

Symptoms
After migrating end user mailboxes from Exchange 2010 to Exchange 2013 (but as indicated, this applies to Exchange 2016 as well), end users reported delays in their Outlook client responses, where sometimes Outlook seemed to ‘hang’ when performing certain actions like accessing a Shared Mailbox. Also, when opening the meeting planner to schedule a room using Scheduling Assistant, it could take a significant amount of time (i.e. minutes) before the schedules of all the rooms were displayed.

The end users’ primary mailbox was configured to use Cached Mode, except for VDI users who used their primary mailbox in Online Mode. Shared Mailboxes were used in Online Mode due to the size (Outlook 2010, so no slider).

Analysis
First, the overall health of the Exchange environment was checked to exclude it as a potential cause. Exchange performance metrics were monitored, as well as Managed Availability status and events, logs like the RCA logs, and VMWare CPU Ready % to check for potential vCPU allocation issues (read: oversubscription). None of these metrics gave any reason for concern.

After reconfiguring the HOSTS file, in order to bypass the load balancer and direct traffic to a single Exchange server to simplify troubleshooting, the symptoms remained. Then, we checked:

  • TCP/IP optimization settings, e.g. RSS, Chimney, etc.
  • VMWare VMXNet3 offloading, e.g. Large Send Offload, TCP Checksum Offloading
  • VMWare VMXNet3 buffer settings

All those settings were also found to be at their recommended values.

We started digging in from the client’s perspective, and used WireShark to see what was going on on the wire. After filtering on the Exchange host, we saw the following pattern:

[Image: Wireshark capture, filtered on the Exchange host]

Note that this customer used SSL Offloading, so mailbox access took place on port 80 instead of 443 (RPC/http).

As you might notice, there is a consistent 200ms delay after the client receives its response (e.g. packets 106 and 110). When searching around for ‘200ms’ and ‘delay’, you may end up with articles describing the interaction between the Nagle algorithm and Delayed ACK. Both are meant to reduce chatter on the wire, but they can have a negative effect on near real-time communications, especially with small packets. Also, while 200ms might seem small, looking at the number of packets exchanged between Outlook and Exchange, this can add up quite quickly. Most of these articles also describe a fix: configure the registry value TcpAckFrequency and set it to 1 (default is 2), so that every packet is acknowledged immediately. For testing purposes, we configured this value and after the mandatory reboot, the end user Outlook experience was snappy. However, setting it would impact all client communications (real as well as VDI clients), so it is not a recommended long-term solution due to side effects on the network.
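To get a feel for how quickly such a per-exchange delay adds up, here is a small back-of-the-envelope sketch in Python. The 200ms figure comes from the capture above; the numbers of request/response exchanges are made-up illustration values, not measurements from this customer.

```python
def added_delay_seconds(round_trips: int, delay_ms: float = 200.0) -> float:
    """Total extra wait if every request/response exchange stalls for delay_ms."""
    return round_trips * delay_ms / 1000.0


# Hypothetical numbers of request/response exchanges for a single user action.
for round_trips in (5, 50, 300):
    seconds = added_delay_seconds(round_trips)
    print(f"{round_trips:>4} exchanges -> {seconds:5.1f} s of added delay")
```

Under these assumptions, a few hundred stalled exchanges already account for a full minute of waiting, which is the order of magnitude users reported for the Scheduling Assistant.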

After removing the registry key, the investigation continued. Since there was no problem with the old Exchange 2010 environment, we started to suspect an issue with VMWare, or some form of network optimization or packet inspection. The elements that changed when migrating were the VMWare vSphere version, the physical vSphere hosts and, last but not least, the protocol. This customer didn’t use Outlook Anywhere, so RPC/http was not enabled for Exchange 2010 prior to migration, and clients connected using MAPI. After some more investigating, some potentially related articles in the VMWare knowledgebase were found, talking about latency issues in certain versions of VMWare Tools and the VMWare guest driver set, and stating that downgrading these to 5.1 would have the same effect as configuring TcpAckFrequency. Unfortunately, this wasn’t an option, as the hardware level of the VMWare guests was already at a certain level.

Remediation
When installing VMWare Tools, the package comes with some system-level drivers which handle communications between the guest and the host or other guests. One of these is the VMWare Guest Introspection driver (also known as VMCI Drivers, and formerly vShield Drivers). This component can be identified in the guest by the presence of the system drivers vnetflt and vsepflt, and accommodates agentless antivirus solutions like McAfee MOVE. However, it also seems to interfere with certain workloads, negatively impacting real-time communications. I wasn’t able to test whether the change from MAPI to RPC/http (or later MAPIhttp) also contributed to this effect, as the Introspection driver may not scan MAPI RPC packets at all, in which case no overhead is introduced.

Needless to say, disabling the Guest Introspection component might be less desirable for some organizations. In those cases, when you experience this issue, I suggest first verifying that your VMWare Tools version is on the list of recommended versions, and then contacting your VMWare representative.

In the end, in this situation Guest Introspection was disabled and a file-level scanner was introduced (with the required exclusions, of course). Performance was optimal when accessing Online Mode mailboxes, and features based on Exchange Web Services, like Scheduling Assistant, showed room planning in seconds rather than minutes.

Note that unfortunately, recent versions of vSphere running virtualized Exchange workloads also have this issue. On the plus side, they allow for separate (de)installation of the file system driver (NSX File Introspection Driver) and the network driver (NSX Network Introspection Driver). I am pretty sure removing the network driver would suffice, which might be a viable solution for some folks as well.

If you have any insights to share, please leave them in the comments.

Exchange and NFS – A Rollup

A short write-up after some recent articles which were published to clarify and emphasize Microsoft’s current position on virtualization and the support for storing Exchange information on NFS volumes. I will stick to the headlines, as the topic has already been touched on several times by people from the Exchange community, and I would mostly be repeating things that have already been said. Yet, many customers still have the perception that Exchange on NFS is supported, or are actually running this configuration, often as the result of a push from the storage or virtualization vendor. As it is not supported, I will repeat key information here to counter misleading information, hoping it might prevent customers from selecting unsupported configurations.

At the end of last year, a lively discussion was revived on some distribution lists and forums on why NFS was still not supported for storing Exchange information. However, it was all speculation, as the creator of the product did not take part. The official support statement was (and is) that Exchange is not supported on NFS and only block-level storage is supported. Tony Redmond did a write-up on that here.

Then, in the run-up to the Microsoft Exchange Conference 2014, a ‘suggestion’ to support NFS was put on the community ideascale site, where people can propose suggestions for Exchange. This site is not an official channel, but it does provide a way for the community to gather suggestions and check for demand. So, it made it possible to verify whether the current lack of NFS support was a major thing or not, as the people producing the most noise do not necessarily represent the majority. Response seemed limited, except for some hardware vendors who made lots of noise, possibly in an attempt to get traction in the Exchange community.

Then, Tony did a follow-up article after a discussion with Jeff Mealiffe, knowledgeable on Exchange, Sizing and Virtualization and nicknamed ‘The PerfGuy’ for obvious reasons. In the article, the problem areas of NFS are set out. Interestingly (but not surprisingly), Exchange is similar to SQL Server from a storage perspective, the latter having very specific documentation regarding storage requirements. Also mentioned is that a vendor successfully running JetStress is no indication of the supportability of a storage configuration. After all, it is great that JetStress runs successfully for a certain number of hours, but it is a storage performance validation tool, not a storage supportability validation tool. At the Microsoft Exchange Conference 2014, using arguments presented earlier in the article, Jeff reaffirmed the non-support of NFS in his presentation.

The discussion seemed to die down until a few weeks ago, when Tony was in a Twitter conversation with one Josh Odgers, engineer at one of the storage vendors. In the discussion Odgers dropped the rationale and even went so far as to insult people. When searching online, you will find other rants as well, so I guess Josh’s employer does not have any form of social media guidelines for their employees. That does not help when you are trying to lobby for your cause (and potential markets for your storage appliances). Tony wrote an extensive response here; I recommend checking it out.

Now what storage vendors and their employees do or do not do is up to them. However, things like this may become an issue when vendors repeatedly and knowingly position their storage solution as a supported alternative to customers, as for example Odgers does for Nutanix (NDFS is Nutanix’ proprietary distributed NFS implementation). Yes, I’m sure it flies like a rocket, and I am sure some customers will be persuaded by sales people into a game of chance by running Exchange on their appliances. As an Exchange consultant however, I prefer supported solutions and so should you. Or have a serious chat with the Risk Manager.

Update (Jul 9, 2014): The UC Architects fellow Mahmoud Magdy posted a blog on his experiences with storage appliances such as Nutanix, and the limitations he encountered, here.

Exchange and potential Packet Loss on VMWare

Yesterday, I noticed a VMware knowledgebase article, updated on November 14th, which could be worth taking notice of when you’re running Exchange – or any other application – in a virtualized environment based on VMware technology.

VMware’s KB article 2039495 mentions that in VMware ESXi 4.x and 5.x, very high traffic bursts may cause the VMXnet3 driver to start dropping packets in the Guest OS. This has been observed on Windows Server 2008 R2 running Exchange 2010 with – as VMware puts it – a high number of Exchange users. What the article fails to mention is the configuration used by customers experiencing the issue. It might for example be valuable to know if a DAG was used, if the traffic (MAPI, replication) was split over multiple NICs or if it occurred with iSCSI storage. I won’t be surprised if the issue occurs in other high traffic situations as well, e.g. seeding. Luckily, Exchange is capable of handling certain hiccups, so customers might not even be aware of the issue.

After some more digging I found another article, KB 1010071, which mentions a packet drop issue with VMware Guests known since ESX 3. This article explains a bit more why the issue occurs in the first place, being the network driver running out of receive buffers, causing the packets to be dropped between the Virtual Switch and the Guest OS driver.
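The mechanism described here, a fixed number of receive buffers that overflows during a burst, can be illustrated with a toy model. The sketch below is not how the VMXNET3 driver is implemented; the ring sizes, burst pattern and drain rate are hypothetical values chosen only to show why a larger receive ring absorbs bursts that a smaller one drops.

```python
def simulate_drops(ring_size: int, bursts: list[int], drained_per_tick: int) -> int:
    """Count packets dropped by a bounded receive ring.

    Each tick a burst of packets arrives; whatever does not fit in the free
    slots is dropped. The guest then drains up to drained_per_tick packets.
    """
    queued = 0
    dropped = 0
    for arriving in bursts:
        free_slots = ring_size - queued
        accepted = min(arriving, free_slots)
        dropped += arriving - accepted
        queued += accepted
        queued -= min(queued, drained_per_tick)
    return dropped


# Hypothetical traffic: mostly quiet, with two large bursts.
traffic = [100, 100, 2000, 100, 100, 3000, 100, 100]

for size in (512, 1024, 4096):
    drops = simulate_drops(size, traffic, drained_per_tick=600)
    print(f"ring size {size:>4}: {drops} packets dropped")
```

In this model the average load is well within what the guest can drain; packets are only lost because the ring cannot hold a burst until the guest catches up.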

One could argue about the impact of a few lost packets. However, as traffic increases the (potential) number of lost packets increases. Each lost packet results in retransmission of unacknowledged packets, which impacts overall throughput causing increased latencies.

VMware’s temporary solution to this problem is:

  1. Open up the Windows guest;
  2. Open the properties of the VMXNET3 NIC;
  3. On the Advanced tab, increase the Small Rx Buffers or Rx Ring #1 Size;
  4. What KB1010071 mentions and KB2039495 doesn’t is that when using jumbo frames (not uncommon, e.g. for replication) you might also need to adjust the Rx Ring #2 Size and Large Rx Buffers values.

Now I say temporary, because VMware’s solution of course isn’t a real solution; it’s only meant to – in their own words – reduce packet drops. Also, the KB1010071 article states you should “determine an appropriate setting by experimenting with different buffer sizes”. That doesn’t sound like a permanent, reassuring solution for a virtualization environment running business critical applications now, does it?

All things considered, I’d recommend configuring these parameters to their maximum setting, preferably at installation time, unless anyone knows of a reason not to. In addition, this is another case for the best practice to split MAPI and replication traffic on Exchange using multiple NICs.

Finally, I have already learnt of two other applications experiencing the issue. Therefore, I think the problem is not Exchange 2010 specific, as KB2039495 might imply. If you have similar experiences, or have experienced differences between GbE and 10GbE, please use the comments to share.

Microsoft Exchange Conference 2012, a Summary

After being absent for over 10 years, this year the most anticipated conference for Exchange minded people took place in Orlando, Florida (US), the Microsoft Exchange Conference 2012 (MEC).

Despite not being able to attend MEC 2012, I’d like to summarize the news on Exchange 2013 from the event. Some of this information went public as part of the release of Exchange 2013 Preview, which was released in July (yes, almost 2 months ago – time flies). Some statements were new, like for example the expected release date of Exchange 2010 SP3, which is required for co-existence with Exchange 2013.

With all the social media nowadays, you can track most of the statements made at the event. Thanks to people like Jeff Guillet and Devin Ganger and people from our The UC Architects group, like Dave Stork, Michael van Horenbeeck, Pat Richard, Serkan Varoglu and John A. Cook, who reported live from the sessions they were attending (hashtag #iammec), the community was kept up to date with information as it unfolded. At the end of each day, Tony Redmond gave a nice summary including comments on the event as a whole.

Picture shows some of the people behind The UC Architects together
with Perry Clarke (GM Exchange), who you might recognize from
the Ask Perry videos. The picture is taken by Tony Redmond.

The information presented here is a summary of all the information provided through social media and is additional to the information presented at the release of Exchange 2013 Preview; you can read all about that in my Changes in Exchange 2013 Preview article. It is in no way meant to be conclusive or complete.

Ok, now on to the goodness.

Co-Existence
Exchange 2010 Service Pack 3 is expected to be released in the first half of 2013. Not only is it required for co-existence with Exchange 2013, it also supports Windows Server 2012 as Operating System platform. Note that SP3 will require a schema update.

No word on the expected release date of the update required for Exchange 2007 to support co-existence between Exchange 2013 and Exchange 2007. Since Exchange 2007 SP3 Rollup 8 was released in August, thus after the Exchange 2013 Preview became available, I assume we have to wait for Rollup 9 (or 10?).

Storage
Ross Smith from the Exchange Team confirmed the 99% IOPS reduction claim when comparing Exchange 2013 with Exchange 2003; when compared with Exchange 2010 it’s a 50% reduction. That’s down from 1 IOPS per mailbox in Exchange 2003 to 0.125 IOPS in Exchange 2010 to 0.0625 IOPS per mailbox in Exchange 2013.
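As a rough illustration of what these per-mailbox figures mean at scale, the following Python sketch multiplies them by a mailbox count. The per-mailbox values are the ones quoted above; the mailbox count is a hypothetical example, and the calculation ignores everything a real sizing exercise would include (message profile, passive copies, overhead).

```python
# Per-mailbox IOPS figures as quoted in the text above.
IOPS_PER_MAILBOX = {
    "Exchange 2003": 1.0,
    "Exchange 2010": 0.125,
    "Exchange 2013": 0.0625,
}


def total_iops(mailboxes: int, version: str) -> float:
    """Rough aggregate IOPS for a number of mailboxes on a given version."""
    return mailboxes * IOPS_PER_MAILBOX[version]


mailboxes = 5000  # hypothetical mailbox count, for illustration only
for version in IOPS_PER_MAILBOX:
    print(f"{version}: {total_iops(mailboxes, version):8.1f} IOPS")
```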


Also, passive copies have around 50% reduction in IOPS, mainly due to the increased checkpoint depth (100MB) and less aggressive pre-reading of data to keep in line with the checkpoint depth (I’ll devote a separate article to this at a later date). This means when mixing active and passive copies on a Mailbox server, the passive copies play more nicely from a storage perspective. Also, because of these changes, database fail-over times are down from 20 seconds in Exchange 2010 to about 10 seconds in Exchange 2013.

To validate storage for Exchange 2013, JetStress for Exchange 2013 will become available 3 months after Exchange 2013 goes RTM. When required to validate storage in the meantime, it is recommended to use Exchange 2010’s version of JetStress, since Exchange 2010 and Exchange 2013 will have the same IO pattern.

Databases
In Exchange 2013, multiple databases per storage volume are allowed, which allows for active and passive copies on the same volume. Looking at the lower IOPS requirements of Exchange 2013’s ESE engine and the 50% lower IOPS factor of passive copies, this allows for some serious consolidation on large volumes. The number of databases per volume should match the number of copies per database.

Note that putting databases on SMB3 shares (Windows Server 2012) is not supported; putting a virtualized Exchange server on SMB3 shares is.

Mailboxes
Besides the recommendation to embrace 7,200 RPM disks for Exchange storage, large mailbox implementations are expected to take off (100GB+, including mailbox, archive and recoverable items) in an ongoing battle to get rid of PSTs and 3rd party solutions.

Due to database accounting changes in Exchange 2013, mailboxes may see a 30% increase in size when moved from Exchange 2010 to Exchange 2013. Make sure you adjust mailbox quota settings accordingly.
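To make the quota adjustment concrete, here is a minimal Python sketch that scales an existing quota by the 30% growth mentioned above. The quota values are hypothetical examples, and treating 30% as a uniform factor is a simplification; actual growth will differ per mailbox.

```python
def adjusted_quota_gb(current_quota_gb: float, growth: float = 0.30) -> float:
    """Scale a quota so a mailbox that grows by `growth` after the move still fits."""
    return round(current_quota_gb * (1 + growth), 2)


for quota in (2, 5, 10):  # hypothetical Exchange 2010 quota values in GB
    print(f"{quota} GB on Exchange 2010 -> {adjusted_quota_gb(quota)} GB on Exchange 2013")
```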

Client Access
CAS 2013 will proxy client traffic to Exchange 2010 using the CAS 2010 server’s FQDN, i.e. it won’t determine or use internalURL or InternalNLBBypassUrl. You can’t configure CAS-to-CAS proxying per site; it’s an all or nothing setting. At RTM, Exchange 2013 Client Access servers won’t contain support for SSL offloading.

Health Checking
Exchange 2013 will not only check the server’s health looking at the Exchange services, but it will also check the protocols.

CAS 2013 will determine the health of legacy Exchange servers using a simple HTTP HEAD call.
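For readers unfamiliar with this kind of probe, the sketch below shows what a simple HTTP HEAD health check can look like in Python. It is not how Exchange implements it: the function name, the choice to treat any status below 400 as healthy, and the host and path in the usage note are all assumptions made for illustration.

```python
import http.client


def is_healthy(host: str, path: str = "/", port: int = 443,
               use_tls: bool = True, timeout: float = 5.0) -> bool:
    """Send a HEAD request and report whether the server answered without an error.

    Assumption: any status code below 400 counts as healthy.
    """
    connection_class = (http.client.HTTPSConnection if use_tls
                        else http.client.HTTPConnection)
    conn = connection_class(host, port, timeout=timeout)
    try:
        conn.request("HEAD", path)  # HEAD returns headers only, no body
        return conn.getresponse().status < 400
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, TLS failure or malformed response.
        return False
    finally:
        conn.close()
```

A call such as `is_healthy("mail.example.com", "/owa/")` (hypothetical host and path) would then return `True` or `False` without downloading any page content, which is what makes HEAD attractive for frequent probing.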

Automatic Reseeding
Besides the ability to seed databases using multiple sources, which prevents the situation where multiple remote copies are seeded over WAN links from the active copy, Exchange 2013 contains a feature called Automatic Database Reseeding or just AutoReseed.

AutoReseed can be utilized to automatically reseed databases when required, e.g. after a storage failure. AutoReseed can even allocate and initialize spare disks to restore database redundancy. AutoReseed requires configuring three new properties, which are part of the DAG:

  • AutoDagVolumesRootFolderPath refers to the mount point containing all available volumes, including spare volumes;
  • AutoDagDatabasesRootFolderPath refers to the mount point containing the databases;
  • AutoDagDatabaseCopiesPerVolume sets the number of database copies per volume.

So for example, when you’ve configured a mount point C:\Volumes (AutoDagVolumesRootFolderPath) containing mount points for databases, e.g. C:\Volumes\DB1, and mount point C:\Databases (AutoDagDatabasesRootFolderPath) with mount points to Exchange databases, e.g. C:\Databases\DB1 (where C:\Databases\DB1 maps to C:\Volumes\DB1), and DB1 contains folders for database and logfiles, AutoReseed can utilize mount points from C:\Volumes to automatically recreate and reseed databases when DB1 fails.
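The folder layout in this example is easier to follow when written out. The Python sketch below only prints the paths implied by the example; it does not configure anything. The two root folders are taken from the text, while the names of the database and log folders are an assumed naming scheme used purely for illustration.

```python
from pathlib import PureWindowsPath

# Root folders from the example above; on a real DAG these would be the values of
# AutoDagVolumesRootFolderPath and AutoDagDatabasesRootFolderPath.
VOLUMES_ROOT = PureWindowsPath(r"C:\Volumes")
DATABASES_ROOT = PureWindowsPath(r"C:\Databases")


def expected_layout(database: str) -> dict[str, str]:
    """Return the mount points and folders the example implies for one database."""
    db_mount = DATABASES_ROOT / database
    return {
        "volume_mount_point": str(VOLUMES_ROOT / database),
        "database_mount_point": str(db_mount),
        # Folder names below are an assumed naming scheme, not taken from the text.
        "database_folder": str(db_mount / f"{database}.db"),
        "log_folder": str(db_mount / f"{database}.log"),
    }


for name, path in expected_layout("DB1").items():
    print(f"{name:22} {path}")
```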

Site Resilience
Exchange 2013 will feature an automatic site (datacenter) fail-over using a witness server located in a 3rd well-connected site. This enables customers to automate the process of site switchovers, from primary to secondary site. This feature is optional.

This may confuse existing Exchange customers, who perhaps learned with Exchange 2007 that a 3rd site for the cluster voter was not recommended, after which it briefly became an option with Exchange 2010. Then, after a while, an adjusted recommendation was published not to use a 3rd site, and now it’s an option again.

Despite this, I think this certainly is a valuable feature. Normally, site outages and datacenter switchovers are stressful situations; the more the switchover process is preconfigured and automated, the less prone to error it is.

Exchange fellow and colleague Jaap Wesselius, who did
2 sessions on Load Balancing Exchange, was interviewed
by F5. Click the image to watch the interview.

Exchange Online
You can use Exchange 2003 with Exchange 2013 Online (when it becomes available) by utilizing an Exchange 2010 CAS server, just like today.

Safety Net
Safety Net is the new transport dumpster in Exchange 2013 and will provide similar functionality. It will also take over the functionality of Shadow Redundancy, whose purpose in Exchange 2010 is to guarantee delivery of messages and to cope with transport failures. Lagged Copy functionality is also enhanced by Safety Net: you can activate a lagged copy, after which Exchange 2013 will use Safety Net to make the database current. How long Safety Net will hold messages is a configurable setting.

Compliance
Exchange 2013 will support Litigation Hold, Time-based Hold (rolling data, e.g. items aged X days) and In-place Hold (formerly known as Legal Hold).

Unified Messaging
The Exchange 2013 UM role has a 100 concurrent calls limit. As you probably know, in Exchange 2013 Mailbox servers are used for UM as well. Because of that, this limit will have serious consequences when you’re designing an environment using several big servers; you might be forced to distribute the workload over more, lighter servers.
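The design consequence can be expressed as a simple calculation. The sketch below derives a lower bound on the number of servers from a peak concurrent call count; the 100-call limit is the figure quoted above, the call volumes are hypothetical, and the result ignores redundancy and all other sizing factors.

```python
import math

UM_CONCURRENT_CALL_LIMIT = 100  # per-server limit as quoted in the text


def min_um_servers(peak_concurrent_calls: int) -> int:
    """Smallest number of servers that keeps each one within the call limit."""
    return max(1, math.ceil(peak_concurrent_calls / UM_CONCURRENT_CALL_LIMIT))


for calls in (80, 100, 101, 450):  # hypothetical peak loads
    print(f"{calls} concurrent calls -> at least {min_um_servers(calls)} server(s)")
```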

Exchange 2013 and ForeFront Threat Management Gateway
Exchange 2013 will work fine in conjunction with ForeFront TMG, except for maps feature when using TMG’s Forms-Based Authentication (FBA); the only thing you need to adjust is the logoff URL. Note that despite the ForeFront TMG 2010 End-of-Life statement from Microsoft last week, people like Greg Taylor (Program Manager Exchange) emphasized customers shouldn’t avoid using or opting for TMG while it is still available.

Public Folders
Migration of Public Folders from Exchange 2007 or Exchange 2010 is a cut-over scenario, so there will be no co-existence.

When using Exchange 2013 Public Folders next to Public Folders on Exchange 2007 or Exchange 2010, you need to manually map those to related folders in Exchange 2013 using a CSV file.

It was emphasized that being able to control Public Folders and keep that data in the same store is worth losing the multi-master functionality.

Exhibitor ENow Consulting held a contest
for collecting the most autographs.

Message Hygiene
Exchange 2013 will include tools to block messages in a certain character set. This is useful in scenarios where you don’t expect messages in one of the Chinese languages and you want to block (potential) spam written in one of those languages.

In-Place Archiving
The new term for Personal Archive or Online Archive is In-place Archiving.

Message Routing
Exchange 2013 won’t use least-cost routing when routing messages, but it will use it to determine if Hub sites are defined. Exchange 2013 will honor Hub site definitions, but these are to be considered legacy.

A Delivery Group is a set of transport servers responsible for delivering messages to a certain routing destination. There are several types of Delivery Groups, depending on the destination, e.g. DAG or Site. Each transport server is used in a Round-Robin fashion when delivering messages.

An MBX server and CAS server listen for incoming messages on port 25 unless co-located; then the MBX server will listen on port 2525.

More background information on message routing in Exchange 2013 also in conjunction with Exchange 2010 is to be found here.

Licensing
It is no longer required to have an Enterprise license for eDiscovery; it is still required to have an Enterprise license when using Legal Hold.

Virtualization
Many statements were made to de-emphasize virtualizing Exchange and to only use it for testing purposes. When virtualizing, the same rules apply as for Exchange 2010.

Like with earlier versions of Exchange, the ESE engine will claim memory at startup based on the amount of physical RAM. Configuring Dynamic Memory is therefore not only pointless but also not recommended, as I stated in an earlier post on Exchange and Dynamic Memory.

It was also emphasized that putting VMDK files on VMWare NFS disks is not a supported scenario, so I assume this is often seen in the field despite not being supported by Microsoft.

Mobile
ActiveSync in Exchange 2013 will cause 65% less RPC communications compared to Exchange 2010.

Outlook Web Access
When using OWA 2013 in offline mode, the locally generated cache file isn’t secure; use of BitLocker is recommended. Single Sign-On in combination with OWA on Exchange 2013 redirection will be fixed post-RTM. Also, be advised that at RTM, OWA in Exchange 2013 won’t have support for Public Folders.

IAMMEC Portal
A portal for the Exchange community was announced, iammec.com. Here, people involved with Exchange can get information from within Microsoft or other sources. How this will differ from the Exchange related topics on TechNet forum is to be seen.

It is unknown if there will be a MEC in 2013; Microsoft’s director of PM for Exchange, Michael Atalla, said there will be a MEC when “there’s something to talk about”. It is rumored that recordings of the 1st day of the conference will be made available at a later date, except for the interactive sessions.

PS: The icon accompanying this article is the Exchange 2013 logo.

TechEd North America 2012 sessions

With the TechEd North America 2012 event still running, recordings and slide decks of finished sessions are becoming available online. Here’s an overview of the Exchange-related sessions:


Thoughts on "VMware Zimbra vs Microsoft Exchange"

Note: This blog was written together with Dave Stork after reading a Zimbra and Exchange product comparison. You can find the article on Dave’s blog here, including a personal note by Dave.

In a blog post by Christopher Wells, alias vSamurai, the author positions VMware Zimbra Collaboration Server (ZCS 7.x) as an enterprise-ready drop-in replacement for Microsoft Exchange Server 2010 environments of all sizes. He also suggests Zimbra is a better multi-tenant solution for ISPs. The author does this by comparing both products in a feature comparison.

These reviews are helpful in order for companies to make an informed decision. After all, there’s nothing wrong with a bit of competition. However, Dave Stork and I wanted to create a response, because some statements are flawed or just plain wrong. In the process, we will be following the structure of the referenced blog:

Backup and Restore
The author starts off by claiming that “the ease with which backup and restore can be performed in Zimbra outweighs the capabilities of Exchange”. While it’s interesting to note the author implicitly admits Exchange is more capable, he misses the point. The product should follow a well-designed backup and recovery strategy, based on customer demands and compliance regulations. Where Exchange has server, database, mailbox and single item recovery options, Zimbra is built on top of MySQL, meaning recovery requires brick-level restore or (partially) restoring information from MySQL dumps. Also, in Zimbra the databases only contain meta information; the actual messages and attachments are stored on the file system. While this makes sense for Zimbra, as many SQL people consider storing binary data in databases a bad practice, it increases the complexity of backup and restore, because meta information and file system need to be in sync. Note that Exchange’s Extensible Storage Engine (ESE) is purpose-built for storing mailbox information, including attachments.

Scalability
Then, the author claims that Zimbra has better scaling capabilities than Exchange. First, let’s start by looking at the definition of scaling. A system is said to scale well if:

  • it can handle increased load without (serious) performance penalties, or
  • the system is able to accommodate growth by adding resources (scale up) or additional systems (scale out).

Ideally, scaling out should show a linear pattern, meaning two equal systems can handle twice the load. Scaling up most of the time doesn’t, which makes sense when looking at how computers are designed using shared resources like buses for example.

Now, scaling isn’t solely a matter of hardware; a system also requires software built to scale. The role-based model of Exchange, with its specific roles for serving mailboxes and handling replication, routing e-mail and servicing clients, is a good example of a thought-out scalability supporting concept. Of course, you can install all roles on a single server, which is currently the recommended practice by Microsoft, but you’re still able to design fit-for-purpose farms and clusters.

Thus, the ability to scale is determined by the whole set of components playing well together, hardware and software. With this in mind we’d like to include an interesting table which is part of the VMware (acquired Zimbra early 2010) study “Zimbra Collaboration, Server Performance on VMware vSphere 5.0”:

In their analysis, VMware primarily focuses on the CPU utilization figure. That figure implies that Zimbra has more headroom than Exchange using the same configuration. However, Exchange also runs several background processes, for example to optimize the database in order to reduce the number of IOPS. Yes, this takes up a certain percentage of CPU cycles, but optimizing storage for sequential access could explain the significant 240% difference in IOPS in favor of Exchange. Lower IOPS reduces storage requirements, and costs, for Exchange. The over 60% lower latency figure for Exchange is also an indication that overall processing of messages is faster in Exchange.

Costs
As often in these Open Source Software (OSS) discussions, the cost card is played. The author claims that on average, Zimbra is 50% cheaper than Exchange. However, this claim is made without any supporting references or figures, making it difficult to verify. In our experience, such claims are often primarily based on retail prices and licensing costs. What is often overlooked (or ignored) in comparisons with OSS are training costs, or hidden costs like support or maintenance.

Functionality is also a potential cost saver, as companies can work more efficiently due to added or enhanced functionality. These savings depend on customer needs, although some are widely used and immediately contribute to lower costs, like for example AutoDiscover (automatic configuration of Outlook 2007 and later clients or ActiveSync devices).

Exchange natively supports Outlook, common browsers and mobile devices; Zimbra requires an Outlook plug-In, Zimbra Connector for Microsoft Outlook, increasing support and maintenance costs. Note that this connector is only available for Zimbra Collaboration Server Network Edition Professional users.

Regarding maintenance, Exchange requires Exchange, Active Directory and (optionally, but a big bonus) PowerShell skills. Zimbra consists of a set of 3rd party products, requiring knowledge of each product, like Postfix, mbox e-mail storage, MySQL, Apache, OpenLDAP, SpamAssassin, ClamAV and shell scripting. Of course, more components mean more products to configure and maintain, increasing maintenance costs.

Storage Benefits
A full paragraph is dedicated to the benefits of using Zimbra with NetApp storage. However, the NetApp products and technologies mentioned are not Zimbra-specific, and therefore in our opinion do not add anything to the discussion.

Feature Comparison
The author then continues with a “direct” feature comparison between Zimbra and Exchange. Let’s have a look:

1. Platform Architecture
First, the author claims ESE is over 20 years old, the .EDB file is non-modular and the ESE engine is non-tunable. Yes, ESE has existed for over 20 years, but that’s also 20 years of experience in building a fit-for-purpose database engine. With each new Exchange version, ESE was redesigned to meet evolving requirements and expectations in a changing world. When looking at the VMware IOPS comparison in the Scalability section, it’s Zimbra that should worry about storage.

Second, the author claims Database Availability Groups (DAGs), based on fail-over clustering, aren’t a proven technology for large deployments. Exchange 2010 has been on the market since October 2009. Like many Exchange fellows, we have designed or seen large Exchange deployments (i.e. thousands of mailboxes). Also, if millions of Office 365 users aren’t proof of a successful large-scale, multi-tenant, ISP-like deployment using multiple data center DAGs, what is?

To be honest, is it really that important which exact technology is used and how old it is? In the end functionality and performance are more important, as they are relevant in any business case for Exchange. What would a decision maker most likely ask, “Does it use Microsoft SQL Server?” or “What can we do with it and how much will it cost?”. We think, and know from experience, that it will probably be the latter.

2. Reliability & Robustness
The author claims Microsoft is considering (moving Exchange storage to) SQL and needs to prove robustness of the new architecture. While Microsoft has considered the SQL storage engine several times, it decided to stick with the optimized ESE engine. This was also true for Exchange 2010 back in 2009, as you can read in this blog. The main reason for sticking with ESE is performance.

When pleading for ZCS, the author states “Linux has better uptime”. While this may have been true in the Windows 98 era, from experience, managed Exchange systems can reach similar uptime figures. On the contrary, I’ve seen Linux systems crashing every few days. The only conclusion you can draw here is that reliability not only depends on hardware and software components and their quality, it also depends a lot on if and how systems are managed. Also, don’t confuse uptime with availability, as planned downtime will reset my uptime statistic, but that’s all it is: a statistic.

3. Tiered Storage (was Platform Scalability)
Tiered storage, or Hierarchical Storage Management, is about classifying data in terms of things like security, performance or pricing. Exchange itself partly supports this concept, using elements like DAGs, databases, mailboxes, personal archives and retention policies. For example, you can home your mailbox on multiple lean and mean servers using fast SAS storage while personal archives, used to automatically store e-mail older than 1 year using retention policies, are served by a fat server using inexpensive SATA disks on JBOD storage.
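The age-based example above can be sketched in a few lines. This is only an illustration of the idea, not how Exchange retention policies are implemented; the function name, the tier labels and the 365-day default are assumptions made for this sketch.

```python
from datetime import date, timedelta

# Hypothetical tier labels; the 1-year threshold mirrors the example in the text.
PRIMARY = "primary mailbox (fast SAS storage)"
ARCHIVE = "personal archive (inexpensive SATA disks on JBOD)"


def choose_tier(received: date, today: date, archive_after_days: int = 365) -> str:
    """Return the tier an item would live on under a simple age-based policy."""
    age = today - received
    # Items older than the threshold go to the archive, everything else stays primary.
    return ARCHIVE if age > timedelta(days=archive_after_days) else PRIMARY


if __name__ == "__main__":
    today = date(2012, 3, 1)
    for received in (date(2012, 2, 1), date(2010, 12, 24)):
        print(received, "->", choose_tier(received, today))
```

In a real deployment the decision is made by the retention policy tags assigned to the mailbox, folder or item, so the threshold would differ per tag.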

ZCS utilizes a built-in HSM solution which automatically moves items from the (fast) primary volume to the (cheaper) secondary volume. The database holds information on the actual location where the item resides. Conceptually, this matches the Exchange concept of primary mailbox and personal archive using retention policies. However, retention policies are more powerful and – when permitted – give users control over what to archive and when. When Exchange customers want to use a deeper level of storage tiering, they can opt for 3rd party solutions like Symantec Enterprise Vault (item-level stubbing) or storage solutions.

Note however, there are some important factors to take into consideration with stubbing:

  • Data stored on a different tier, e.g. tape, isn’t always available online;
  • Tiered storage adds complexity, introducing the need to compare reduced costs for storage against additional costs due to increased complexity;
  • Stubbing may impact future migration or transition options, e.g. vendor support, or recovery options.

4. High Availability
The author claims DAGs do not provide Exchange infrastructure protection and have a learning curve. The first part of that claim is absolutely true: DAGs are designed to increase the availability of Exchange databases served by Exchange servers holding the Mailbox role, while providing a fail‑over mechanism. The other tasks are covered by the other Exchange roles. Mail flow within an Exchange environment is automatically redundant when you have multiple Hub Transport servers, as they monitor connectivity and possible routes for delivery. For client access, multiple Client Access servers can be made redundant using load balancing technology. Exchange has these built-in features, which work independently of where Exchange is running, i.e. they also work in a non-virtualized system, and no additional high-priced product is required to make the underlying services highly available.

Regarding the learning curve claim, every new technology has a learning curve. DAG is built on top of fail-over clustering (nothing new) and easier to manage than its predecessors, CCR and SCR. Then again, we’d prefer Exchange admins who know what they’re doing, rather than somebody who learned an SRM trick.

Speaking of which, the whole argument that “ZCS with VMware’s Site Recovery Manager (SRM) is proven, scalable and effective” is apparently nothing more than a plug for VMware’s SRM product in conjunction with VMware licenses (vSphere required), as we see no credible arguments.

5. Platform Extensibility
The author states that Microsoft recommends using its proprietary shell. We assume he means PowerShell, which is here to stay. Other vendors, like Cisco or Quest, are adopting it and offer modules to manage their products using PowerShell. Heck, even Zimbra offers PowerShell scripts to manage Zimbra through encapsulated SOAP requests. For the record, neither of us knows of any Exchange admin complaining about some Linux product requiring bash (Bourne-Again shell) or perl for scripting, turning this into a non-argument.

The author continues by apparently mixing a few things up. The argument given for ZCS is that “SOAP API allows server access using web services framework for client access and Zimlets for integration with 3rd-party services” while Exchange offers “limited SOAP access” and “Outlook add-ins require developer effort”. This is apples versus oranges; Outlook is a fat client and Zimlets are like web parts. If you want to make a nice dashboard, we’d suggest you use something like SharePoint instead of bloating your e-mail web client.

Finally, SOAP and Exchange Web Services (EWS) are targeted at developers, PowerShell at automation. If you’re curious about the power of EWS, we’d suggest you check out the excellent blog by Glen Scales.

6. Platform Openness
While Exchange is mostly closed source, a lot has changed since the 90’s. Exchange has a developer center nowadays, where SDK and APIs are published on how to interact with certain parts of the Exchange ecosystem, e.g.:

7. Open Standard Protocols Support
It’s true that the current Outlook version doesn’t support all available standards for exchanging calendaring or contact information. However, for most companies that isn’t an issue. When required, solutions and workarounds are available.
Also see “Mobile Support”.

8. Rebranding
The author claims Outlook Web Access (OWA) has a single theme. That might have been the case with the RTM version, but since SP1 we have over 28 themes to choose from. If that’s not enough, there’s even an Exchange Server 2010 SP1 Outlook Web App Customization SDK to take customization into your own hands. Note that the SDK also documents integrating IM (e.g. Lync).

9. Web Client Support
Regarding Web Client support, the author states “limited browser support for OWA” (Outlook Web App). Since SP1, OWA has full support for IE7+, Firefox 3.01+ (Windows, MacOS, Linux), Chrome 3.0.195.27+ (Windows), Safari 3.1+ (MacOS). In addition, OWA Mini, targeted at simple mobile browsers, reincarnated in Exchange 2010 SP2.

Yes, there are browsers out there that don’t have the full featured Premium OWA (like Opera), but “limited browser support for OWA” is a bit over-simplified, especially if you take into consideration the combined market shares of the fully supported browsers (without Safari, between 81-91% since December 2011).

10. Mac Support
Outlook for Windows and Outlook for Mac are produced by two different teams, which might be one of the reasons for the feature disparity between Outlook 2010 and Outlook for Mac 2011. Apart from differences caused by the underlying operating system, we agree features should be as on par as possible for all available platforms.

Note that the mentioned Zimbra desktop client doesn’t support Exchange’s native MAPI protocol, adding the requirement to enable the IMAP or POP protocol on the Exchange server.

11. Linux
The author proceeds by arguing there’s no Outlook client or Exchange Server for Linux. That is a moot point; there’s also no Zimbra server for Windows. Also, an argument like “ZCS server components love the Linux platforms” is not very convincing; it is the kind of statement often seen in discussions where emotions prevail over rational thinking.

12. Mobile Support
More and more (mobile) clients are adopting the Exchange ActiveSync (EAS) protocol for exchanging e-mail, calendar, contact and task information with Exchange. In fact, even BlackBerry announced they will adopt EAS in their upcoming BlackBerry 10 OS product. This is probably driven by Microsoft releasing the EAS protocol as part of their Open Specifications Promise, turning EAS more or less into the de‑facto standard for (corporate) e-mail synchronization for mobile clients.

Zimbra partially supports EAS for e-mail, calendar and contacts, but requires the Zimbra Mobile add-on. It is a bit unclear if tasks are synced, here it seems so for Pro users but here it is advised against while here the screenshots tell yet another story. Confusing.

13. Multi-tenancy
The author doesn’t show how Zimbra is a better multi-tenancy solution for ISPs when compared to Exchange 2010. But since Exchange 2010 Service Pack 2, there is no need for third party hosting software as it is now fully incorporated in Exchange without extra costs.
However, the intent was possibly to prove this implicitly via the costs argument of on-premises deployments. One other way is to look at actual hosted Zimbra and Exchange solutions available commercially.

Let’s compare costs from random Zimbra providers (picked from Zimbra’s Partners list), Exchange hosting providers and Office 365 subscriptions. It is not an extensive comparison, but it should give us an indication. Some (not all) are shown here:

| Product | MrMail Professional Zimbra Mailbox | CVM Zimbra Professional Suite | PayPerCloud Hosted Exchange Professional | Office 365 Exchange Online | Office 365 Plan E1 |
| --- | --- | --- | --- | --- | --- |
| Storage | 8GB | 1GB | 25GB | 25GB | 25GB (mailbox; SharePoint is separate and additional) |
| Own mail domain | Yes | Yes | Yes | Yes | Yes |
| Attachment size | 20MB | ? | ? | 25MB | 25MB |
| Web Access | yes | yes | yes | yes | yes |
| POP / IMAP | yes/yes | yes/yes | yes/yes | yes/yes | yes/yes |
| ActiveSync | yes | yes | yes | yes | yes |
| Antimalware | yes | yes | yes | yes | yes |
| SharePoint or similar | yes | yes | no | no | yes |
| Lync IM/Presence | no | no | no | no | yes |
| Price per user per month | $8.61* | $7* | $7.95** | $4** | $8** |

*) discounts possible with more mailboxes
**) Note that prices are per month, but only apply with an annual subscription.

This table shows that the Exchange subscriptions are comparable or provide more functionality for lower costs. We do not see the 50% cost benefit argument at all, and in our opinion this shows that Exchange 2010 is a very viable multi-tenancy solution for ISPs.

One very important difference we want to point out is the available storage per mailbox. This tended to be a lot (several factors) more with Exchange than with Zimbra, without heavily impacting the price. This fact alone suggests that Exchange can be a very viable groupware solution to ISPs.
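To make the storage difference concrete, the table figures can be turned into a rough price per GB of mailbox storage. The sketch below only uses the storage and price values from the table above; the metric itself (price divided by mailbox size) is our own simplification and ignores discounts, subscription terms and feature differences.

```python
# (storage in GB, price in USD per user per month), copied from the table above.
offers = {
    "MrMail Professional Zimbra Mailbox": (8, 8.61),
    "CVM Zimbra Professional Suite": (1, 7.00),
    "PayPerCloud Hosted Exchange Professional": (25, 7.95),
    "Office 365 Exchange Online": (25, 4.00),
    "Office 365 Plan E1": (25, 8.00),
}


def price_per_gb(storage_gb: float, price_usd: float) -> float:
    """Monthly price per user divided by the mailbox size in GB."""
    return price_usd / storage_gb


def ranked(offers: dict) -> list:
    """Return (name, price per GB) pairs, cheapest per GB first."""
    pairs = ((name, price_per_gb(*values)) for name, values in offers.items())
    return sorted(pairs, key=lambda pair: pair[1])


if __name__ == "__main__":
    for name, cost in ranked(offers):
        print(f"{name:42s} ${cost:.2f} per GB per month")
```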

Final words
This concludes the author’s feature comparison, but there are still some important elements missing, like product support, directory integration, IPv6 readiness, traffic management (e.g. ethical walls) or IRM. Also, what about integration or support of Unified Communications technologies, like single inbox – including voicemail – or voice access to mailbox?

Now don’t get the impression we want to condemn Christopher for trying to compare both products, even though by reading just the header and counting the numerous VMware-related logos on the site we were a bit hesitant regarding what the “conclusion” would be (we have a saying here, We from WC Eend recommend WC Eend).

We do appreciate good comparisons, because they can shake up our opinions of what is and what should be with Exchange and start interesting discussions. It’s also an opportunity to learn about similar products. We believe competition is healthy and comparisons can be educational; they can help companies find a better fit for their needs and budget, or at least provide a starting point.

It is however crucial for a fair comparison that the facts, conclusions and opinions stated are correct and sound. Unfortunately, this is not the case with this article. There are numerous factual errors and most opinions stated are poorly argued. To add to that, the author uses a feature list which can be found on the internet in several places, like here. This may be an indication authors are copying content, without knowledge or cross-checking facts.

Therefore, with the information provided in Christopher’s blog post, one can’t conclude that Zimbra is an adequate replacement for all environments, Enterprise or SMB. Also, we do not see any indication that Zimbra is better suited for multi-tenancy by ISPs. If anything, we think we have shown that Exchange is a more than capable, competitive and well-thought-out product.

You’re invited to comment or share your opinions in the comments below.

Update (April 10th): Apparently, on March 21st Wells posted a follow-up on his Zimbra versus Exchange viewpoint. Looking at it, Wells seems to enjoy the attention. Despite saying discussing viewpoints keeps vendors’ focus sharp, he doesn’t come up with arguments on why our post was – in Wells’ words – flawed. While I believe Zimbra serves a purpose – and it certainly isn’t on my radar as Wells says – I feel Zimbra or other non-Exchange evangelists should be able to take feedback like a pro. When you ignore other viewpoints or remain silent when asked for arguments, it’s more like a monologue rather than the interaction Wells claimed he’s in favour of.

Finally, our post didn’t go unnoticed, as Tony Redmond referred to it in an article on Windows IT Pro. In the article, called Dispelling myths and other half truths, Redmond addresses some of Wells’ flawed claims as well.

TechEd North America 2011 sessions

With the end of TechEd NA 2011, so ends a week of interesting sessions. Here’s a quick overview of recorded Exchange-related sessions for your enjoyment:

Virtualized Exchange 2010 SP1 UM, DAGs & Live Migration support

Today, just before TechEd North America 2011, Microsoft published a whitepaper on virtualizing Exchange 2010, “Best Practices for Virtualizing Exchange Server 2010 with Windows Server® 2008 R2 Hyper-V”.

There are some interesting statements in this document which I’d like to share with you, also because the Exchange team published an article on supported virtualization scenarios shortly after this paper was published.

First, as Exchange fellow Steve Goodman blogged about, a virtualized Exchange 2010 SP1 UM server role is now supported, albeit under certain conditions. More information on this at Steve’s blog here.

The second thing is that live migration, or any form of live migration offered by products validated through the Server Virtualization Validation Program (SVVP), is now supported for Exchange 2010 SP1 Database Availability Groups. Until recently, the support statement for DAGs and virtualization was:

“Microsoft does not support combining Exchange high availability (DAGs) with hypervisor-based clustering, high availability, or migration solutions that will move or automatically failover mailbox servers that are members of a DAG between clustered root servers. DAGs are supported in hardware virtualization environments provided that the virtualization environment doesn’t employ clustered root servers, or the clustered root servers have been configured to never failover or automatically move mailbox servers that are members of a DAG to another root server.”

The Microsoft document on virtualizing Exchange Server 2010 states the following on page 29:

“Exchange server virtual machines, including Exchange Mailbox virtual machines that are part of a Database Availability Group (DAG), can be combined with host-based failover clustering and migration technology as long as the virtual machines are configured such that they will not save and restore state on disk when moved or taken offline. All failover activity must result in a cold start when the virtual machine is activated on the target node. All planned migration must either result in shut down and a cold start or an online migration that utilizes a technology such as Hyper-V live migration.”

The first option, shutdown and cold start, is what Microsoft used to recommend for DAGs in VMWare HA/DRS configurations, i.e. perform an “online migration” (e.g. vMotion) of a shut down virtual machine. I blogged about this some weeks ago here since VMWare wasn’t always clear about this. Depending on your configuration, this might not be a satisfying solution when availability is a concern.

The online migration statement is new, as is the host-based fail-over clustering. In addition, though the paper is aimed at virtualization solutions based on Hyper-V R2, the Exchange Team article is clearer on supported scenarios for Exchange 2010 SP1 with regard to 3rd party products (VMware HA); if the product is supported through the SVVP program, usage of Exchange DAGs is supported. Great news for environments running or considering virtualizing their Exchange components.
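The conditions in the quoted whitepaper paragraph read like a small decision rule, which can be sketched as follows. This is our own reading of the quote, not an official support checker; the class and field names are hypothetical, and it does not cover the additional requirement that the hypervisor is validated through SVVP.

```python
from dataclasses import dataclass


@dataclass
class Move:
    """Hypothetical description of a virtual machine move between hosts."""
    planned: bool               # True for a planned migration, False for a failover
    saves_state_to_disk: bool   # VM state is saved and restored when moved or taken offline
    cold_start: bool            # VM boots fresh on the target node
    live_migration: bool        # online migration, such as Hyper-V live migration


def meets_whitepaper_conditions(move: Move) -> bool:
    """Apply the three conditions from the quoted paragraph, in order."""
    if move.saves_state_to_disk:
        return False  # save and restore of state on disk is never acceptable
    if not move.planned:
        return move.cold_start  # every failover must result in a cold start
    # Planned moves: either shut down plus cold start, or an online migration.
    return move.cold_start or move.live_migration


if __name__ == "__main__":
    quick_migration = Move(planned=True, saves_state_to_disk=True,
                           cold_start=False, live_migration=False)
    live = Move(planned=True, saves_state_to_disk=False,
                cold_start=False, live_migration=True)
    print("save/restore based move:", meets_whitepaper_conditions(quick_migration))
    print("live migration:", meets_whitepaper_conditions(live))
```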

Be advised that in addition to the Exchange team article, the paper states the following additional requirements and recommendations as best practice:

  • Exchange Server 2010 SP1;
  • Use Cluster Shared Volumes (CSV) to minimize offline time;
  • The DAG node will be evicted when offline time exceeds 5 seconds. If required, increase the heartbeat timeout to a maximum of 10 seconds;
  • Implementation of latest patches for the hypervisor;
  • For live migration network:
    – Enable jumbo frames and make sure network components support it;
    – Change receive buffers to 8192;
    – Maximize bandwidth.
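The 5 and 10 second figures in the list above can be put into a small calculation. The sketch assumes the tolerated offline time is the heartbeat interval multiplied by the number of missed heartbeats allowed, which is how the failover cluster SameSubnetDelay and SameSubnetThreshold settings are commonly described; the default values used here (1000 ms and 5) are assumptions chosen to match the 5-second figure, and the function names are hypothetical.

```python
def heartbeat_tolerance_seconds(delay_ms: int = 1000, threshold: int = 5) -> float:
    """Tolerated offline time: heartbeat interval times allowed missed heartbeats."""
    return delay_ms * threshold / 1000.0


def survives(offline_seconds: float, delay_ms: int = 1000, threshold: int = 5) -> bool:
    """True when a pause of this length stays within the heartbeat tolerance."""
    return offline_seconds <= heartbeat_tolerance_seconds(delay_ms, threshold)


if __name__ == "__main__":
    for pause in (2.0, 7.0):
        print(f"{pause:.0f}s pause, default settings:", survives(pause))
    # Raising the threshold to 10 gives the 10-second maximum mentioned above.
    print("7s pause, threshold 10:", survives(7.0, threshold=10))
```

Measuring the actual pause during a test migration and comparing it with this tolerance gives a quick indication of whether the heartbeat timeout needs to be raised.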

Note that on May 17th the DAG support statement for Exchange 2010 SP1 on TechNet was updated to reflect this. However, the last two sentences might restart those “are we supported” discussions again:

“Hypervisor migration of virtual machines is supported by the hypervisor vendor; therefore, you must ensure that your hypervisor vendor has tested and supports migration of Exchange virtual machines. Microsoft supports Hyper-V Live Migration of these virtual machines.”

So, if vendor A, e.g. VMWare, has tested and supports vMotioning DAGs with their hypervisor X, Microsoft will support Live Migration for virtual machines on hypervisor X using Hyper-V? Now what kind of statement is that?

(Updates: May 16th – statements from EHLO blog, May 17th – mention updated TechNet article)

VMWare HA/DRS and Exchange DAG support

Last year an (online) discussion took place between VMWare and Microsoft on the supportability of Exchange 2010 Database Availability Groups in combination with VMWare’s High Availability options. The discussion started with the Exchange 2010 on VMWare Best Practices Guide and Availability and Recovery Options documents published by VMWare. The Options document uses VMware HA with DAG as an example and contains a small note on the support issue. In the Best Practices Guide, you have to turn to page 64 to read in a side note, “VMware does not currently support VMware VMotion or VMware DRS for Microsoft Cluster nodes; however, a cold migration is possible after the guest OS is shut down properly.” Much confusion arose: was Exchange 2010 DAG supported in combination with those VMWare options or not?

In a reaction, Microsoft clarified their support stance on the situation by this post on the Exchange Team blog. This post reads, “Microsoft does not support combining Exchange high availability (DAGs) with hypervisor-based clustering, high availability, or migration solutions that will move or automatically failover mailbox servers that are members of a DAG between clustered root servers.” This meant you were on your own when you performed fail/switch-overs in an Exchange 2010 DAG in combination with VMWare VMotion or DRS.

You might think VMWare would be more careful when publishing these kinds of support statements. Well, to my surprise VMWare published support article 1037959 this week on “Microsoft Clustering on VMware vSphere: Guidelines for Supported Configurations”. The support table states a “Yes” (i.e. is supported) for Exchange 2010 DAG in combination with VMWare HA and DRS. No word on the restrictions which apply to those combinations, despite the reference to the Best Practices Guide. Only a footnote for HA, which refers to the ability to group guests together on a VMWare host.

I wonder how many people just look at that table, skip those guides (or overlook the small notes on the support issue) and think they will run a supported configuration.

Exchange & Dynamic Memory : Don’t

With the arrival of Service Pack 1 for Windows Server 2008 R2, Dynamic Memory was introduced. In brief, Dynamic Memory is a memory management enhancement for Hyper-V which allows running virtual machines (VMs) to allocate memory from the host and release it when possible, within a configured minimum and maximum memory boundary. The main benefit is a higher VM density, because each VM will only allocate what’s required and you don’t have to approximate memory allocations.

Now this mechanism works well for many applications, but not for Exchange. Exchange’s goal – at least that of servers holding the mailbox role – is to claim as much memory as possible in order to cache information. This amount depends on the amount of installed memory (more information here). This cache is used for performance reasons: more cache means fewer I/Os, and fewer I/Os result in better performance. You can guess what happens when you run Exchange with a minimal amount of memory and lots of dynamic memory configured, optionally shared with other Dynamic Memory-enabled VMs. If Exchange starts up and wants to claim memory for caching or allocate memory for other reasons (transactions), the memory isn’t available instantly; the host first needs to allocate it or, worse, have other VMs surrender their memory. That doesn’t make sense and will result in a significant performance penalty.

Besides it being pointless to configure Dynamic Memory for Exchange, it’s also not recommended. From the Exchange 2010 System Requirements:

Many of the performance gains in recent versions of Exchange, especially those related to reduction in I/O, are based on highly efficient usage of large amounts of memory. When that memory is no longer available, the expected performance of the system can’t be achieved. For this reason, memory oversubscription or dynamic adjustment of virtual machine memory should be disabled for production Exchange servers.

Also, from the paper Implementing and Configuring Dynamic Memory, on applications that may not perform as well after Dynamic Memory is enabled:

  • Applications that perform their own memory management by taking over certain aspects of memory management from the operating system. Such applications typically grab as much memory as they possibly can in order to ensure the application’s best performance which can cause the amount of memory allocated to their virtual machine to grow until it reaches the amount specified by the Maximum RAM setting;
  • Applications where memory allocation is a one shot operation that is performed either when the application starts for the first time or each time the application starts.

Concluding, yes you can use Dynamic Memory for your lab or testing environment and it works. But don’t use it in production for Exchange Server.
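If you keep an inventory of your virtual machines, the recommendation is easy to check automatically. The sketch below assumes a hypothetical inventory format (the VmConfig fields are made up for illustration and do not correspond to any Hyper-V API); it simply flags production Exchange VMs that have Dynamic Memory enabled.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class VmConfig:
    """Hypothetical inventory record for a virtual machine."""
    name: str
    runs_exchange: bool
    production: bool
    dynamic_memory_enabled: bool


def dynamic_memory_violations(vms: Iterable[VmConfig]) -> List[str]:
    """Return names of production Exchange VMs that have Dynamic Memory enabled."""
    return [
        vm.name
        for vm in vms
        if vm.runs_exchange and vm.production and vm.dynamic_memory_enabled
    ]


if __name__ == "__main__":
    inventory = [
        VmConfig("EX-MBX01", runs_exchange=True, production=True, dynamic_memory_enabled=True),
        VmConfig("EX-LAB01", runs_exchange=True, production=False, dynamic_memory_enabled=True),
        VmConfig("WEB01", runs_exchange=False, production=True, dynamic_memory_enabled=True),
    ]
    print("Reconfigure with static memory:", dynamic_memory_violations(inventory))
```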

Credits to Jetze who blogged about this originally here (Dutch).