Exchange and VMWare Guest Introspection

In this long overdue article, I would like to share an experience where a customer was upgrading from Exchange 2010 to Exchange 2013. Note that this could also apply to customers migrating from Exchange 2007 or migrating to Exchange 2016. The Exchange 2013 servers were hosted on VMWare vSphere 5.5U2; the Exchange 2010 servers on an earlier vSphere release.

The customer saw a negative impact on the end user experience of Outlook 2010 users, especially those working in Online Mode. Other web-based services like Exchange Web Services (EWS) were affected as well. The OWA experience was good.

After migrating end user mailboxes from Exchange 2010 to Exchange 2013 (but as indicated, this applies to Exchange 2016 as well), end users reported delays in Outlook client responses, where Outlook sometimes seemed to ‘hang’ when performing certain actions, such as accessing a Shared Mailbox. Also, when opening the meeting planner to schedule a room using the Scheduling Assistant, it could take a significant amount of time (i.e. minutes) before the schedules of all the rooms were displayed.

The end users’ primary mailboxes were configured to use Cached Mode, except for VDI users, who used their primary mailbox in Online Mode. Shared Mailboxes were used in Online Mode due to their size (Outlook 2010, so no sync slider).

First, the overall health of the Exchange environment was checked to exclude it as a potential cause. Exchange performance metrics were monitored, as well as Managed Availability status and events, logs such as the RPC Client Access (RCA) logs, and VMWare CPU Ready % to check for potential vCPU allocation issues (read: oversubscription). None of these metrics gave any reason for concern.
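As an aside, for those wanting to perform a similar CPU Ready check themselves, a minimal PowerCLI sketch could look like the following; the vCenter and VM names are placeholders, and it assumes 5-minute interval statistics are available in vCenter:

# Hypothetical PowerCLI sketch: retrieve CPU Ready for an Exchange guest over the last day.
# 'vcenter.example.local' and 'EX01' are placeholders for your own environment.
Connect-VIServer -Server vcenter.example.local
$vm = Get-VM -Name 'EX01'
Get-Stat -Entity $vm -Stat 'cpu.ready.summation' -Start (Get-Date).AddDays(-1) -IntervalMins 5 |
    Select-Object Timestamp, @{ Name = 'CpuReadyPct'; Expression = { [math]::Round($_.Value / (300 * 1000) * 100, 2) } }
# CPU Ready is reported in milliseconds per sampling interval; for a 5-minute (300s) interval,
# percentage = value / 300000 * 100. Sustained high values hint at vCPU oversubscription.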

After reconfiguring the HOSTS file to bypass the load balancer and direct traffic to a single Exchange server, simplifying troubleshooting, the symptoms remained. Then, we checked:

  • TCP/IP optimization settings, e.g. RSS, Chimney, etc.
  • VMWare VMXNet3 offloading, e.g. Large Send Offload, TCP Checksum Offloading
  • VMWare VMXNet3 buffer settings

All those settings were also found to be on their recommended values.
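For reference, a quick way to review these settings from within the guest could look like the sketch below; note that the NetAdapter cmdlets require Windows Server 2012 or later, so treat this as an illustration rather than the exact commands used at the time:

# Show global TCP settings, including RSS and Chimney Offload state
netsh int tcp show global

# List offload and buffer related advanced properties of the network adapter(s)
# (Get-NetAdapterAdvancedProperty requires Windows Server 2012 or later)
Get-NetAdapterAdvancedProperty -Name * |
    Where-Object { $_.DisplayName -match 'Offload|Buffer' } |
    Select-Object Name, DisplayName, DisplayValue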

We started digging in from the client’s perspective, and used Wireshark to see what was going on on the wire. After filtering on the Exchange host, we saw the following pattern:


Note that this customer used SSL Offloading, so mailbox access took place on port 80 instead of 443 (RPC/http).

As you might notice, there is a consistent 200ms delay after the client receives its response (e.g. packets 106 and 110). When searching around for ‘200ms’ and ‘delay’, you may end up with articles describing the interplay between Nagle’s algorithm and TCP Delayed ACK. Nagle is meant to reduce chatter on the wire by coalescing small packets, but combined with Delayed ACK, which can hold back acknowledgements for up to 200ms, it can have a negative effect on near real-time communications, especially with small packets. Also, while 200ms might seem small, looking at the number of packets exchanged between Outlook and Exchange, this can add up quite quickly. Most of these articles also describe a fix, recommending configuring the TcpAckFrequency registry value and setting it to 1 (the default is 2, i.e. acknowledge every other packet). For testing purposes, we configured this value, and after the mandatory reboot the end user Outlook experience was snappy. However, setting this value impacts all client communications (physical as well as VDI clients); not a recommended long-term solution due to side effects on the network.
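For completeness, this is roughly how the value can be set for testing; the snippet below writes it for every TCP/IP interface, so treat it strictly as a test-only sketch given the drawbacks just described:

# Test only: set TcpAckFrequency to 1 (acknowledge every packet) on all TCP/IP interfaces.
# A reboot is required for the change to take effect; delete the value to revert to the default.
$base = 'HKLM:\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\Interfaces'
Get-ChildItem $base | ForEach-Object {
    New-ItemProperty -Path $_.PSPath -Name 'TcpAckFrequency' -PropertyType DWord -Value 1 -Force
}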

After removing the registry value, the investigation continued. Since there was no issue with the Exchange 2010 environment, we started to suspect there was perhaps an issue with VMWare, or that some form of network optimization or packet inspection was going on. After all, the elements that had changed with the migration were the VMWare vSphere version, the physical vSphere hosts and, last but not least, the protocol: this customer didn’t use Outlook Anywhere, so RPC over HTTP was not enabled for Exchange 2010 prior to migration, and clients connected using MAPI. After some more investigating, some potentially related articles were found in the VMWare knowledge base, mentioning latency issues with certain versions of the VMWare Tools (the VMWare guest driver set), and that downgrading these to the 5.1 level would have the same effect as configuring TcpAckFrequency. Unfortunately, this wasn’t an option, as the virtual hardware level of the VMWare guests had already been upgraded.

When installing VMWare Tools, the package comes with some system-level drivers which handle communications between the guest and the host or other guests. One of these is the VMWare Guest Introspection driver (also known as the VMCI Drivers, and formerly the vShield Drivers). This component can be identified in the guest by the presence of the system drivers vnetflt and vsepflt, and accommodates agentless antivirus solutions such as McAfee MOVE. However, it also seems to interfere with certain workloads through this driver ecosystem, negatively impacting real-time communications. I wasn’t able to test whether the change from MAPI to RPC over HTTP (or later MAPI over HTTP) also contributed to this effect, as the Introspection driver may not scan MAPI RPC packets at all, in which case no overhead is introduced there.
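If you want to check whether a guest has these drivers present and running, something along these lines should do; Win32_SystemDriver is a standard WMI/CIM class, and the driver names are the ones mentioned above:

# Check the guest for the Guest Introspection drivers (vsepflt = file system, vnetflt = network)
Get-CimInstance Win32_SystemDriver |
    Where-Object { $_.Name -in 'vnetflt', 'vsepflt' } |
    Select-Object Name, State, StartMode, PathName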

Needless to say, disabling the Guest Introspection component might be less desirable for some organizations. In those cases, when you experience this issue, I suggest contacting your VMWare representative, after verifying your VMWare Tools version is on the list of recommended versions.

In the end, in this situation Guest Introspection was disabled and a file-level scanner was introduced (with the required exclusions, of course). Performance was optimal when accessing mailboxes in Online Mode, and Exchange web services like the Scheduling Assistant showed room planning in seconds rather than minutes.

Note that, unfortunately, recent versions of vSphere running virtualized Exchange workloads are also affected by this issue. On the plus side, they allow for separate (de)installation of the file system driver (NSX File Introspection Driver) and the network driver (NSX Network Introspection Driver). I am pretty sure removing only the network driver would suffice, which might be a viable solution for some folks as well.

If you have any insights to share, please leave them in the comments.

Configuring Anti-Affinity in Failover Clusters

Many customers nowadays are running a virtualized Exchange environment, utilizing Database Availability Groups, load balanced Client Access Servers and the works. However, I also see environments where the placement of virtual machines after a (planned) failover is left entirely to the hypervisor of choice. This goes for Exchange servers, but also for redundant infrastructure components such as Domain Controllers or Lync Front-End servers.

So, leaving it at the default is not a good idea when you want to achieve the maximum availability potential. Think about what will happen if redundant roles are located on the same host and that host goes down. What you want is to prevent hosts from becoming a single point of failure, something which can be accomplished by using a feature called anti-affinity. This will distribute virtual machines over as many hosts as possible. Where affinity means having a preference for something, as in processor affinity for processes, anti-affinity can be regarded as its opposite, like repulsion in magnetism.


For VMWare, you can utilize DRS Anti-Affinity rules; here I’ll describe how you can configure anti-affinity in Hyper-V clusters using the AntiAffinityClassNames property (which, by the way, has existed since Windows Server 2003). And yes, property means it’s not accessible from Failover Cluster Manager, but I’ve created a small PowerShell script which lets you configure the AntiAffinityClassNames property (pre-Windows Server 2012 you could also use cluster.exe to configure this property).

Note: For readability, when you see virtual machine(s), read cluster group(s); in Microsoft failover clustering, a clustered virtual machine role is a cluster group.

Now, before we get to the script, first something on how AntiAffinityClassNames works. The AntiAffinityClassNames property may contain multiple unique strings, which you can make up yourself. I’d recommend creating logical names based on the underlying services, like ExchangeDAG or ExchangeCAS. When a virtual machine is moved, the process is as follows (a sketch of setting the property manually follows the list):

  1. When defined, the cluster tries to locate the next preferred node using the preferred owner list;
  2. The cluster checks whether the designated node hosts a virtual machine with a matching element in its AntiAffinityClassNames property; if not, the designated node is selected; if it does, the cluster moves to the next available preferred owner and repeats step 2;
  3. If the list is exhausted (i.e. only anti-affined hosts remain), the anti-affinity attribute is ignored and the preferred owner list is checked again, ignoring anti-affinity (“last resort”).
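As mentioned, here is a minimal sketch of what the script automates, setting the property by hand using the FailoverClusters module; the cluster, group and class names are just examples:

# Manually tag three clustered VMs (cluster groups) with the same anti-affinity class name
Import-Module FailoverClusters
$class = 'ExchangeDAG'
foreach ($name in 'ex1', 'ex2', 'ex3') {
    $group = Get-ClusterGroup -Cluster Cluster1 -Name $name
    $aacn  = $group.AntiAffinityClassNames          # a System.Collections.Specialized.StringCollection
    if ($aacn -notcontains $class) {
        $aacn.Add($class) | Out-Null
        $group.AntiAffinityClassNames = $aacn       # write the updated collection back to the cluster
    }
}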

Traces of Anti-Affinity influencing failover behavior can be found in the cluster event log:

00000648.00000d54::2013/07/22-10:40:33.162 INFO  [RCM] group ex2 should fail back from node 2 to node 3 now due anti-affinity

Now on to the script, Configure-AntiAffinity.ps1. The syntax is as follows:

Configure-AntiAffinity.ps1 [-Cluster] <String> [-Groups] <Array> [-Class] <String> [[-Overwrite]] [[-Clear]] [<CommonParameters>]

A small explanation of the available parameters:

  • Cluster is used to specify which cluster you want to configure (mandatory);
  • Groups specifies which Cluster Groups (Virtual Machines) you want to configure Anti-Affinity for (mandatory);
  • Class specifies which name you want to use for configuring Anti-Affinity (optional; default is AntiAffinityClassName);
  • When Overwrite is specified, all existing Anti-Affinity class names will be overwritten by Class for the specified Groups; otherwise Class will be added (default behavior);
  • When Clear is specified, all existing Anti-Affinity class names will be removed for the specified Groups;
  • The Verbose parameter is supported.

So, for example, assume you have a Hyper-V cluster named Cluster1 consisting of 3+ nodes, running 3 virtualized Exchange servers (ex1, ex2 and ex3) which host a 3-node DAG, and you want to configure anti-affinity for these virtual machines using the label PRODEX. You could then use the script as follows:

Configure-AntiAffinity.ps1 -Cluster Cluster1 -Groups ex1,ex2,ex3 -Class PRODEX -Verbose

To clear anti-affinity you could use:

Configure-AntiAffinity.ps1 -Cluster Cluster1 -Groups ex1,ex2,ex3 -Clear

Here’s a screenshot of the script creating anti-affinity, adding additional anti-affinity class names and clearing anti-affinity settings:


Feedback is welcomed through the comments. If you have scripting suggestions or questions, do not hesitate to use the contact form.

You can download the script from the TechNet Gallery here.


TechEd North America 2012 sessions

With the TechEd North America 2012 event still running, recordings and slide decks of finished sessions are becoming available online. Here’s an overview of the Exchange-related sessions:

TechEd North America 2011 sessions

With the end of TechEd NA 2011, so ends a week of interesting sessions. Here’s a quick overview of recorded Exchange-related sessions for your enjoyment:

VMWare HA/DRS and Exchange DAG support

Last year an (online) discussion took place between VMWare and Microsoft on the supportability of Exchange 2010 Database Availability Groups in combination with VMWare’s high availability options. The discussion started with the Exchange 2010 on VMWare Best Practices Guide and the Availability and Recovery Options documents published by VMWare. The Options document used VMware HA with a DAG as an example and contained only a small note on the support issue. In the Best Practices Guide, you have to turn to page 64 to read in a side note, “VMware does not currently support VMware VMotion or VMware DRS for Microsoft Cluster nodes; however, a cold migration is possible after the guest OS is shut down properly.” Much confusion arose; was Exchange 2010 DAG supported in combination with those VMWare options or not?

In reaction, Microsoft clarified their support stance on the situation in a post on the Exchange Team blog. This post reads, “Microsoft does not support combining Exchange high availability (DAGs) with hypervisor-based clustering, high availability, or migration solutions that will move or automatically failover mailbox servers that are members of a DAG between clustered root servers.” This meant you were on your own when you performed fail/switch-overs in an Exchange 2010 DAG in combination with VMWare VMotion or DRS.

You might think VMWare would be more careful when publishing these kinds of support statements. Well, to my surprise, VMWare published support article 1037959 this week on “Microsoft Clustering on VMware vSphere: Guidelines for Supported Configurations”. The support table states a “Yes” (i.e. supported) for Exchange 2010 DAG in combination with VMWare HA and DRS. There is no word on the restrictions which apply to those combinations, despite the reference to the Best Practices Guide; only a footnote for HA, which refers to the ability to group guests together on a VMWare host.

I wonder how many people just look at that table, skip those guides (or overlook the small notes on the support issue) and think they will run a supported configuration.

W2008 R2 Hyper-V architecture poster

Since most of us working with Exchange deal with virtualization technologies, directly or indirectly, we would like to inform you that Microsoft published a Windows Server 2008 R2 Hyper-V architecture poster, much like they did for Exchange 2007 (no sign of an Exchange 2010 version yet).

This poster – in PDF format – is a nice reference for key Windows Server 2008 R2 Hyper-V technologies. It focuses on subjects like networking, storage, live migration, snapshotting and management.

You can download the PDF here.

Exchange virtualization

A frequently asked question these days: is Exchange supported in a virtualized environment? As in many cases, the answer is “it depends”. Microsoft supports virtualized Exchange under the following conditions:

  • Hardware virtualization software used is one of the following products:
    • For Exchange 2007 or Exchange 2010: Windows Server 2008 or Windows Server 2008 R2 with Hyper-V, Hyper-V Server 2008 or Hyper-V Server 2008 R2;
    • For Exchange 2003: Virtual Server 2005 R2;
    • In general, any validated 3rd party hypervisor (check for ESX, XenServer versions etc.). Click here for the list.
  • Exchange guest:
    • Exchange 2003 SP2 (or later)
      • Virtual Machine Additions and Virtual Machine PCI SCSI driver required;
      • Exchange 2003 clusters aren’t supported;
    • Exchange 2007 SP1 (or later)
      • is running on Windows Server 2008;
      • support for Windows Server 2008 R2 expected with Exchange Server 2007 SP3 (no date announced yet);
      • No Unified Messaging role (due to real-time requirements by UM role);
      • Meets all of the Exchange 2007 SP1 specific requirements (here).
    • Exchange 2010
      • is running on Windows Server 2008 SP2 or Windows Server 2008 R2;
      • No Unified Messaging role (due to real-time requirements by UM role);
      • Meets all of the Exchange 2010 specific requirements (here);
      • HA features (DAG) aren’t supported in combination with virtualization clustering or high-available solutions.
    • Versions of Exchange earlier than Exchange 2003 are not supported.

Connecting StorCenter to ESXi using iSCSI

Recently I got myself an Iomega IX2-200 StorCenter. It’s a nice little device which will do nicely for my lab. When playing around with the device, I wanted to connect it to my ESXi 4 servers using iSCSI. Yes, I’m running VMWare ESXi, the main reason being that one of my guests runs BSD, and Hyper-V doesn’t do BSD.

Below are the steps I used to utilize the StorCenter as an ESXi datastore. I’ll be using the ESXi iSCSI Software Adapter and CHAP (couldn’t get Mutual CHAP to work, anyone?), and assume networking has been properly configured. Also, in this example we’ll be using VMFS volumes for VMDK storage, not Raw Device Mappings.

Note that while taking the screenshots I discovered a 1 GB test iSCSI target was too small (ESXi complained in the Add Storage / Select Block Size dialog), so I upped it to 16 GB using the StorCenter dashboard.

First, enable iSCSI on the StorCenter. In the dashboard, select the Settings tab, click iSCSI and check Enable iSCSI. Leave iSNS discovery unchecked, as ESXi doesn’t support it. Also leave the option Enable two-way authentication (Mutual CHAP) unchecked.

Next, I’m going to add an iSCSI target. Select the Shared Storage tab and click Add. Change Shared Storage Type to iSCSI Drive and give it a name, e.g. esxtest. Then specify the initial size, e.g. 16 GB (you can increase this size when required). Leave Enable security checked. Click Next. Leave all User Access set to None; I’ll configure that in the next step. Click Apply.

Because I’m going to use CHAP, I need to create an account on the StorCenter for the iSCSI initiator (i.e. ESXi) to authenticate itself with. Select the Users tab and click Add. Specify a Username and a password; this password MUST be between 12 and 16 characters. Uncheck Administrator and Add a secured folder for this user. Click Next; when asked about Group memberships, click Next again. Now specify which users have access to which folders and iSCSI drives. Check the Read/Write option for the user created earlier and click Next.

In the VI client, select the host’s Configuration tab and select Storage Adapters.

Select the iSCSI Software Adapter, e.g. vmhba34, and click Properties. Click Configure and make sure iSCSI is enabled (enabling it may require a restart). Click OK to close the dialog. Now, before connecting to the iSCSI target, I’m going to specify the credentials first. I’ll use the global settings, so new connections will inherit these settings by default. To start configuring authentication, in the iSCSI Initiator Properties dialog on the General tab, click CHAP.
In the CHAP Credentials dialog, set CHAP to Use CHAP and specify the Name and Secret (i.e. password) of the user created on the StorCenter. Since I’m not using Mutual CHAP, leave that setting at Do not use CHAP. Click OK.

Now I’m going to connect to the iSCSI target. Being lazy, select the Dynamic Discovery tab and click Add. Specify the address of the StorCenter and click OK when done. The iSCSI server you just specified will now be added to the list of Send Targets. Click Close; when asked about rescanning the HBA, select Yes. The iSCSI target will now be listed in the View section.
In the VI client, select Storage and click Add Storage. Select the Disk/LUN Storage Type and click Next. Select the added iSCSI target. Next. Next. Specify the name of the Datastore. Next. Specify the block size and required capacity. Next.

When done, click Finish. Presto! One iSCSI VMFS datastore at your disposal.
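For those preferring to script the ESXi side, the GUI steps above translate roughly into the PowerCLI sketch below; the host name, target address, CHAP credentials and datastore name are placeholders, and exact cmdlet parameters may vary between PowerCLI versions:

# Hypothetical PowerCLI equivalent of the GUI steps: enable the software iSCSI adapter,
# configure CHAP, add the StorCenter as a dynamic (Send) target, rescan and create a VMFS datastore.
Connect-VIServer -Server esx01.example.local
$vmhost = Get-VMHost -Name esx01.example.local

Get-VMHostStorage -VMHost $vmhost | Set-VMHostStorage -SoftwareIScsiEnabled $true

$hba = Get-VMHostHba -VMHost $vmhost -Type IScsi
New-IScsiHbaTarget -IScsiHba $hba -Address '192.168.1.10' -Type Send `
    -ChapType Required -ChapName 'esxtest' -ChapPassword 'TwelveToSixteen'

Get-VMHostStorage -VMHost $vmhost -RescanAllHba | Out-Null

# Pick the LUN presented by the StorCenter (adjust the selection to your environment)
$lun = Get-ScsiLun -VmHost $vmhost -LunType disk | Select-Object -Last 1
New-Datastore -VMHost $vmhost -Name 'StorCenter01' -Path $lun.CanonicalName -Vmfs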

Microsoft Virtualization Best Practices for Exchange Server (Level 300)

The TechNet Webcast: Microsoft Virtualization Best Practices for Exchange Server (Level 300) might be of interest to you:

Virtualizing business critical applications delivers significant benefits, including cost savings, enhanced business continuity, and an agile and efficient management solution. In this webcast, we focus on virtualizing Microsoft Exchange Server using Microsoft solutions, we discuss the benefits of using Microsoft virtualization technologies instead of technologies from key competitors such as VMware, and we provide technical guidance and best practices for virtualizing Exchange Server in various production scenarios. We also discuss results from lab deployment tests.

You can register for the event here.