Yesterday, I noticed a VMware knowledge base article, updated on November 14th, which may be worth noting when you’re running Exchange – or any other application – in a virtualized environment based on VMware technology.
VMware’s KB article 2039495 mentions that in VMware ESXi 4.x and 5.x, very high traffic bursts may cause the VMXNET3 driver to start dropping packets in the guest OS. This has been observed on Windows Server 2008 R2 running Exchange 2010 with – as VMware puts it – a high number of Exchange users. What the article fails to mention is the configuration used by the customers experiencing the issue. It would for example be valuable to know if a DAG was used, if the traffic (MAPI, replication) was split over multiple NICs, or if it occurred with iSCSI storage. I wouldn’t be surprised if the issue occurs in other high-traffic situations as well, e.g. seeding. Luckily, Exchange is capable of handling certain hiccups, so customers might not even be aware of the issue.
After some more digging I found another article, KB 1010071, which mentions a packet drop issue with VMware guests known since ESX 3. This article explains in a bit more detail why the issue occurs in the first place: the network driver runs out of receive buffers, causing packets to be dropped between the virtual switch and the guest OS driver.
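To illustrate the mechanism, here’s a simplified sketch of a fixed-size receive ring – not the actual VMXNET3 driver, and the ring sizes, burst sizes and drain rate are made-up illustrative numbers: when packets arrive faster than the guest can drain the ring, the surplus has nowhere to go and is dropped.

```python
def simulate_rx_ring(ring_size, arrivals, drain_rate):
    """Toy model of a fixed-size receive ring: packets arriving while
    the ring is full are dropped, mimicking the receive buffer
    exhaustion described in KB 1010071."""
    ring = 0        # packets currently queued in the ring
    dropped = 0
    for burst in arrivals:
        accepted = min(burst, ring_size - ring)       # ring can't overflow
        dropped += burst - accepted                   # the rest is lost
        ring = max(0, ring + accepted - drain_rate)   # guest drains the ring
    return dropped

# Bursty traffic: a burst of 1000 packets, then three quiet intervals, repeated.
bursts = [1000, 0, 0, 0] * 25
print(simulate_rx_ring(512, bursts, 300))    # small ring: drops on every burst
print(simulate_rx_ring(4096, bursts, 300))   # larger ring absorbs the bursts: 0 drops
```

Note that a larger ring only helps with bursts; under sustained overload, where packets arrive faster than the guest drains them on average, drops are merely postponed – which is consistent with VMware’s wording that the change “reduces” packet drops.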
One could argue about the impact of a few lost packets. However, as traffic increases, the (potential) number of lost packets increases with it. Each lost packet results in retransmission of unacknowledged packets, which reduces overall throughput and increases latency.
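A rough way to quantify this – a back-of-the-envelope sketch that assumes independent packet loss and ignores TCP’s congestion-window reaction, which makes the real impact considerably worse:

```python
def expected_transmissions(loss_rate):
    """With independent loss probability p, a packet is sent on average
    1 / (1 - p) times before it gets through (geometric distribution)."""
    return 1.0 / (1.0 - loss_rate)

def goodput_fraction(loss_rate):
    """Upper bound on the fraction of link capacity left for useful data
    when every lost packet must be retransmitted."""
    return 1.0 - loss_rate

for p in (0.001, 0.01, 0.05):
    print(f"loss {p:.1%}: {expected_transmissions(p):.3f} sends/packet, "
          f"goodput <= {goodput_fraction(p):.1%}")
```

Even at 1% loss this costs only a percent or so of raw throughput, but every retransmission also adds at least one round trip (or a retransmission timeout) of delay to the affected data, which is where the noticeable latency comes from.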
VMware’s temporary solution to this problem is:
- Open up the Windows guest;
- Open the properties of the VMXNET3 NIC;
- On the Advanced tab, increase the Small Rx Buffers and Rx Ring #1 Size values;
- What KB 1010071 mentions and KB 2039495 doesn’t is that when using jumbo frames – not uncommon, e.g. for replication traffic – you might also need to adjust the Rx Ring #2 Size and Large Rx Buffers values.
Now I say temporary, because VMware’s solution of course isn’t a real solution; it’s only meant to – in their own words – reduce packet drops. Also, KB 1010071 states you should “determine an appropriate setting by experimenting with different buffer sizes”. That doesn’t sound like a permanent, reassuring solution for a virtualization environment running business-critical applications now, does it?
All things considered, I’d recommend configuring these parameters to their maximum setting, preferably at installation time, unless anyone knows of a reason not to. In addition, this is another argument for the best practice of splitting MAPI and replication traffic on Exchange over multiple NICs.
Finally, I already learnt of two other applications experiencing the issue. Therefore, I think the problem is not Exchange 2010-specific, as KB 2039495 might imply. If you have similar experiences, or have noticed differences between GbE and 10GbE, please use the comments to share.
Does this only apply to Exchange 2010 or also other Exchange versions?
We are running Exchange 2007 SP3 in a VMware environment.
One KB specifically mentions Exchange 2010, the other is generic. But if Exchange 2010 can be affected, any application can, including other Exchange versions.
I’m running Exchange 2013 in a standalone environment with just one Mailbox and CAS. The Mailbox server has two NICs – one for SMTP relay and the other for normal traffic. Should I be changing this on both NICs? I have seen a high volume of packet drops.
This applies to VMware guests running Exchange only. If that’s your situation, the configuration change needs to be made on each virtual NIC instance (i.e. each guest NIC).
I’ve got a call open with VMware for this issue with a client at the moment. Upping the buffer settings to the max has reduced the incidence of the issue but not removed it altogether. It’s Exchange 2010 CAS servers on VMware 5 – interestingly enough, the issue only happens first thing on a Monday morning. It causes the users to get disconnected from Outlook until the issue goes away or the servers are rebooted.
Did you ever get this figured out? We are facing a similar issue every Monday.
Since configuring the receive buffers etc. and moving to a recent VMware Tools set (drivers), no issues. I have also disabled the vShield driver, as that introduced latency at the SSL level (Nagle-like behavior).
We ended up replacing NLB on our CAS servers with an F5 hardware load balancer and it resolved all the Monday morning problems. We have a feeling that latency at the datastore level was the root issue, but we could never prove it.
I have the same issue with packets being dropped, retransmissions, dup ACKs and out-of-order packets. This is driving me nuts. We are running ESXi 5.5 with the latest updates. We have SQL talking to a few application servers, HP SIM and a network monitoring tool. We see the applications disconnecting from the SQL server all the time. I am going to try upping the buffer sizes. BTW, we took a look at the network and we don’t see any CRC errors or anything out of the ordinary. Any advice would be greatly appreciated.