Outlook Connection Status Details

Posted on May 6, 2019 by Michel de Rooij

A short note on a potentially helpful feature that was introduced to Outlook at some point, but which I wasn’t aware of before (or it’s simply new). The option is available at least in Outlook v1905 build 11629.20008 C2R; it might also be available in the standalone version.

Many people are familiar with the Outlook Connection Status window, which you can summon by right-clicking the Outlook icon in the system tray while holding CTRL. This will show a dialog containing the connections Outlook is managing for every configured account, together with valuable information like endpoint, response times, etc.

One of the columns, Req/Fail, is showing the number of Requests and Failed requests. To check the headers of the last failing response for a particular connection, double-click the Req/Fail number. This will open up a popup window similar to this one:

Apart from essentials like the HTTP result code, it will show which front-end and back-end servers processed the request. This might help to quickly determine whether clients are connecting to unfavorable public endpoints or, in the case of Exchange on-premises, whether failed requests are coming from specific servers. Of course, this information can also be retrieved using additional tools like Fiddler, but with this shortcut you don’t need to install additional software, and you can ask end users to open this window and send you the information.

Again, another little gem which might come in handy when troubleshooting.

Exchange and VMWare Guest Introspection

Posted on February 10, 2017 by Michel de Rooij

In this long overdue article, I would like to share an experience where a customer was upgrading from Exchange 2010 to Exchange 2013. Note that this could also apply to customers migrating from Exchange 2007 or migrating to Exchange 2016. The Exchange 2013 servers were hosted on VMWare vSphere 5.5U2; the Exchange 2010 servers on a previous product level.

The customer saw a negative impact on the end user experience of Outlook 2010 users, especially those working in Online Mode. Other web-based services like Exchange Web Services (EWS) were affected as well. The OWA experience was good.

Symptoms
After migrating end user mailboxes from Exchange 2010 to Exchange 2013 (but as indicated, this applies to Exchange 2016 as well), end users reported delays in their Outlook client responses, where sometimes Outlook seemed to ‘hang’ when performing certain actions like accessing a Shared Mailbox. Also, when opening the meeting planner to schedule a room using Scheduling Assistant, it could take a significant amount of time (i.e., minutes) before the schedule of all the rooms was displayed.

The end users’ primary mailbox was configured to use Cached Mode, except for VDI users who used their primary mailbox in Online Mode. Shared Mailboxes were used in Online Mode due to the size (Outlook 2010, so no slider).

Analysis
First, the overall health of the Exchange environment was checked to exclude it as a potential cause. Exchange performance metrics were monitored, as well as Managed Availability status and events, logs like the RCA logs, and VMWare CPU Ready % to check for potential vCPU allocation issues (read: oversubscription). None of these metrics gave any reason for concern.

After reconfiguring the HOSTS file, in order to bypass the load balancer and direct traffic to a single Exchange server to simplify troubleshooting, the symptoms remained. Then, we checked:

TCP/IP optimization settings, e.g. RSS, Chimney, etc.
VMWare VMXNet3 offloading, e.g. Large Send Offload, TCP Checksum Offloading
VMWare VMXNet3 buffer settings

All those settings were also found to be at their recommended values.

We started digging in from the client’s perspective, and used WireShark to see what was going on on the wire. After filtering on the Exchange host, we saw the following pattern:

Note that this customer used SSL Offloading, so mailbox access took place on port 80 instead of 443 (RPC/http).

As you might notice, there is a consistent 200ms delay after the client receives its response (e.g. packets 106 and 110). When searching around for ‘200ms’ and ‘delay’, you may end up with articles describing the interaction between the Nagle algorithm and Delayed ACK. Both are meant to reduce chatter on the wire, but they can have a negative effect on near real-time communications, especially with small packets. Also, while 200ms might seem small, looking at the number of packets exchanged between Outlook and Exchange, this can add up quite quickly.

Most of these articles will also describe a fix, recommending to configure the registry value TcpAckFrequency and set it to 1 (the default is 2). For testing purposes, we configured this value and after the mandatory reboot, the end user Outlook experience was snappy. However, setting this value would impact all client communications (real as well as VDI clients), which makes it unsuitable as a long-term solution due to side effects on the network.
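To get a feel for how quickly a 200ms stall adds up, a back-of-the-envelope calculation helps. The sketch below is purely illustrative: the function name is made up for this example, and the number of exchanges is an assumed figure, not a measurement from this case.

```powershell
# Hypothetical helper: estimates the total stall time when every
# request/response exchange is held up by the same fixed delay.
function Get-DelayedAckOverhead {
    param(
        [Parameter(Mandatory)] [int] $Exchanges,  # number of request/response pairs (assumed)
        [int] $DelayMs = 200                      # delay per exchange, as seen in the trace
    )
    # Return the cumulative delay as a TimeSpan for readability
    [TimeSpan]::FromMilliseconds($Exchanges * $DelayMs)
}

# 600 exchanges is an assumed number for illustration only
(Get-DelayedAckOverhead -Exchanges 600).TotalSeconds   # 120
```

In other words, a few hundred small exchanges are enough to turn an action that should take a moment into one that takes minutes.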

After removing the registry key, the investigation continued. Since there had been no problem with the old Exchange 2010 environment, we started to suspect an issue with VMWare, or some form of network optimization or packet inspection. The elements that changed when migrating were the VMWare vSphere version, the physical vSphere hosts and, last but not least, the protocol. This client didn’t use Outlook Anywhere, so RPC/http was not enabled for Exchange 2010 prior to migration, and clients connected using MAPI. After some more investigating, we found some potentially related articles in the VMWare knowledgebase, describing latency issues with certain VMWare Tools versions and the VMWare guest driver set, and stating that downgrading these to 5.1 would have the same effect as configuring TcpAckFrequency. Unfortunately, this wasn’t an option as the hardware level of the VMWare guests already was on a certain level.

Remediation
When installing VMWare Tools, the package comes with some system-level drivers which handle communications between the guest and the host or other guests. One of these drivers is the VMWare Guest Introspection driver (or VMCI Drivers, and formerly VShield Drivers). This component can be identified in the guest by the presence of the system drivers vnetflt and vsepflt, and accommodates agentless antivirus solutions like McAfee MOVE. However, it seems to also interfere with certain workloads in their driver ecosystem, thus negatively impacting real-time communications. I wasn’t able to test whether the change from MAPI to RPC/http (or later MAPIhttp) also contributed to this effect, as the Introspection driver may not scan MAPI RPC packets at all, in which case no overhead is introduced.
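A quick way to check a guest for these drivers could look like the sketch below. The function name is hypothetical, and it assumes the drivers show up under their short names (vnetflt, vsepflt) in the list you feed it, for example the Name property of the Win32_SystemDriver CIM class on Windows.

```powershell
# Hypothetical helper: returns which of the Guest Introspection driver
# names appear in a list of installed driver names (case-insensitive).
function Find-IntrospectionDriver {
    param(
        [Parameter(Mandatory)] [string[]] $InstalledDriver,
        [string[]] $Wanted = @('vnetflt', 'vsepflt')   # driver names mentioned in the text
    )
    $Wanted | Where-Object { $InstalledDriver -contains $_ }
}

# On a Windows guest, the installed driver names could be collected like this
# (assumption: the drivers are registered as system drivers):
#   $names = (Get-CimInstance -ClassName Win32_SystemDriver).Name
#   Find-IntrospectionDriver -InstalledDriver $names
```

An empty result suggests the component is not installed; any returned name is a hint that Guest Introspection is present on that guest.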

Needless to say, disabling the Guest Introspection component might be less desirable for some organizations. In those cases, when you experience this issue, I suggest contacting your VMWare representative after verifying that your VMWare Tools version is on the list of recommended versions.

In the end, in this situation Guest Introspection was disabled and a file-level scanner was introduced (with the required exclusions, of course). Performance was optimal when accessing Online Mode mailboxes, and features relying on Exchange Web Services, like Scheduling Assistant, showed room planning in seconds rather than minutes.

Note that, unfortunately, recent versions of vSphere running virtualized Exchange workloads also have this issue. On the plus side, they allow for separate (de)installation of the file system driver (NSX File Introspection Driver) and the network driver (NSX Network Introspection Driver). I am pretty sure removing the network driver would suffice, which might be a viable solution for some folks as well.

If you have any insights to share, please leave them in the comments.

Public Folder Hierarchy and Client Access

Posted on August 8, 2016 by Michel de Rooij

When investigating performance issues of a multi-node, multi-role Exchange 2013 server deployment, I found the CPU utilization of a single Exchange 2013 server to be consistently higher than that of the rest.

When checking the Processor Utilization % for all Exchange servers using Performance Monitor, the daily trend image looked like this:

As you can clearly see, one single server is constantly experiencing more load than the other servers. It is also above the 80% mark, causing all sorts of potential side-effects if Managed Availability would kick in.

When checking the processes on that server, the major CPU load was generated by the Microsoft.Exchange.RPCClientAccess.service as well as the related w3svc# process. The load balancer performed a near even distribution of client connections over these servers. You can use the Exchange Performance Health Checker script with the LoadBalancingReport switch to verify this.

Next, we checked if there was an overactive mailbox on that particular server. For that purpose, we ran the following cmdlet in the Exchange Management Shell, which showed us the Public Folder mailbox was very active:

Get-StoreUsageStatistics -Server <ExchangeServer> | ? {$_.DigestCategory -eq 'TimeInServer'} | Sort TimeInServer -Descending

Note: More on tracking overactive mailboxes using Get-StoreUsageStatistics in this excellent write-up by Andrew Higginbotham.

Another clue was provided through the PublicFolders Healthset, which was picked up by System Center Operations Manager as well:

The PublicFolders Health Set has detected a problem with PublicFolderMailbox.ConnectionCount at 10-7-2016 06:12:22. 0 failures were found. The Health Manager is reporting that The total number of hierarchy connections for public folder mailbox PFMailbox1 has reached 2001. Consider creating a new public folder mailbox for load balancing hierarchy accesses.

Apparently, there were more than 2,000 connections being made to the PFMailbox1 Public Folder mailbox. This was odd, as there were multiple Public Folder mailboxes created with hierarchy. Users are expected to be automatically distributed over these mailboxes, falling within the 2,000 concurrent logons limit as mentioned here. Note that this limit applies to public folder mailboxes serving hierarchy as well; even if clients don’t access Public Folders, they still will connect to these Public Folder mailboxes in order to obtain hierarchy information.

Next thing we checked was to which default Public Folder mailbox mailboxes were configured to connect. To accomplish this we can inspect the mailbox property DefaultPublicFolderMailbox:

Get-Mailbox -ResultSize Unlimited | Group-Object DefaultPublicFolderMailbox -NoElement

Count Name
----- ----
10139 contoso.com/Accounts/Users/PFMailbox1

Apparently all mailboxes were automatically set to connect to a single Public Folder mailbox. Then maybe something was preventing the other Public Folders from serving hierarchy:

Get-Mailbox -PublicFolder | Select Name,*Hierarchy*

Name       IsExcludedFromServingHierarchy IsHierarchyReady
----       ------------------------------ ----------------
PFMailbox1 False                          True
PFMailbox2 False                          False
PFMailbox3 False                          False
PFMailbox4 False                          False

IsExcludedFromServingHierarchy was False for all 4 Public Folder mailboxes, which indicates they are not blocked from serving hierarchy. However, the hierarchy was not ‘ready’ for 3 of them. This could be due to the hierarchy being out of date or not having been created at all.

The output of (Get-PublicFolderMailboxDiagnostics PFMailbox2 -IncludeHierarchyInfo).SyncInfo indeed indicated there were problems synchronizing contents from the PFMailbox1 mailbox. We then ran the following cmdlet to trigger hierarchy synchronization again:

Update-PublicFolderMailbox -InvokeSynchronizer -Identity PFMailbox2

Running Get-Mailbox -Identity PFMailbox2 -PublicFolder | Select Name,*Hierarchy* now showed that IsHierarchyReady was True. We ran the same cmdlet for the other two Public Folder mailboxes as well.
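With more than a handful of Public Folder mailboxes, repeating this by hand gets tedious. The sketch below shows one way to pick out the mailboxes that are allowed to serve hierarchy but are not ready yet. The filter function name is hypothetical; it only relies on the two properties shown in the output above.

```powershell
# Hypothetical filter: passes through Public Folder mailboxes that are not
# excluded from serving hierarchy but whose hierarchy is not ready.
function Select-HierarchyNotReady {
    param([Parameter(ValueFromPipeline)] $Mailbox)
    process {
        if ((-not $Mailbox.IsExcludedFromServingHierarchy) -and (-not $Mailbox.IsHierarchyReady)) {
            $Mailbox
        }
    }
}

# In the Exchange Management Shell this could be combined with the cmdlets
# used above (untested sketch, assuming Name is accepted as Identity):
#   Get-Mailbox -PublicFolder | Select-HierarchyNotReady |
#       ForEach-Object { Update-PublicFolderMailbox -InvokeSynchronizer -Identity $_.Name }
```

Keeping the filter separate from the synchronization call makes it easy to review which mailboxes would be touched before actually triggering anything.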

After a while, we verified the effect on the assignment of DefaultPublicFolderMailbox on the mailboxes:

Get-Mailbox -ResultSize Unlimited | Group DefaultPublicFolderMailbox -NoElement

Count Name
----- ----
2601  contoso.com/Accounts/Users/PFMBPFMailbox2
2309  contoso.com/Accounts/Users/PFMBPFMailbox4
2632  contoso.com/Accounts/Users/PFMBPFMailbox1
2597  contoso.com/Accounts/Users/PFMBPFMailbox3

Public folder assignments were now (more or less) equally distributed over the 4 Public Folder mailboxes, and life was good.
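To put a number on ‘more or less’, the counts can be turned into percentages. The helper below is a hypothetical sketch; it works on any objects exposing Name and Count properties, such as the output of Group-Object.

```powershell
# Hypothetical helper: adds each group's share of the total as a percentage.
function Get-GroupShare {
    param([Parameter(Mandatory)] [object[]] $Group)
    # Sum the Count property over all groups
    $total = ($Group | Measure-Object -Property Count -Sum).Sum
    foreach ($g in $Group) {
        [pscustomobject]@{
            Name     = $g.Name
            Count    = $g.Count
            SharePct = [math]::Round(100 * $g.Count / $total, 1)
        }
    }
}

# Possible use in the Exchange Management Shell:
#   Get-GroupShare -Group (Get-Mailbox -ResultSize Unlimited |
#       Group-Object DefaultPublicFolderMailbox -NoElement)
```

With four mailboxes serving hierarchy, each share should hover around 25%; a share far above that would point at a mailbox that is still taking most of the connections.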

We also verified Public Folder access distribution by querying the Exchange RpcClientAccess log files. An excellent tool to aid in this task is LogParser with LogParser Studio. We configured LogParser Studio to query log files at ‘<Installation folder>\Logging\RPC Client Access’ on the Exchange servers. The query we used grouped all entries per date, operation (in this case we are only interested in PublicLogon), and part of the field ‘operation-specific’; more precisely, the legacyDN part, which tells which (Public Folder) mailbox was accessed:

SELECT EXTRACT_PREFIX([#Fields: date-time], 0, 'T') AS Date, COUNT(*) AS Total, [Operation],
EXTRACT_PREFIX(EXTRACT_SUFFIX([operation-specific], 0, 'cn='), 0, ' in database ') AS PFMailbox
FROM '[LOGFILEPATH]'
WHERE [operation]='PublicLogon'
AND [failures] IS NULL
GROUP BY Date, [Operation], PFMailbox
ORDER BY Date ASC

The output showed all Public Folder mailboxes were now accessed by clients, and logons to the Public Folder mailboxes were now (more or less) equally distributed:

Exchange 2013 KB articles RSS feed

Posted on May 14, 2013 by Michel de Rooij

Like most people, I still use RSS feeds to keep track of news and updates from various sources. But not everyone is aware that you can keep track of new or updated Microsoft knowledgebase articles using RSS feeds, sometimes categorized per product. I already blogged about the availability of these feeds about 2.5 years ago.

Now with all the releases since then, it’s time to update this information with current products, especially with the feed for Exchange 2013 related articles becoming available recently:

For a complete list of the knowledgebase articles RSS feeds check here.

The case of the not updating Outlook for Mac 2011

Posted on January 26, 2013 by Michel de Rooij

I was in contact with a Twitter user about an issue with Outlook for Mac 2011 connecting to Exchange Server 2007 on Small Business Server 2008.

When configuring a new account, Outlook for Mac reported “Account cannot be added. Note that Outlook 2011 requires Exchange Server 2007 SP1 Update Rollup 4 or later.”

However, that couldn’t be right because that user claimed to be running a higher version of Exchange 2007. After manually entering the server name, a connection could be established and an initial download of folders and contents took place. However, items weren’t updated and contacts and calendar remained empty.

After trying and checking some things, I asked to turn on Outlook for Mac’s logging hoping to find something in the Exchange Web Services log (Outlook for Mac 2011 is EWS based). You can enable logging by checking Window > Error Log > Errors > Settings > Turn on logging for troubleshooting. After a while I was sent the log file Microsoft Outlook_Troubleshooting_0.log which contained the following excerpt:

2013-01-24 08:55:34.392,0xFFFFFFFF,Outlook Exchange Web Services,Info,"EWS: Response data received on thread=0x7d27bdb4, XML data=
<?xml version=""1.0"" encoding=""utf-8""?><soap:Envelope xmlns:soap=""http://schemas.xmlsoap.org/soap/envelope/"" xmlns:xsi=""http://www.w3.org/2001/XMLSchema-instance"" xmlns:xsd=""http://www.w3.org/2001/XMLSchema""><soap:Header><t:ServerVersionInfo MajorVersion=""8"" MinorVersion=""3"" MajorBuildNumber=""297"" MinorBuildNumber=""0"" Version=""Exchange2007_SP1"" xmlns:t=""http://schemas.microsoft.com/exchange/services/2006/types"" /></soap:Header>

First, Exchange reports version 8.3.297.0, which corresponds to Exchange 2007 SP3 RU9 (EWS can report a slightly different version than the actual one). That is well above Exchange 2007 SP1 RU4, so something else was wrong.

2013-01-24 08:55:39.355,0xFFFFFFFF,Outlook Exchange Web Services,Info,"EWS: Sending request on connection=0x7dc89be8, URL=/EWS/Exchange.asmx, SoapAction=""http://schemas.microsoft.com/exchange/services/2006/messages/SyncFolderItems"""
2013-01-24 08:55:39.358,0xFFFFFFFF,Outlook Exchange Web Services,Info,EWS: Received response on connection=0x7d31dae8; status=500
..
2013-01-24 08:55:49.861,0xFFFFFFFF,Outlook Exchange Web Services,Info,"EWS: Sending request on connection=0x7d71a648, URL=/EWS/Exchange.asmx, SoapAction=""http://schemas.microsoft.com/exchange/services/2006/messages/GetItem"""	
2013-01-24 08:55:49.863,0xFFFFFFFF,Outlook Exchange Web Services,Info,EWS: Received response on connection=0x7dc26638; status=500
..
2013-01-24 08:55:39.359,0xFFFFFFFF,Outlook Exchange Web Services,Info,"EWS: Sending request on connection=0x7d31dae8, URL=/EWS/Exchange.asmx, SoapAction=""http://schemas.microsoft.com/exchange/services/2006/messages/GetItem"""	
2013-01-24 08:55:39.477,0xFFFFFFFF,Outlook Exchange Web Services,Info,EWS: Received response on connection=0x7d7005c8; status=200

I then noticed that various EWS requests returned HTTP status code 200 (OK), but others returned 500, which corresponds to “Internal Server Error”. It happened after various requests (e.g. SyncFolderItems, GetFolder, GetItem) but not for all requests.
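When the log is large, counting the status codes gives a quicker overview than scrolling through it. The sketch below assumes the response lines look like the ones in the excerpt above; the function name is hypothetical.

```powershell
# Hypothetical helper: counts EWS responses per HTTP status code in
# Outlook for Mac troubleshooting log lines.
function Get-EwsStatusCount {
    param([Parameter(Mandatory)] [string[]] $Line)
    $Line |
        ForEach-Object {
            # Only response lines carry a status code
            if ($_ -match 'EWS: Received response on connection=.*status=(\d{3})\b') {
                $Matches[1]
            }
        } |
        Group-Object -NoElement |
        Select-Object Name, Count
}

# Possible use against the log file mentioned above:
#   Get-EwsStatusCount -Line (Get-Content 'Microsoft Outlook_Troubleshooting_0.log')
```

Note that in the excerpt the connection identifiers of requests and responses do not line up, so this sketch deliberately counts responses only and does not try to pair them with their requests.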

Now, code 500 isn’t very helpful (general terminal failure) and a quick restart of IIS with iisreset /restart /noforce didn’t solve things.

After some digging it turned out the seemingly unrelated KB2264110 pointed in the right direction. I say unrelated, because it’s on messages not being updated on Blackberry Internet Service (BIS) after installing Exchange Server 2007 SP2. Turned out the performance counters on the Exchange 2007 server were corrupt and rebuilding them solved the issue.

To rebuild the performance libraries, perform the following steps from an elevated command prompt: