Visio of Exchange 2010 SP1 Network Ports Diagram v0.31

By popular demand, since many of you requested it, I’ve put the Visio file of the Exchange 2010 SP1 Network Ports Diagram online. The original post, with the PDF version, is from April 5th.

If you have any comments or additions worth sharing, don’t hesitate to write them down in the comments or send me an e-mail. When you use it, crediting or a reference is appreciated.

The Visio document can be downloaded from here.

TechEd North America 2011 sessions

TechEd North America 2011 has come to an end, and with it a week of interesting sessions. Here’s a quick overview of recorded Exchange-related sessions for your enjoyment:

Thoughts on “Five things that annoy me about Exchange 2010”

An article on SearchExchange by Greg Shields, MVP Remote Desktop Services, VMware vExpert and a well-known writer and speaker, covered Greg’s annoyances with Exchange Server 2010. I doubt these points would make the top 5 annoyances of the average Exchange 2010 administrator, and some of the arguments are flawed. Here’s why.

Role Based Access Control Management
First, the article complains about the complexity of Exchange 2010’s Role Based Access Control system, or RBAC for short. It’s clear what RBAC’s purpose is: managing the security of your Exchange environment using elements like roles, groups, scopes and memberships (for more details, check out one of my earlier posts on RBAC here). RBAC was introduced with Exchange 2010 to provide organizations with more granular control compared to earlier versions of Exchange, where you had to manage security using a (limited) set of groups and Active Directory. In smaller organizations the default setup may – with a little modification here and there – suffice.

Larger organizations can use it to fine-tune the security model to their business demands. And yes, this may get very complex.

Is this annoying? Is it bad that it can’t be managed from the GUI in all its facets? I think not, for two reasons. First, Exchange administrators should familiarize themselves with PowerShell anyway; it is here to stay and is the scripting language of choice for recent Microsoft product releases. Second, in most cases setting up RBAC will be a one-time exercise. Thinking it through and setting it up properly is just one of the aspects of configuring the Exchange 2010 environment. Also, I’ve seen pre-Exchange 2010 organizations with delegated permission models that also took a significant amount of time to fully comprehend, easily beating the author’s “20 minute test”.

The DAG’s three-server minimum
The article talks about a requirement to put the 3rd copy in a Database Availability Group (DAG) on a “witness server” in a separate site to get the best level of high availability. It looks like the author mixed things up, as there’s no such thing as a “witness server”, nor a requirement to host anything as such in a separate site. In a DAG, there’s a witness share (File Share Witness, or FSW for short), which is used to determine majority for DAG configurations with an even number of members. This share is hosted on a member server, preferably one that is part of the e-mail infrastructure, e.g. a Hub Transport server, located in the same site where (the largest part of) the user population resides.

Also, there’s no requirement for a 3rd DAG member when high availability is required. The 3rd DAG member is a Microsoft recommendation when using JBOD storage. For disaster recovery a remote 3rd DAG member could make sense, but then you wouldn’t need a witness share at all, given the odd number of DAG members.

Note that there’s a difference between high availability and disaster recovery. Having multiple copies in the same site offers high availability; having remote copies provides resilience. More information on DAGs and the role of the File Share Witness can be found in my Datacenter Activation Coordination article here.
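
To illustrate the majority mechanics, here is a minimal Python sketch showing why the witness only adds value for an even member count. It is a simplification of what the underlying failover cluster actually does, and the helper function is hypothetical, not an Exchange or Windows API:

```python
def dag_quorum(members, use_witness):
    """Votes and majority threshold for a simplified DAG-style quorum model."""
    votes = members + (1 if use_witness else 0)
    majority = votes // 2 + 1
    return votes, majority, votes - majority   # last value: member failures tolerated

for members in (2, 3, 4, 5):
    witness = (members % 2 == 0)   # the FSW is only used with an even number of members
    votes, majority, tolerated = dag_quorum(members, witness)
    print(f"{members} members, witness={witness}: "
          f"{votes} votes, majority {majority}, tolerates {tolerated} failure(s)")
```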

Server Virtualization and DAGs
Next, the article states that Exchange 2010 DAGs don’t support high availability options provided by the virtualization platform, which is spot on. Microsoft and VMware have been squabbling for some time over DAGs in combination with VMware’s HA/DRS options, leading to the mentioned support statement from Microsoft. VMware did their part by publishing statements like “VMware does not currently support VMware VMotion or VMware DRS for Microsoft Cluster nodes”; what doesn’t help is putting this in the best practices guide as a side note on page 64. More recently, VMware published a support table for VMware HA/DRS and Exchange 2010 indicating a “YES” for VMware HA/DRS in combination with an Exchange 2010 DAG. I hope that was a mistake.

In the end, I doubt whether DAGs being unsupported in conjunction with VMware HA/DRS (or similar products from other vendors) will be the potential deal breaker the author states it is. That might be true for organizations already utilizing those options as part of their strategy. In that case it would come down to evaluating whether to run Exchange DAGs without those options (which they happily do). Not only does that offer organizations Exchange’s availability and resilience options with much greater flexibility and a larger function set than a non-application-aware virtualization platform would, it also saves some money in the process. For example, where VMware can recover from data center or server failures, DAGs can also recover from database failures and several forms of corruption.

Exchange 2010 routing
The article then continues with Exchange 2010 following Active Directory sites for routing. While this is true, it isn’t something new. With the arrival of Exchange 2007, routing groups and routing group connectors were traded for AD sites to manage routing of messages.

The writer’s annoyance here is that Exchange must be organized to follow the AD site structure. Is that bad? I think not. Of course, with Exchange 2003 organizations could skip defining AD sites, but they should (re)think their site structure anyway, since more and more products use AD site information. I also think organizations that haven’t designed an appropriate AD site structure following recommendations may have bigger issues than Exchange. In addition, other products like System Center also rely on a proper AD site design.

Also, when required, organizations can control message flow in Exchange using hub sites or connector scoping, for instance. It is also possible to override site link costs for Exchange. While not all organizations will utilize these settings, they address the needs of most organizations. Also, by being site-aware, Exchange 2010 can offer functionality not found in Exchange 2003, e.g. Autodiscover or CAS servers/CAS arrays having site affinity.

Ultimately, it is possible to set up a separate Exchange forest. By using a separate forest with a different site structure, organizations can isolate directory and Exchange traffic to route it through different channels.

CAS High Availability complicated?
The last annoyance mentioned in the article is the lack of wizards to configure CAS HA features, e.g. a wizard to configure a CAS array with network load balancing, in the way the DAG wizard installs and configures failover clustering for you. While true, I don’t see this as an issue. While setting up NLB was not too complex and a fit for small businesses, nowadays Microsoft recommends using a hardware load balancer, making NLB less important. And while wizards are nice, most steps should be performed as part of a (semi-)automated procedure, e.g. reconfiguring after a fail-over or switch-over. Such a procedure or script can be tested properly, making it less prone to error.

The article also finds it an annoyance that network load balancing and Windows failover clustering are mutually exclusive. Given that hardware load balancing is recommended and cost-effective, supported appliances have become available, this restriction is becoming a non-issue.

More information on configuring CAS arrays here and details on NLB with clustering here.

Final words
Now don’t get the impression I want to condemn Greg for sharing his annoyances with us. But when reading the article I couldn’t resist responding to some inaccuracies, sharing my views in the process. Most important is that we learn from each other while discussing our perspectives and views on the matter. Having said that, you’re invited to share your opinions in the comments below.

Exchange 2010 SP1 Network Ports Diagram v0.31

It took a while, but I’ve updated the Exchange 2010 SP1 Network Ports diagram I first published in December. Note that the updated version is based on SP1, which shows, for example, in the way to change the Address Book service port.

For this version, I’ve included clients, 3rd party SMTP elements, UM and OCS/Lync components and a small list of how to change ports or fix dynamic port settings.

You can download the diagram here. If you have feedback, use the comments or send me an e-mail. Otherwise, feel free to use it; crediting or a reference is appreciated.

Update: Small correction, the 135/TCP RPC endpoint mapper from Outlook to Client Access Server was missing (thanks Maarten).

Update (13Aug11): The Visio file can be downloaded here.

Exchange 2010 Network Ports

When looking up which ports Exchange 2010 uses, you have probably already read the excellent Exchange 2010 Network Port Reference TechNet article located here.

However, at some point, like when discussing things with the network/firewall people or documenting your design, it might be required to visualize things. For this purpose I created the following diagram:

[Diagram: Exchange 2010 network ports]

Note that it’s a v1 and not all things are included, e.g.

  • Internal clients, e.g. Outlook, OWA;
  • UM connections to PBX;
  • Client Access Server connections to OCS;
  • For Hub-Hub, CAS-CAS or Edge-Edge, I’ve included a 2nd Hub/CAS/Edge server only mentioning ports used for Hub-Hub, CAS-CAS and Edge-Edge communications.

Also, all ports are left at their default values and the diagram doesn’t reflect the fact that you can fix certain ports like the one for the DAG or the MAPI RPC port; that might be added in a later version.

If you have feedback, leave a comment. Otherwise, feel free to use it; crediting or a reference would be nice.

Updated Exchange 2010 Architecture Poster

Microsoft updated the Exchange 2010 Architecture poster yesterday. No major changes, only a small correction as far as I can tell (Edge Transport Server Role SMTP Send and Receive connectors were mislabeled).

You can download the poster here.

LUN design and Hardware VSS

I got a question asking why you need to design separate LUNs for Exchange database and log files when using a hardware-based Volume Shadow Copy Service (VSS) backup solution, as mentioned in this TechNet article:

To deploy a LUN architecture that only uses a single LUN per database, you must have a database availability group (DAG) that has two or more copies, and not be using a hardware-based Volume Shadow Copy Service (VSS) solution.

The reason for this requirement is that hardware VSS solutions operate at the hardware level, i.e. the complete LUN. Therefore, if you put the Exchange database and log files on a single LUN, it will always create a snapshot of the whole LUN. This restricts your recovery options, since you can by definition only restore that complete LUN, overwriting log files created after the snapshot was taken. So, changes (log files) made after the snapshot are lost and you have no point-in-time recovery options.

For example, with the database and log files on a single LUN, suppose you create a full backup on Saturday at 6:00. Then, disaster strikes on Monday. By definition, you can now only restore the database and log files as they were on Saturday at 6:00; log files created after Saturday 6:00 are lost.

With the database and log files on separate LUNs, you can restore the database LUN, which leaves the LUN with the log files intact. Then, after restoring the database, you can start replaying log files.

So, keep this in mind when planning your Exchange LUNs in conjunction with the backup solution to be used. Note that the Mailbox Role Calculator supports this decision by letting you specify Hardware or Software VSS Backup/Restore as the Backup Methodology to be used.

If you’re interested in more background information on how VSS works, I suggest you check out this TechNet article.

Note: This blog has also been published on Exchange fellow Jaap Wesselius’ ExchangeLabs blog here.

Exchange 2010 Visio Stencil

A quick post, but helpful for all those involved with designing and drawing out infrastructure diagrams related to Microsoft Exchange Server. Microsoft published a Visio stencil with shapes for Microsoft Exchange Server 2010 (RTM and SP1). These shapes include icons for the following:

  • Exchange 2010 server roles
  • Features new to Exchange 2010 SP1
  • Networking, telephony, and Unified Messaging objects
  • Active Directory and directory service objects
  • Client computers and devices
  • Other Exchange organization elements
  • Cloud elements

You can download the stencil, which is suited for Visio 2007 as well as Visio 2010, here; it has also been added to the toolkit page.

Exchange Server 2010 Architecture poster

Finally, the long awaited Exchange Server 2010 Architecture Poster is here!

This poster is similar to the Exchange 2007 Component Architecture poster and contains the architecture highlights and feature set of Microsoft Exchange Server 2010. It complements the already published Microsoft Exchange Server 2010 Transport Server Role Architecture Diagrams, which you can get here.

You can download the Microsoft Exchange Server 2010 Architecture poster here.

A Decade in High Availability

A recent post from Elden Christensen, Sr. Program Manager Lead for Clustering & High Availability, reminded me of one of my former employers. When I joined that company back in 2000 to start up a professional services practice based on Windows Server 2000 Data Center Edition, the company was already an established professional services provider in the business critical computing niche market, e.g. Tandem/Compaq/HP NonStop systems, mostly used in the financial markets, e.g. banks or stock exchanges. The Windows platform was regarded as inferior at that time by the NonStop folks, and they had good arguments back then.

Remember, those were also the early days when no one was surprised to see an occasional blue screen (people were also using Windows 9x), and what we now know as virtualization was already happening on mainframes in the form of partitioning. At that time, Microsoft had ambitions to enter the data center environment with the Windows Server platform, whereas NonStop had been an established platform there for ages and professionals had developed best practices for those environments.

Another part of the discussion was the Fault Tolerance versus High Availability topic: NonStop was already an established Fault Tolerant solution for business critical environments, while Windows only had ambitions to move towards that market with the Data Center product. A logical move, looking at the state of (web) applications, SQL and, last but not least, Exchange, where those products were going, and what customers expected of them regarding availability and reliability. To repeat an infamous quote from a NonStop colleague back then: “E-mail is not business critical”. But that was almost 10 years ago, and things have changed… or haven’t they?

Single Point of Failure
First I’ll start by introducing the availability concept, which revolves around eliminating the single point of failure. This is an element in the whole system of hardware, software and organization that can cause downtime for a system, i.e. disruption of services. After identifying a single point of failure, we want to eliminate it to prevent downtime; preventing downtime is, after all, the ultimate goal for a business critical system. We can approach this task using two different strategies, Fault Tolerance (FT) or High Availability (HA). The task of identifying and eliminating single points of failure is an ongoing process, as most IT environments are subject to change over time.

Availability
To understand the Fault Tolerance and High Availability strategies we need to define the term “Availability”. In the dictionary, availability is defined as the quality or state of being available, or an available person or thing, where in both cases available means present or ready for immediate use. Availability is mostly expressed as a percentage, for example when used in a service level agreement, but what does that percentage mean? To explain this, take a look at the following diagram:

[Diagram: failure and recovery lifecycle (MTBF, MTTR)]

I assume this lifecycle speaks for itself. Using this diagram, availability is calculated as MTBF / (MTBF + MTTR). The related expected downtime is calculated as (1 – availability) * 1 year. Note that the time between failure and recovery isn’t used in the calculation.

I’ll use a simple example: a 500 GB Seagate Barracuda 7200.12 (ST3500412AS) with an MTBF specification of 750,000 hours. You have a 24-hour replacement contract and need about 4 hours to restore the backup. The availability would then be 750,000 / (750,000 + 28) ≈ 0.999963, or 99.9963%, resulting in a yearly downtime of (1 – 0.999963) * (365 days * 24 hours * 60 minutes) ≈ 19.6 minutes.
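
As a quick sanity check on that arithmetic, here is a minimal Python sketch, assuming (as above) a 28-hour MTTR made up of the 24-hour replacement contract plus roughly 4 hours of restore time:

```python
# Availability and expected yearly downtime from MTBF and MTTR (both in hours).
MTBF = 750_000        # manufacturer's MTBF specification
MTTR = 24 + 4         # 24h replacement contract + ~4h to restore the backup

availability = MTBF / (MTBF + MTTR)
downtime_min_per_year = (1 - availability) * 365 * 24 * 60

print(f"Availability: {availability:.4%}")                       # ~99.9963%
print(f"Expected downtime: {downtime_min_per_year:.1f} min/year") # ~19.6 min
```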

Of course, with hardware these numbers are theoretical and to some extent a marketing thing; how else could Seagate specify an MTBF of 750,000 hours (over 85 years)? I tend to look at it as an indication of the reliability you can expect. For example, compare the MTBF of the 7200.12 drive with an enterprise class drive from Seagate’s ES product line: the ST3500320NS has an MTBF of 1,200,000 hours.

That’s the reason you should use enterprise class drives in your storage solution instead of desktop drives, which aren’t meant to run in 24×7 environments. To add to that, the combined MTBF decreases when drives are used in series (RAID 0: 1 / (1/MTBF1 + … + 1/MTBFn)) and increases when they are used in parallel (RAID 1: MTBF * (1 + 1/2 + … + 1/n) for n identical drives). When trying to do these calculations for the whole chain, with all the elements and their individual specifications and support contracts, this can get very complex.
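
As a small sketch of those two combination rules (ignoring repair, which would change the parallel figure considerably), assuming identical drives with the 750,000-hour MTBF from the example above:

```python
# Combined MTBF for n identical components, following the rules above.
def mtbf_series(mtbf, n):
    """RAID 0 style: any single drive failure takes the whole set down."""
    return 1 / (n * (1 / mtbf))            # equals mtbf / n for identical drives

def mtbf_parallel(mtbf, n):
    """RAID 1 style: the set is down only when the last copy has failed."""
    return mtbf * sum(1 / i for i in range(1, n + 1))

mtbf = 750_000  # hours
print(mtbf_series(mtbf, 2))     # 375,000 hours: striping halves the MTBF
print(mtbf_parallel(mtbf, 2))   # 1,125,000 hours: mirroring adds 50%
```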

The 9’s
When talking about availability, it is often expressed as a number of 9’s, e.g. 99.9%. The more 9’s, the better (less downtime). Note that for each additional level of availability, the required effort increases significantly. By effort, don’t think of technical solutions only; it also means organizational measures, like having skilled personnel and proper procedures.

In fact, only a small percentage of outages have technical causes; the majority of incidents are due to human error. And yes, that includes that bad driver, which is programmed by humans. This is why changes in a properly managed infrastructure should always go through test and acceptance procedures in environments representative of, or identical to, the production environment. Unfortunately, this doesn’t always happen, as not all IT departments have that luxury, mostly for financial reasons.

Availability%   Downtime / Year   Downtime / Month
99.0%           3.65 days         7.3 hrs
99.9%           8.76 hrs          43.8 min
99.99%          52 min            4.3 min
99.999%         5.2 min           26 sec
99.9999%        31 sec            2.6 sec
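
The figures in this table follow directly from the downtime formula above; a few lines of Python reproduce them (using an average month of one twelfth of a year):

```python
# Downtime per year and per month for a given availability percentage.
HOURS_PER_YEAR = 365 * 24
HOURS_PER_MONTH = HOURS_PER_YEAR / 12

for pct in (99.0, 99.9, 99.99, 99.999, 99.9999):
    down = 1 - pct / 100
    print(f"{pct}%  {down * HOURS_PER_YEAR:8.2f} hrs/year  "
          f"{down * HOURS_PER_MONTH * 60:8.1f} min/month")
```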

Fault Tolerant
The goal of a Fault Tolerant solution is to maximize the Mean Time Between Failures (MTBF). This is achieved by mirroring or replicating systems. These monolithic systems run software in parallel on identical hardware. This is called lockstep (which, for your information, refers to synchronized marching).

Because Fault Tolerant systems run in parallel, the results of an operation can be compared. When the results don’t match, a fault has occurred. Since the faulty system can’t be identified using 2 parallel systems, there’s also a variation on this architecture where one server functions as master and one as slave, the slave acting as a hot standby. Another way to solve the ambiguity is to use three systems, where the majority of the systems determines the right output.
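
A minimal Python sketch of that three-way voting idea (triple modular redundancy); the simple equality comparison of results is of course a simplification of how a real lockstep system compares state:

```python
from collections import Counter

def vote(results):
    """Return the majority output of redundant replicas plus the dissenting ones."""
    winner, count = Counter(results).most_common(1)[0]
    if count <= len(results) // 2:
        raise RuntimeError("no majority: the fault cannot be masked")
    dissenters = [i for i, r in enumerate(results) if r != winner]
    return winner, dissenters

# Three replicas perform the same operation; replica 2 returns a wrong answer.
output, faulty = vote([42, 42, 41])
print(output, faulty)   # 42 [2] -> the faulty unit is identified and can be disabled
```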

When faults are detected in a Fault Tolerant system, the failing component (or system) is disabled and the mirror takes over. This makes the experience transparent for the end-user. There is one caveat: since Fault Tolerant systems run software in parallel, software faults are also mirrored.

Examples of Fault Tolerant components are ECC RAM, multiple NICs in a Fault Tolerant configuration, multipath network software, RAID 1+ disk systems or storage with replication technology. Examples of Fault Tolerant systems are HP NonStop (proprietary), Stratus ftServer or Unisys ES7000. There are also software-based solutions, like Marathon EverRun or VMware’s FT offering.

High Availability
High Availability aims to minimize the MTTR. This can be achieved through redundant or standby (cold, hot) systems, or non-technical measures like on-site support contracts. Systems take over the functionality of the failing system after the failure has occurred. Therefore, High Availability solutions aren’t always completely transparent to the user. The effects of a failing system and the consequences for the end-user depend on the software, e.g. a seamless reconnect or a requirement to log in again. Another point of attention is the potential loss of information, caused by pending transactions being lost in the failure. To make the experience more transparent to the user, applications need to be resilient, e.g. by detecting the failure and retrying the transaction.

Examples of High Availability technologies are load balancing – software or hardware-based – and replication, where load balancing is used for static data and replication for dynamic data.

The Present
After a decade, technology has evolved but is still founded on the old concepts. Network load balancing is still here, and clustering (anyone remember Wolfpack?), although we moved from shared nothing to replication technology, remains largely unchanged. This means either there hasn’t been much innovation, or the technologies do a decent job; after all, it’s still a matter of supply and demand. Yes, we moved from certified-configurations-only shared storage solutions to flexible Database Availability Groups (hey, this is still an Exchange blog), but most changes are in the added functionality category or take away constraints, e.g. cluster modes (majority node set, etc.), multiple replicas and configurable replication.

Windows Server Data Center Edition

Version       x86                               x64
2000          Max. 32 GB, 32 CPUs, 4 nodes      N/A
2003 SP2      Max. 128 GB, 32 CPUs, 8 nodes     Max. 512 GB, 64 CPUs, 8 nodes
2003 R2 SP2   Max. 64 GB, 32 CPUs, 8 nodes      Max. 2 TB, 64 CPUs, 8 nodes
2008          Max. 64 GB, 32 CPUs, 16 nodes     Max. 2 TB, 64 CPUs, 16 nodes
2008 R2       N/A                               Max. 2 TB, 64 CPUs (256 logical), 16 nodes

What about Fault Tolerance and Windows’ Data Center Edition as the panacea for all your customers requiring “maximum uptime”? The issue with Fault Tolerance was that it came with a hefty price tag, especially in those days. Costs were a multiple of the costs involved with High Availability solutions on decent (read: stable) hardware. So, for those extra 9’s you needed deep pockets. For example, around 2001 an ES7000 with Windows Server 2000 Data Center Edition, the joint support queue (i.e. Microsoft and the OEM) and services came with a $2M price tag, for which you got the promise of 99.9% availability.

Compare that to buying a few ProLiants with Windows Server 2000 Advanced Server, some Fault Tolerant components (FT NICs, RAID), off-the-shelf High Availability technology and dedicated personnel (justifiable with that DCE price tag) for, say, $250,000. With skilled personnel, operated in a controlled environment, you could easily reach 99% availability. Is that price difference worth 3 days of downtime? Also, the simplicity of implementing those technologies made High Availability on Windows accessible to the masses, and nowadays – certainly in the Exchange world – you seldom see load balancing or some form of clustering not being utilized.

Note that in the past decade, I’ve never encountered Data Center Edition hosting Exchange. In fact, as of Exchange 2003, support for running on Data Center was dropped. Nowadays, Data Center is regarded as an attractive option for large-scale virtualization based on Hyper-V, not only because Data Center costs less than back then (about $3,000 per CPU – hurray for multi-core – but with a 2 CPU minimum) and is certified on more hardware, but also because it comes with unlimited virtualization rights, meaning you may run Windows Server 2008 R2 (or previous versions) Standard, Enterprise and Datacenter in the virtual instances without the need to purchase additional licenses for those.

With all the large-scale virtualization and consolidation projects going on, virtualizing Exchange or other parts of your IT infrastructure, it’s good to know that there are other options when required by the business.