Exchange Server 2010 Architecture poster


Finally, the long awaited Exchange Server 2010 Architecture Poster is here!

This poster is similar to the Exchange 2007 Component Architecture poster and highlights the architecture and feature set of Microsoft Exchange Server 2010. It complements the already published Microsoft Exchange Server 2010 Transport Server Role Architecture Diagrams, which you can get here.

You can download the Microsoft Exchange Server 2010 Architecture poster here.

A Decade in High Availability


A recent post from Elden Christensen, Sr. Program Manager Lead for Clustering & High Availability, reminded me of one of my former employers. When I joined that company back in 2000 to start up a professional services practice based on Windows Server 2000 Data Center Edition, it was already an established professional services provider in the business-critical computing niche, e.g. Tandem/Compaq/HP NonStop systems, mostly used in the financial markets by banks and stock exchanges. The Windows platform was regarded as inferior by the NonStop folks at the time, and they had good arguments back then.

Remember, those were also the days when no one was surprised to see an occasional blue screen (people were still using Windows 9x too), and what we now know as virtualization was already happening on mainframes in the form of partitioning. At that time, Microsoft had ambitions for the Windows Server platform to enter the data center, where NonStop had been established for ages and professionals had developed best practices for those environments.

Another part of the discussion was Fault Tolerance versus High Availability. NonStop was already an established Fault Tolerant solution for business-critical environments, while Windows still only had ambitions to move towards that market with the Data Center product. A logical move, looking at the state of (web) applications, SQL and, last but not least, Exchange, and at what customers expected of those products regarding availability and reliability. To repeat an infamous quote from a NonStop colleague back then: “E-mail is not business critical”. But that was almost 10 years ago. Things have changed… or haven’t they?

Single Point of Failure
First, I’ll introduce the availability concept, which revolves around eliminating the single point of failure: an element in the whole system of hardware, software and organization that can cause downtime, i.e. disruption of services. After identifying a single point of failure, we want to eliminate it to prevent downtime, which is, after all, the ultimate goal for a business-critical system. We can approach this task using two different strategies: Fault Tolerance (FT) or High Availability (HA). Identifying and eliminating single points of failure is an ongoing process, as most IT environments are subject to change over time.

Availability
To understand the Fault Tolerant and High Availability strategies, we first need to define the term “availability”. The dictionary defines availability as the quality or state of being available, where available means present or ready for immediate use. Availability is mostly expressed as a percentage, for example in a service level agreement, but what does that percentage mean? To explain, take a look at the following diagram:

Lifecycle

I assume this lifecycle speaks for itself. Using this diagram, the availability is calculated as follows: MTBF / (MTBF + MTTR). The related expected downtime is calculated as ( 1 – Availability% ) * 1 year. Note that the time between failure and recovery isn’t used in the calculation.

I’ll use a simple example: a 500 GB Seagate Barracuda 7200.12 (ST3500412AS) with an MTBF specification of 750,000 hours. Say you have a 24-hour replacement contract and need about 4 hours to restore the backup, so MTTR is 28 hours. The availability would then be 750,000 / ( 750,000 + 28 ) = 99.9963%, resulting in a yearly downtime of ( 1 – 0.999963 ) * ( 365 days * 24 hours * 60 minutes ) ≈ 19.6 minutes.
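The calculation above can be sketched in a few lines of Python, using the drive example as input:

```python
# Availability from MTBF/MTTR, using the Seagate drive example above.
MTBF = 750_000          # hours, from the spec sheet
MTTR = 24 + 4           # hours: 24h replacement contract + 4h backup restore

availability = MTBF / (MTBF + MTTR)
minutes_per_year = 365 * 24 * 60
downtime_min = (1 - availability) * minutes_per_year

print(f"Availability: {availability * 100:.5f}%")   # ~99.99627%
print(f"Yearly downtime: {downtime_min:.1f} min")   # ~19.6 min
```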

Of course, with hardware these numbers are theoretical and to some extent a marketing thing; how else could Seagate specify an MTBF of 750,000 hours (over 85 years)? I tend to look at it as an indication of the reliability you can expect. For example, compare the MTBF of the 7200.12 drive with an enterprise-class drive from Seagate’s ES product line: the ST3500320NS has an MTBF of 1,200,000 hours.

That’s the reason you should use enterprise-class drives in your storage solution instead of desktop drives, which aren’t meant to run in 24×7 environments. To add to that, the MTBF decreases when drives are used in series (RAID 0: 1 / (1/MTBF1 + … + 1/MTBFn)) and increases when they are used in parallel (RAID 1, n identical drives: MTBF * (1 + 1/2 + … + 1/n)). When trying to do these calculations for the whole supply chain, with all the elements and their individual specifications and support contracts, things can get very complex.
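The series and parallel formulas are easy to sketch; here they are applied to the 750,000-hour drive from the example:

```python
# Combined MTBF of drive configurations (hours), per the formulas above.
def mtbf_series(*mtbfs):
    """RAID 0-style: any drive failing fails the whole set."""
    return 1 / sum(1 / m for m in mtbfs)

def mtbf_parallel(mtbf, n):
    """RAID 1-style, n identical drives: the set fails when the last drive fails."""
    return mtbf * sum(1 / k for k in range(1, n + 1))

drive = 750_000
print(mtbf_series(drive, drive))    # RAID 0 of two drives: halved to 375,000 h
print(mtbf_parallel(drive, 2))      # RAID 1 of two drives: 1,125,000 h
```

Note how striping two drives halves the MTBF, while mirroring them raises it by half again.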

The 9’s
When talking about availability, it is often expressed using a series of 9’s, e.g. 99.9%. The more 9’s, the better (less downtime). Note that each additional level of availability requires significantly more effort. By effort, don’t think of technical solutions only; it also means organizational measures like having skilled personnel and proper procedures.

In fact, only a small percentage of outages has a technical cause; the majority of incidents are due to human error. And yes, that includes that bad driver, which was programmed by humans. This is why changes in a properly managed infrastructure should always go through test and acceptance procedures in environments representative of, or identical to, production. Unfortunately, this doesn’t always happen, as not all IT departments have that luxury, mostly for financial reasons.

Availability%   Downtime / Year   Downtime / Month
99.0%           3.65 days         7.3 hrs
99.9%           8.76 hrs          43.8 min
99.99%          52 min            4.3 min
99.999%         5.2 min           26 sec
99.9999%        31 sec            2.6 sec
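The table follows directly from the availability percentage; a quick sketch to reproduce the rows:

```python
# Convert an availability percentage into expected downtime for a period.
def downtime(availability_pct, period_hours):
    return (1 - availability_pct / 100) * period_hours

HOURS_PER_YEAR = 365 * 24
for pct in (99.0, 99.9, 99.99, 99.999, 99.9999):
    yearly = downtime(pct, HOURS_PER_YEAR)          # hours per year
    monthly = downtime(pct, HOURS_PER_YEAR / 12)    # hours per month
    print(f"{pct}%  {yearly:8.2f} h/yr  {monthly * 60:7.1f} min/mo")
```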

Fault Tolerant
The goal of a Fault Tolerant solution is to maximize the Mean Time Between Failure (MTBF). This is achieved by mirroring or replicating systems: these monolithic systems run software in parallel on identical hardware. This is called lockstep (a term which, for your information, refers to synchronized marching).

Because Fault Tolerant systems run in parallel, the results of an operation can be compared; when the results don’t match, a fault has occurred. Since the faulty system can’t be identified using only two parallel systems, there is a variation on this architecture where one system functions as master and the other as slave, the slave acting as a hot standby. Alternatively, to resolve the ambiguity, you could use three systems and let the majority determine the right output.
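The three-system majority vote can be sketched as follows (the interface is illustrative; real lockstep systems do this comparison in hardware):

```python
# Majority voting across three replicated systems: the majority result wins,
# and any replica that disagrees is flagged as faulty.
from collections import Counter

def vote(results):
    """Return (majority result, indexes of disagreeing replicas)."""
    winner, count = Counter(results).most_common(1)[0]
    if count <= len(results) // 2:
        # With only two systems a single mismatch lands here:
        # the faulty one cannot be identified.
        raise RuntimeError("no majority - fault cannot be resolved")
    faulty = [i for i, r in enumerate(results) if r != winner]
    return winner, faulty

print(vote([42, 42, 42]))   # (42, [])  - all replicas agree
print(vote([42, 41, 42]))   # (42, [1]) - replica 1 is faulty
```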

When faults are detected in a Fault Tolerant system, the failing component (or system) is disabled and the mirror takes over. This makes the experience transparent for the end-user. There is one caveat: since Fault Tolerant systems run software in parallel, software faults are also mirrored.

Examples of Fault Tolerant components are ECC RAM, multiple NICs in a Fault Tolerant configuration, multipath network software, RAID 1+ disk systems and storage with replication technology. Examples of Fault Tolerant systems are HP NonStop (proprietary), the Stratus ftServer and the Unisys ES7000. There are also software-based solutions like Marathon EverRun or VMware’s FT offering.

High Availability
High Availability aims to minimize the MTTR. This can be achieved by redundant or standby (cold, hot) systems, or by non-technical measures like on-site support contracts. Systems take over the functionality of the failing system after the failure has occurred. Therefore, High Availability solutions aren’t always completely transparent to the user. The effects of a failing system and the consequences for the end user depend on the software, e.g. a seamless reconnect versus a requirement to log in again. Another point of attention is the potential loss of information caused by pending transactions being lost in the failure. To make the experience more transparent for the user, applications need to be resilient, e.g. detect the failure and retry the transaction.
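Such resilience can be as simple as a retry loop around the transaction; a minimal sketch (the operation and its name are illustrative):

```python
# Retry wrapper sketch: the application detects a transient failure and
# retries the operation, giving the standby system time to take over.
import time

def with_retry(operation, attempts=3, delay=1.0):
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except ConnectionError:
            if attempt == attempts:
                raise               # give up: surface the failure
            time.sleep(delay)       # wait for failover to complete

# Usage (hypothetical): with_retry(lambda: submit_transaction(order))
```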

Examples of High Availability technologies are load balancing – software or hardware-based – and replication, where load balancing is used for static data and replication for dynamic data.

The Present
After a decade, technology has evolved but is still founded on old concepts. Network load balancing is still here, and clustering (anyone remember Wolfpack?), although we moved from shared storage to shared-nothing replication technology, remains largely unchanged. This means either there hasn’t been much innovation or the technologies do a decent job; after all, it’s still a matter of supply and demand. Yes, we moved from certified-configurations-only shared storage solutions to flexible Database Availability Groups (hey, this is still an Exchange blog), but most changes fall in the added-functionality category or take away constraints, e.g. cluster modes (majority node set, etc.), multiple replicas and configurable replication.

Windows Server Data Center Edition

Version       x86                               x64
2000          Max. 32 GB, 32 CPUs, 4 nodes      N/A
2003 SP2      Max. 128 GB, 32 CPUs, 8 nodes     Max. 512 GB, 64 CPUs, 8 nodes
2003 R2 SP2   Max. 64 GB, 32 CPUs, 8 nodes      Max. 2 TB, 64 CPUs, 8 nodes
2008          Max. 64 GB, 32 CPUs, 16 nodes     Max. 2 TB, 64 CPUs, 16 nodes
2008 R2       N/A                               Max. 2 TB, 64 CPUs (256 logical), 16 nodes

What about Fault Tolerance and Windows’ Data Center Edition as the panacea for all your customers requiring “maximum uptime”? The issue with Fault Tolerant solutions was that they came with a hefty price tag, especially in those days. Costs were a multiple of the costs involved with High Availability solutions on decent (read: stable) hardware. So, for those extra 9’s you needed deep pockets. For example, around 2001 a Compaq ES7000 with Windows Server 2000 Data Center Edition, the joint support queue (i.e. Microsoft and the OEM) and services came with a $2M price tag, for which you got the promise of 99.9% availability.

Compare that to buying a few ProLiants with Windows Server 2000 Advanced Server, some Fault Tolerant components (FT NICs, RAID), off-the-shelf High Availability technology and dedicated personnel (justifiable with that DCE price tag) for, say, $250,000. With skilled personnel and a controlled environment you could easily reach 99% availability. Is that price difference worth 3 days of downtime? Also, the simplicity of implementing those technologies made High Availability on Windows accessible for the masses, and nowadays – certainly in the Exchange world – you seldom see load balancing or some form of clustering not being utilized.

Note that in the past decade I’ve never encountered Data Center Edition hosting Exchange. In fact, as of Exchange 2003, support for Data Center was dropped. Nowadays, Data Center is regarded as an attractive option for large-scale virtualization based on Hyper-V, not only because Data Center costs less than back then (about $3,000 per CPU – hurray for multi-core – but with a 2 CPU minimum) and is certified on more hardware, but also because it comes with unlimited virtualization rights, meaning you may run Windows Server 2008 R2 (or a previous version) Standard, Enterprise and Datacenter in the virtual instances without purchasing additional licenses for those.

With all the large-scale virtualization and consolidation projects going on, virtualizing Exchange or other parts of your IT infrastructure, it’s good to know that there are other options when required by the business.

Exchange 2007 SP3 prevents Exchange 2010 RTM prep


Exchange fellow Johan Veldhuis blogged about something interesting (or rather, something silly) which you should know about when planning to deploy Exchange 2010 RTM in an Exchange 2007 environment.

Apparently, Service Pack 3 for Exchange 2007 raises the schema version number above the version number set by Exchange 2010. This results in the following error message when trying to upgrade the schema:

Setup encountered a problem while validating the state of Active Directory. The Active Directory schema version (14625) is higher than Setup’s version (14622). Therefore, PrepareSchema can’t be executed.

The result: You can’t perform the schema upgrade for Exchange 2010 RTM in an environment where Exchange 2007 SP3 is already applied.
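The check Setup performs boils down to a version comparison; a sketch using the version numbers from the error message above (the function is hypothetical and illustrative, not Setup’s actual code):

```python
# PrepareSchema refuses to run against a schema that is newer than the
# version Setup itself would apply - effectively blocking a "downgrade".
def can_prepare_schema(ad_schema_version, setup_schema_version):
    return ad_schema_version <= setup_schema_version

print(can_prepare_schema(14625, 14622))  # Exchange 2007 SP3 applied: False
print(can_prepare_schema(14622, 14622))  # SP2-prepared forest:       True
```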

Now this information is not only of interest to current Exchange 2007 users, but also to clients wanting to migrate from Notes to Exchange 2010, for instance. They might want to use Exchange 2007 to run the Transporter Suite to connect Notes to Exchange. Implementing Exchange 2010 RTM first is not an option either, because that would prevent the installation of Exchange 2007.

Given this information you could assume the following order of installation would prevent this issue:

  1. Prepare for Exchange 2007 SP2 sets rangeUpper 14622, forest version 11222 and domain version 11221;
  2. Prepare for Exchange 2010 RTM sets rangeUpper 14622, forest version 11222 and domain version 11221;
  3. Prepare for Exchange 2007 SP3 sets rangeUpper 14625, forest version 12640 and domain version 12639?

But alas, when trying to perform the PrepareSchema of Exchange 2007 SP3 against an Exchange 2010 RTM-prepared organization, we are presented with the following message:

The exchange organization does not support this version of exchange server

When I retried the same procedure, but with an Exchange 2007 SP2 server installed, the operation seemed to work, i.e.:

  1. Prepare for Exchange 2007 SP2 & Install Exchange 2007 SP2 server;
  2. Prepare for Exchange 2010 RTM & Install Exchange 2010 RTM Server;
  3. Prepare for Exchange 2007 SP3.

I also did an Active Directory comparison using ADExplorer, comparing the post-Exchange 2007 SP3 situation against the post-Exchange 2010 RTM one (both starting from Exchange 2007 SP2), and it showed nothing of real interest besides the usual ChangedOn, CreatedOn and GUID differences, and some changes related to the order of installation.

According to information in the Exchange Server Active Directory Schema Changes Reference dated June 2010, “Exchange 2010 makes the same changes to the Active Directory schema as Exchange 2007 SP2”. But Exchange 2007 SP3 makes some additional changes to the schema, which can be checked here. Looking at the explanation contained in that article, these changes may be required for environments attaching disclaimers to voice mail or fax messages. Of course, those changes won’t be there when you prepared the schema using Exchange 2010 RTM, so those environments using that functionality might expect issues in that area.

To wrap things up: unless running Exchange 2010 RTM in (directly) Exchange 2007 SP3-prepared environments becomes officially supported, you have to be very careful when planning the order of installation in greenfield scenarios or scenarios where Exchange 2007 SP2 is in place. I assume Exchange 2010 SP1 will solve the problem, as the Exchange 2010 SP1 beta updates the schema to version 14718 (which could change with the release of SP1).

Exchange 2010 Hub Transport Diagrams


I seem to have forgotten to blog these, and considering many people are searching for them, here is a link to the Exchange 2010 architecture diagrams:

Unfortunately, the Exchange 2007 Component Architecture and Edge Transport diagrams haven’t been updated for Exchange 2010. For archival purposes, here are the links to the Exchange 2007 diagrams:

Exchange 2010 Deployment Assistant updated


Today Microsoft updated their Exchange 2010 Deployment Assistant. It was updated after the initial release in mid-November, when you could only select upgrading from Exchange 2003. Now all scenarios work, so you can select whether you’re upgrading from Exchange 2003, Exchange 2007 or a mixed Exchange 2003/2007 environment, or performing a greenfield Exchange 2010 installation.

After selecting the scenario you will be asked a few questions, such as “Are you running a disjoint namespace?” or “Are you planning to use public folders in Exchange 2010?” (nooooo). When finished, you’ll be presented with a checklist.

While the tool is available online only, you can download the checklist for offline use by clicking Download Checklist (top right), which downloads it in PDF format. It’s a good reference for planning and a useful way to keep track of where you are in the process.