Load Balancing Exchange 2013 (CAS) with clustered (Zen) Load Balancers

I decided to skip my post about Exchange Database Availability Groups (DAGs), as all the information needed is already very well documented. All I did was follow the excellent guide from exchangeserverpro.com, Installing an Exchange Server 2013 Database Availability Group, and I got my DAG up and running in no time. We already have a working DAG environment as well.

The part that needed some extra attention was high availability/load balancing, mainly load balancing the CAS. (DAGs are more or less highly available and failover-tolerant by design, but if the CAS goes down they’re rather useless.)

UPDATE 22.2.2017: I published a new blog post about Using FarmGuardian to enable HA on Back-ends in Zen Load Balancer. Use it as a complement to this guide.

I’ll start off by making illustrations of the current and soon-to-be situations.

Current situation

 

exchange_current_setup

Fig 1. Current setup

  • 1 Exchange server used as CAS proxy/redirect.
    • No user mailboxes
    • Not actually needed; we could also use DNS round robin. The server was meant to replace the Exchange 2010 server at first… then the plans changed. I won’t go into the details here.
    • Will be taken out of production and replaced by Zen Load Balancing cluster
    • Single point of failure
  • 1 server running Exchange 2010. Existing users will be migrated from this server to the (two) Exchange 2013 servers real soon (once the certificate issues are fixed).
    • When this is done, the server will be taken out of production.
  • 2 Exchange 2013 servers, running in two different physical locations (even though in same domain).
    • DAG is used between the two servers
    • All users will be moved to these two servers
  • Single namespace

 

Soon-to-be situation

Goals:

  • No (CAS) single point of failure
  • Only Exchange 2013 servers in the environment

 

exchange_soon_to_be_setup

Fig 2. Soon-to-be setup

  • 2 Clustered Zen Load Balancers (in different physical locations)
  • 2 Exchange 2013 servers (the existing ones from Fig 1)
  • Single namespace (exzen)

This whole setup might seem a bit small, but at the moment it’s sufficient. We’ll expand when the need is there. Only calendars and contact information are currently stored on the Exchange servers; email is handled by our third-party IMAP server.

NOTE! If you are using VMware in your environment, be sure to check this information:

https://keepingitclassless.net/2013/04/virtual-routing-part-2-fhrp-issues-in-vmware-vsphere/

It will save you time and nerves when trying to figure out why replication isn’t working.

 

Installing Zen Load Balancing Cluster

Introduction

I was given the task of finding a good load balancing solution for Exchange. In its simplest configuration you could just use DNS round robin (http://exchangeserverpro.com/exchange-2013-client-access-server-high-availability/), but I wasn’t too convinced about this idea. Round robin seemed more like the poor man’s load balancer. That said, it will work. I even gave it a go in my test environment. The clients got a little bit (too) confused though. Not good. I decided to move along to a “real” load balancer.
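To illustrate why round robin feels like the poor man’s option, here is a minimal Python sketch that simulates a DNS server rotating two A records. The addresses are hypothetical stand-ins for the two CAS servers, and the rotation is a simplification of what real DNS servers and resolvers do. What it shows is that DNS keeps handing out every address in turn, with no idea whether the server behind it is alive.

```python
from collections import deque


def rotated_answers(addresses, lookups):
    """Simulate a DNS server that rotates its A-record list on every lookup."""
    records = deque(addresses)
    answers = []
    for _ in range(lookups):
        answers.append(list(records))
        records.rotate(-1)  # move the first record to the end of the list
    return answers


def first_choice_hits(addresses, lookups):
    """Count how often each address is listed first (what most clients try first)."""
    counts = {address: 0 for address in addresses}
    for answer in rotated_answers(addresses, lookups):
        counts[answer[0]] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical CAS addresses, not taken from the real environment.
    cas_servers = ["192.0.2.10", "192.0.2.11"]
    print(first_choice_hits(cas_servers, lookups=10))
    # If one of the two servers were down, the lookups counted for it would
    # still be directed there first, because DNS does no health checking.
```

With two records the first choice simply alternates, so roughly half of the clients would be pointed at a dead server until someone edits DNS.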

Luckily a Layer 4 solution will work fine with Exchange 2013, so there’s no need for a more complex Layer 7 solution. Keeping it simple is the key. Some facts:

“In Exchange Server 2010 it’s often said that the main skill administrators needed to learn was how to deploy and manage load balancing. The concept of the RPC Client Access Array, the method used to distribute MAPI traffic between Client Access Servers was a common area of pain. Modern advances in Layer 7 load balancing also allowed for SSL offload, service level monitoring and load balancing and intelligent affinity using cookies to mitigate against some of Exchange 2010’s shortcomings.”

Improvements to Load Balancing

“What we’re getting at is the two key improvements in Exchange 2013 that make load balancing suddenly quite simple. HTTPS-only access from clients means that we’ve only got one protocol to consider, and HTTP is a great choice because its failure states are well known, and clients typically respond in a uniform way.”

Source: http://www.msexchange.org/articles-tutorials/exchange-server-2013/high-availability-recovery/introducing-load-balancing-exchange-server-2013-part1.html
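Since clients only talk HTTPS, about the simplest thing that can be verified at Layer 4 is whether a back-end still accepts TCP connections on port 443. A minimal Python sketch of such a check follows. The function name and the timeout are my own choices, and this is not how Zen itself implements its checks.

```python
import socket


def tcp_port_open(host, port=443, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution errors.
        return False
```

Calling `tcp_port_open("cas1.ourdomain.com")` (a hypothetical back-end name) only tells you that something is listening on the port. It says nothing about whether Exchange behind it is healthy, which is the trade-off of staying at Layer 4.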

That said, there are many load balancing solutions available out there. Windows Network Load Balancing (WNLB) is one example that comes to mind, but it has limitations:

“WNLB can’t be used on Exchange servers where mailbox DAGs are also being used because WNLB is incompatible with Windows failover clustering. If you’re using an Exchange 2013 DAG and you want to use WNLB, you need to have the Client Access server role and the Mailbox server role running on separate servers.”

Source: https://technet.microsoft.com/en-us/library/jj898588%28v=exchg.150%29.aspx

That’s no good. After some investigation I found a webpage that compares open-source load balancers, http://wso2.com/library/articles/2014/03/wso2-elb-vs-other-open-source-load-balancers/. This was promising: Linux-based load balancers that not only use few resources in our data center but are free as well. Brilliant.

I started out by testing HAProxy. I had heard some good things about it. Well… short story: it turned out to be a bit of a pain to configure. I didn’t want to waste all of my energy on configuration.

Next candidate: Zen Load Balancer. Oh, what a difference. Very easy to install, very easy to configure. A nice web interface from which you can configure everything needed. There are many commercial alternatives available as well, but Zen seemed to come very close to these (on the open-source market).

 

Preparations

Be sure to use a single namespace in Exchange 2013. This is actually best practise even without a load balancer, and it makes your life easier. If you have no idea what I’m talking about, have a look at the following links, for example:

http://3techies.com/?p=194
http://www.msexchangegeek.com/inside-the-exchange-2013-single-namespace-part-1/
http://blogs.technet.com/b/exchange/archive/2014/02/28/namespace-planning-in-exchange-2013.aspx
http://blog.netwrix.com/2014/03/21/configuring-exchange-2013-for-site-resilience-2/

While you’re at it, have a look at managing certificates at the same time: http://www.msexchange.org/articles-tutorials/exchange-server-2013/management-administration/managing-certificates-exchange-server-2013-part1.html. We’re using a SAN certificate for autodiscover.ourdomain.com and exzen.ourdomain.com.

DNS: Instead of having clients point to the CAS, point them to the load balancer. (You then configure the load balancer with the real IPs of the back-end Exchange servers.) More info in the next chapter.
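A quick way to confirm the DNS change is to check that the namespace resolves to the load balancer’s virtual IP and nothing else. Here is a minimal Python sketch. The hostname and the address 10.0.0.61 are the ones used in this post, while the function name and the injectable `lookup` parameter are my own additions to make the check easy to test.

```python
import socket


def resolves_to(hostname, expected_ip, lookup=socket.gethostbyname_ex):
    """Return True if hostname resolves to exactly the expected IPv4 address."""
    try:
        _name, _aliases, addresses = lookup(hostname)
    except OSError:
        # Name resolution failed (socket.gaierror is a subclass of OSError).
        return False
    return addresses == [expected_ip]


# Example call, to be run from a client inside the network:
# resolves_to("exzen.ourdomain.com", "10.0.0.61")
```

If this returns False because several addresses come back, some clients are probably still being sent straight to a CAS instead of through the load balancer.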

 

Installation

 

Pictures say more than words, so here you go:

zen_global_view

Fig 3. Zen Load Balancer Global View

 

zen_farms

Fig 4. Zen Farms

 

zen_backend_status

Fig 5. Zen backend status

 

zen_interfaces

Fig 6. Zen Interfaces. Both Zen servers have dual NICs, with one (eth1) dedicated to the cluster service.

Zen 1:

  • Physical IP (eth0): 10.0.0.60
  • Virtual IP (virtual network interface): 10.0.0.61
  • Physical IP (eth1): 10.0.0.80
  • Virtual IP (used for cluster service): 10.0.0.85

Zen 2:

  • Physical IP (eth0): 10.0.0.70
  • Virtual IP (virtual network interface): 10.0.0.61
  • Physical IP (eth1): 10.0.0.90
  • Virtual IP (used for cluster service): 10.0.0.85
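The addressing rule behind the two lists above is that the virtual IPs are shared between the nodes while every physical IP is unique. A minimal Python sketch of that sanity check follows. The dictionary keys (`farm_vip`, `cluster_vip`) and the function name are labels I made up for illustration; they are not Zen configuration keys.

```python
from ipaddress import ip_address

# Addresses copied from the lists above; the key names are illustrative only.
NODES = {
    "zen1": {"eth0": "10.0.0.60", "farm_vip": "10.0.0.61",
             "eth1": "10.0.0.80", "cluster_vip": "10.0.0.85"},
    "zen2": {"eth0": "10.0.0.70", "farm_vip": "10.0.0.61",
             "eth1": "10.0.0.90", "cluster_vip": "10.0.0.85"},
}


def check_layout(nodes):
    """Return a list of problems found in the cluster addressing (empty if fine)."""
    problems = []
    # Every entry must be a valid IP address; ip_address raises ValueError otherwise.
    for config in nodes.values():
        for address in config.values():
            ip_address(address)
    # Virtual IPs float between the nodes, so both must define the same ones.
    for key in ("farm_vip", "cluster_vip"):
        if len({config[key] for config in nodes.values()}) != 1:
            problems.append(f"{key} differs between nodes")
    # Physical addresses must be unique across all interfaces.
    physical = [config[nic] for config in nodes.values() for nic in ("eth0", "eth1")]
    if len(set(physical)) != len(physical):
        problems.append("duplicate physical IP")
    return problems


if __name__ == "__main__":
    print(check_layout(NODES) or "layout looks consistent")
```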

 

zen_rsa_between_hosts

Fig 7. RSA communication between cluster hosts

 

zen_cluster1

Fig 8. Zen Cluster configuration – success (with failover).

 

zen_master_node

Fig 9. Cluster status – master

 

zen_backup_node

Fig 10. Cluster status – backup

 

DNS settings:

exchange_zen_dns

Fig 11. DNS settings

 

Outlook connection status:

outlook_connection_status

Fig 12. Outlook connection status. Connected to exzen (which points to the Zen Load Balancer in DNS).

 

Testing failover

Luckily, failover is easy to test in a virtual environment, as you can just suspend a virtual machine 🙂 I did a test run (suspended zen1) while constantly pinging the virtual IP and keeping an eye on the Outlook connection status window. Here’s the result:

http://youtu.be/DLCMVVw2tN4

  • At 00:11 the ping reply stops for a brief moment (when I hit suspend on zen1)
  • At 00:15 the client notices that the proxy server is down/suspended (ping request times out)
  • At 00:26 ping reply to the proxy server is active again (automatic failover to zen2/“backup”)
  • At 00:40 I resume/re-activate zen1 again
  • At 00:42 Outlook notices this (established/connecting changes state in Outlook Connection Status)
  • At 00:45 the ping reply stops for a brief moment (when zen1 is resuming/getting back online after the “fail”)
  • At 00:46 ping reply times out (zen1 is configuring itself to become the “master” again)
  • At 00:55 all connectivity from Outlook to the proxy is temporarily lost
  • At 00:58 the proxy server answers to ping requests again
  • At 01:05 the connection to the proxy server is active/established in Outlook
  • At 01:06 –> everything is back to normal.

To sum it up: Outlook is only offline for 10 seconds (00:55 – 01:05), and it never even complains about being offline (no yellow exclamation mark). Not bad. 10 seconds of downtime is something we can definitely live with.
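The downtime figures can be recomputed from the timestamps in the list above. Here is a minimal Python sketch that turns a series of (timestamp, reachable) observations into outage windows. The sample data is my reading of the timeline in this post, and the function names are my own.

```python
def to_seconds(stamp):
    """Convert an 'mm:ss' video timestamp to seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + int(seconds)


def outage_windows(samples):
    """Return (start, end) pairs in seconds for each stretch of failed samples.

    A window opens at the first failed sample and closes at the next
    successful one. An outage still open at the end has end=None.
    """
    windows = []
    down_since = None
    for stamp, ok in samples:
        now = to_seconds(stamp)
        if not ok and down_since is None:
            down_since = now
        elif ok and down_since is not None:
            windows.append((down_since, now))
            down_since = None
    if down_since is not None:
        windows.append((down_since, None))
    return windows


if __name__ == "__main__":
    # Observations transcribed from the timeline above (my interpretation).
    ping = [("00:00", True), ("00:11", False), ("00:15", False), ("00:26", True),
            ("00:45", False), ("00:46", False), ("00:58", True)]
    outlook = [("00:00", True), ("00:55", False), ("01:05", True)]
    for name, samples in (("ping", ping), ("outlook", outlook)):
        for start, end in outage_windows(samples):
            print(f"{name}: down from {start}s to {end}s ({end - start}s)")
```

By this reading, ping to the virtual IP was out for 15 and 13 seconds, while Outlook itself only lost its connection for the 10 seconds mentioned above.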

 

Final words

As you can see, everything is working nicely in my test environment. Before going into production, there are a couple of things that should be considered:

  • Certificate for the Zen Load Balancer web interface. Is it needed, or will the interface be blocked from the outside world?
    • Other security considerations on the load balancer(s), such as open ports. The networking team will decide this.
  • After the cluster is up and running in production, test it only on a couple of clients at first.
  • Probably lots more I can’t think of right now…