UPDATE 13.8.2018: I published a a new blog post about switching from Zen Load Balancer to HAProxy. The time has come to retire Zen for us…
We’ve been using the Zen Load Balancer Community Edition in production for almost a year now and it has been working great. I previously wrote a blog post about installing and configuring Zen, and now it was time to look at the HA aspect of the back-end servers defined in various Zen farms. Zen itself is quite easy to set up in HA-mode. You just configure two separate Zen servers in HA-mode according to Zen’s own documentation. Well, this is very nice and all, and it’s also working as it should. The thing that confused me the most however (until now), is the HA aspect of the back-ends. I somehow thought that If you specify two back-ends in Zen and one of them fail, Zen automatically uses the backend which is working and marked as green (status dot). Well, this isn’t the case. I don’t know if I should blame myself or the poor documentation – or both. Anyways, an example is probably better. Here’s an example of L4xNAT-farms for Exchange (with two back-ends):
I guess it’s quite self-explanatory; we’re Load Balancing the “normal” port 443 + imap and smtp. (All the smtp-ports aren’t open to the Internet though, just against our 3rd party smtp server). The http-farm is used for http to https redirection for OWA.
Furthermore, expanding the Exchange-OWAandAutodiscover-farm:
and the monitoring part of the same farm:
This clearly shows that the “Load Balancing-part” of Zen is working – the load is evenly distributed. You can also see that the status is green on both back-ends. Fine. Now one would THINK that the status turns RED if a back-end is down and that all traffic would flow through the other server if this happens. Nope. Not happening. I was living in this illusion though 😦 As I said before, this is probably a combination of my own lack of knowledge and poor documentation. Also, afaik there are no clear “rules” for the farm type you should use when building farms. Zen itself (documentation) seem to like l4xnat for almost “everything”. However, if you’re using HTTP-farms, you get HA on the back-ends out-of-the box. (You can specify back-end response timeouts and checks for resurrected back-ends for example). Then again, you’ll also have to use SSL-offloading with the http-farm which is a whole different chapter/challenge when used with Exchange. If you’re using l4xnat you will NOT have HA enabled on the back-ends out-of-the-box and you’ll have to use FarmGuardian instead. Yet another not-so-well-documented feature of Zen.
FarmGuardian “documentation” is available at https://www.zenloadbalancer.com/farmguardian-quick-start/. Have a look for yourself and tell me if it’s obvious how to use FarmGuardian after reading.
Luckily I found a few hits on Google (not that many) that were trying to achieve something similar:
These gave me some ideas. Well, I’ll spare you the pain of googling and instead I’ll present our (working) solution:
First off, you’ll NEED a working script or command for the check-part. Our solution is actually a script that checks that every virtual directory is up and running on each exchange back-end. If NOT, the “broken” back-end will be put in down-mode and all traffic will instead flow through the other (working) one. I chose 60 sec for the check time, as Outlook times out after one minute by default (if a connection to the exchange server can’t be established). Here’s the script, which is based on a script found at https://gist.github.com/phunehehe/5564090:
Big thanks to the original script writer and to my workmate which helped me modify the script. Sorry, only available in “screenshot form”.
You can manually test the script by running ./check_multi_utl.sh “yourexchangeserverIP” from a Zen terminal:
The (default) scripts in Zen are located in /usr/local/zenloadbalancer/app/libexec btw. This is a good place to stash your own scripts also.
You can find the logs in /usr/local/zenloadbalancer/logs. Here’s a screenshot from our log (with everything working):
And lastly I’ll present a couple of screenshots illustrating how it looks when something is NOT OK:
(These screenshots are from my own virtual test environment, I don’t like taking down production servers just for fun 🙂 )
FarmGuardian will react and present a red status-symbol. In this test, I took down the owa virtual directory on ex2. When the problem is fixed, status will return to normal (green dot).
and in the log:
The log will tell you that the host is down.
Oh, as a bonus for those of you wondering how to do a http to https redirect in Zen:
Create new HTTP-farm and leave everything as default. Add a new service (name it whatever you want) and then just add the rules for redirection. Yes, it’s actually this simple. At least after you find the documentation 🙂
And there you have it. Both the Zen servers AND the back-ends working in HA-mode. Yay 🙂