Exchange 2013/2016: Switching from Zen Load Balancer to HAProxy

I’m still a huge fan of Zen (Zevenet) Load Balancer, and it has been serving our Exchange servers for a couple of years. HOWEVER, Zen’s features are a bit limited and no longer enough for us. It does a very good job with basic load balancing, but unfortunately it falls short in the logging department. This was a deal breaker for us, as we weren’t able to get the client IPs logged. If you’re using any load balancer in a Source NAT (SNAT)/reverse proxy configuration (and you most certainly are), this is the default behavior: you end up with the load balancer IP instead of the client IP in the (Exchange IIS) logs. Also, with Zen LB you get NO client logs at all on the load balancer itself. This is not ideal. Furthermore, when I tested http mode instead of L4xNAT in Zen, I couldn’t get it working as intended (including X-Forwarded-For). I configured it with no persistence/affinity, but Outlook simply wouldn’t start. I scratched my head over this for far too long before giving up. If you ask me, this looks like a bug in Zen. (Exchange 2013 is, by the way, designed to work without session affinity/persistence; see https://blogs.technet.microsoft.com/exchange/2014/03/05/load-balancing-in-exchange-2013/.)

This left us no other choice than to look for alternative load balancers. We finally settled on HAProxy, as the price tag was just right (free). That said, I ALMOST had to throw in the towel with HAProxy as well, as the Mac Outlook clients wouldn’t work no matter what. Luckily I got it sorted out in the end; more about that later on.

The SNAT/client IP dilemma is a big deal, but the architecture/algorithm/method behind it isn’t really the point of this blog post. Nonetheless, you should do at least some homework. Here are a couple of nice articles describing the dilemma as well as different load balancing methods:

HAProxy: Microsoft Exchange 2013 architectures (check the limitations in reverse-proxy mode/source NAT)
Logging Client IP Address in IIS When Using Load Balancing with Source NAT
What are the best load balancing methods and algorithms?

Now that you know a little bit more about the LB methods, it’s time to think about how to configure HAProxy itself. HAProxy with Exchange 2013 (2016), that is.

First off, you’ll need L7 (http) load balancing so that you can insert the X-Forwarded-For header. This is the trick for getting the client IP into the logs (see https://www.loadbalancer.org/blog/iis-and-x-forwarded-for-header/ for information on how to configure X-Forwarded-For logging on the Exchange servers). I found a very nice blog post by Zoltan describing how to install and configure HAProxy for Exchange, “Highly Available L7 Load Balancing for Exchange 2013 with HAProxy”. I have to say it’s TOTALLY AWESOME. I couldn’t have done the configuration without it, and no other article I found on the Internet went into such depth regarding L7 load balancing for Exchange. That said, I added some custom stuff to the configuration file, and I also installed/configured HAProxy + keepalived a bit differently (more easily) than in that guide. Well, enough talking. Here are my steps, based on Zoltan’s blog post:

  • I didn’t install a PKI. Most of you probably have a working PKI in production already.
    • I use public certificates on Exchange so I have no need for an internal PKI (in this specific case).
  • I didn’t install any Exchange servers either (obviously).
  • I installed CentOS 7 (two servers) in cooperation with our Linux team. This way I got the optimal installation (no extra crap, and centralized management with Puppet).
  • I started following the guide from part 5, http://ezoltan.blogspot.com.au/2014/10/highly-available-l7-load-balancing-for.html
    • Please read the chapter Brief HAProxy Certificate Primer to get an idea of what you’re trying to accomplish!
    • Note that we’re not doing SSL offloading; we’re in fact doing SSL bridging (HAProxy decrypts the traffic and then re-encrypts it towards the Exchange servers).

 

Certificate preparation

  • I’m using a public certificate, so there’s no need to update the root and intermediate certificate stores (yet). (In fact, the intermediate certificate did need some extra configuration after all; more about that later on.)
  • I started following the guide more to the letter from the chapter Upload the Exchange Certificate and Private Key onwards
  • I first extracted the private key from the pfx file. When asked, I entered the password received from the public certificate provider, and then entered a new PEM pass phrase (just pick a new one):

           haproxy_extract_private_key

  • Now it’s time to remove the password protection from the private key. You’ll be prompted for the PEM pass phrase entered in the above step:

           haproxy_remove_pw_from_private_key

  • We’re now ready for the final stage – extracting the certificate from the pfx file:

           haproxy_extract_certificate_from_pfx_file

  • We now have all files needed, and we just need to combine the certificate and the private key files, as HAProxy doesn’t allow use of separate files.
    • cat exchange_certificate.pem exchange_private_key_nopassword.pem > exchange_certificate_and_key_nopassword.pem
  • Moved the file to its final destination:
    • mv exchange_certificate_and_key_nopassword.pem /etc/ssl/certs/
    • I’ll now quote Zoltan: “Well, let’s give ourselves a pat on the shoulder, we deserve it. We are through the most difficult part, at least in my opinion, of this lab. Well done!” 🙂 I have to agree with the guy here…

 

HAProxy installation

I did NOT compile HAProxy from source; instead I just installed the HAProxy package with yum. Much more straightforward. Here are my steps:

  • yum install haproxy
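  • I used a “dummy” haproxy.cfg file (for Exchange) to begin with. It was modified down the road; see the working example in the config file chapter later on.
  • Apart from the info in the blog, you should also edit another line in /etc/rsyslog.conf, so that HAProxy’s messages don’t end up in /var/log/messages as well. Add <facility>.none to the following line, where <facility> is the one used on the log line in haproxy.cfg (local2 in the config later in this post):
    • *.info;mail.none;authpriv.none;cron.none;local2.none          /var/log/messages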
  • There’s no need to create a logrotate script when installing HAProxy with yum; it gets created automatically (/etc/logrotate.d/haproxy)!
    • I edited the logrotate script and changed the rotate parameter to “30”. One month is a suitable time for us to keep the logs.
  • The same goes for starting HAProxy automatically at boot. There’s no need for complicated scripts; just execute the following command instead:
    • systemctl enable haproxy

              haproxy_autostart_centos

  • I then configured the firewall.
  • You should now do some testing. I had previously done some tests in my virtual environment, so I had no need for any tests right now.
  • Thus far, everything is working:

          haproxy_service_status_ok

  • Happy days!
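The syslog facility has to match on both sides (the log line in haproxy.cfg and the .none exclusion in rsyslog.conf), which is easy to get wrong when copying settings between guides. Here’s a minimal Python sketch, not part of the original setup, that reads the facility from haproxy.cfg and checks whether an rsyslog rule excludes it. The function names are hypothetical, and only the simple selector syntax shown above is understood.

```python
def haproxy_log_facility(cfg_text):
    """Return the facility from the first 'log <address> <facility> ...' line in haproxy.cfg."""
    for raw in cfg_text.splitlines():
        words = raw.split("#", 1)[0].split()  # drop comments, then split on whitespace
        # 'log global' (in defaults/proxy sections) carries no facility, so require 3+ words.
        if len(words) >= 3 and words[0] == "log":
            return words[2]
    return None


def rule_excludes(rsyslog_rule, facility):
    """True if an rsyslog rule's selector list contains '<facility>.none'.

    Selectors are separated by ';', facilities sharing one priority by ','
    (e.g. 'cron,local2.none'). Anything fancier is out of scope for this sketch.
    """
    selector_field = rsyslog_rule.split()[0]
    for selector in selector_field.split(";"):
        facilities, _, priority = selector.partition(".")
        if priority == "none" and facility in facilities.split(","):
            return True
    return False


if __name__ == "__main__":
    cfg = "global\n    log 127.0.0.1 local2 info\ndefaults\n    log global\n"
    rule = "*.info;mail.none;authpriv.none;cron.none;local2.none  /var/log/messages"
    facility = haproxy_log_facility(cfg)
    print(facility, rule_excludes(rule, facility))  # local2 True
```

Feed it the contents of /etc/haproxy/haproxy.cfg and the /var/log/messages rule from /etc/rsyslog.conf to catch a local0/local2 mix-up before HAProxy’s chatter fills up /var/log/messages.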

 

HAProxy HA

Now that we have one node (master) working, it’s time to think about High Availability. I continued following the excellent guide at http://ezoltan.blogspot.com.au/2014/10/highly-available-l7-load-balancing-for_48.html. Yet again, my environment called for some changes. Comments below:

  • I did not compile keepalived from scratch, instead I yum installed it.
  • Added net.ipv4.ip_nonlocal_bind=1 to /etc/sysctl.conf
  • No need to create keepalived.conf, it’s pre-created when yum installed.
  • Edited keepalived.conf to match our environment (copy-pasted the blog config and edited the email / interface / IP parameters)
  • Edited the hosts file following the blog information
  • There’s yet again no need for a startup script when you have yum installed. Keepalived will autostart at every boot when you execute the following:
    • systemctl enable keepalived
  • I made firewall changes based on the blog.
  • Did all the tests, everything worked like a charm 🙂
  • Made the same changes to the other server/node (backup)
    • priority was set to a lower value than on the master node
    • other small changes which are written in Zoltan’s blog
  • Did keepalived-testing following the blog. Everything was fine! 🙂
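keepalived decides which node holds the virtual IP through VRRP: the node advertising the highest priority becomes MASTER, and on a tie the one with the higher primary IP address wins. That’s why the backup node gets the lower priority. The toy Python model below illustrates just that election rule. The node names, priorities and addresses are hypothetical, and the model ignores advertisement timers, preemption settings and tracked scripts.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address


@dataclass
class Node:
    name: str
    priority: int          # 'priority' value from this node's keepalived.conf
    address: str           # primary IPv4 address on the VRRP interface
    healthy: bool = True   # False once the node stops sending VRRP advertisements


def elect_master(nodes):
    """Simplified VRRP election: the highest priority wins; on a tie, the higher IP wins."""
    alive = [n for n in nodes if n.healthy]
    if not alive:
        return None
    return max(alive, key=lambda n: (n.priority, IPv4Address(n.address)))


if __name__ == "__main__":
    lb1 = Node("lb1", priority=101, address="10.0.0.21")
    lb2 = Node("lb2", priority=100, address="10.0.0.22")
    print(elect_master([lb1, lb2]).name)  # lb1 holds the VIP
    lb1.healthy = False                   # simulate the master going down
    print(elect_master([lb1, lb2]).name)  # lb2 takes over
```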

 

Testing

This has to be my favorite chapter of Zoltan’s blog. He does a VERY good job explaining all the testing and troubleshooting. READ IT CAREFULLY! I also noticed “weird” clients connecting to the “unspecified protocol” back-end, be_ex2013. If the ACLs cover every Exchange path, no traffic should pass through this back-end. However, I found some Autodiscover and EWS URLs with mixed upper/lower case letters, which the case-sensitive ACLs didn’t catch. This was easily fixed by adding a few more ACLs. I’ll paste the config (including the ACLs) in the next chapter so you’ll get a better understanding of the whole thing.

 

Config file, haproxy.cfg

Well, this is without doubt the most important part of the whole setup, and also the most time-consuming one. Here’s the config; I’ll explain it in more detail after the paste:

#---------------------------------------------------------------------
# Example configuration for a possible web application.  See the
# full configuration options online.
#
#   http://haproxy.1wt.eu/download/1.4/doc/configuration.txt
#
#---------------------------------------------------------------------

#---------------------------------------------------------------------
# Global settings
#---------------------------------------------------------------------

global
    # to have these messages end up in /var/log/haproxy.log you will
    # need to:
    #
    # 1) configure syslog to accept network log events.  This is done
    #    by adding the '-r' option to the SYSLOGD_OPTIONS in
    #    /etc/sysconfig/syslog
    #
    # 2) configure local2 events to go to the /var/log/haproxy.log
    #   file. A line like the following can be added to
    #   /etc/sysconfig/syslog
    #
    #    local2.*                       /var/log/haproxy.log
    #

    log         127.0.0.1 local2 info
    chroot      /var/lib/haproxy
    pidfile     /var/run/haproxy.pid
    user        haproxy
    group       haproxy
    daemon

    # turn on stats unix socket
    stats socket /var/run/haproxy.stat     


#--------------------------
# SSL tuning / hardening
#--------------------------
    ssl-default-bind-options no-sslv3
    ssl-default-bind-ciphers ECDH+AESGCM:DH+AESGCM:ECDH+AES256:DH+AES256:ECDH+AES128:DH+AES:RSA+AESGCM:RSA+AES:!aNULL:!MD5:!DSS
    ssl-default-server-options no-sslv3
    ssl-default-server-ciphers ECDH+AESGCM:DH+AESGCM:ECDH+AES256:DH+AES256:ECDH+AES128:DH+AES:RSA+AESGCM:RSA+AES:!aNULL:!MD5:!DSS
    tune.ssl.default-dh-param 2048
   
#---------------------------------------------------------------------
# common defaults that all the 'listen' and 'backend' sections will
# use if not designated in their block
#---------------------------------------------------------------------

# Regarding timeout client and timeout server:
# https://discourse.haproxy.org/t/high-number-of-connection-resets-during-transfers-exchange-2013/1158/4

defaults
    mode                    http
    log                     global
    option                  httplog
    option                  dontlognull
    option                  forwardfor       except 127.0.0.0/8
    option                  redispatch
#   option                  contstats
    retries                 3
    timeout http-request    10s
    timeout queue           1m
    timeout connect         10s
    timeout client          15m # this value should be rather high with Exchange
    timeout server          15m # this value should be rather high with Exchange
    timeout http-keep-alive 10s
    timeout check           10s
    maxconn                 100000


#-------------------------------------------------------
# Stats section
#-------------------------------------------------------

listen stats x.x.x.x:444 # VIP
        stats enable
        stats refresh 300s
        stats show-node
        stats auth admin:xxxxxx
        stats hide-version
        stats uri  /stats


#---------------------------------------------------------------------
# Main front-end which proxies to the back-ends
#---------------------------------------------------------------------

frontend fe_ex2013
# http-response set-header Strict-Transport-Security max-age=31536000;\ includeSubdomains;\ preload
  http-response set-header X-Frame-Options SAMEORIGIN
  http-response set-header X-Content-Type-Options nosniff
  mode http
  bind *:80
  bind *:443 ssl crt /etc/ssl/certs/exchange_certificate_and_key_nopassword.pem
  redirect scheme https code 301 if !{ ssl_fc }   # redirect 80 -> 443 (for owa)
  acl autodiscover url_beg /Autodiscover
  acl autodiscover url_beg /autodiscover
  acl mapi url_beg /mapi
  acl rpc url_beg /rpc
  acl owa url_beg /owa
  acl owa url_beg /OWA
  acl eas url_beg /Microsoft-Server-ActiveSync
  acl ecp url_beg /ecp
  acl ews url_beg /EWS
  acl ews url_beg /ews
  acl oab url_beg /OAB
  use_backend be_ex2013_autodiscover if autodiscover
  use_backend be_ex2013_mapi if mapi
  use_backend be_ex2013_rpc if rpc
  use_backend be_ex2013_owa if owa
  use_backend be_ex2013_eas if eas
  use_backend be_ex2013_ecp if ecp
  use_backend be_ex2013_ews if ews
  use_backend be_ex2013_oab if oab
  default_backend be_ex2013

 

#------------------------------
# Back-end section
#------------------------------

backend be_ex2013_autodiscover
  mode http
  balance roundrobin
  option httpchk GET /autodiscover/healthcheck.htm
  option log-health-checks
  http-check expect status 200
  server exchange1 1.1.1.1:443 check ssl inter 15s verify required ca-file /etc/ssl/certs/ca-bundle.crt
  server exchange2 2.2.2.2:443 check ssl inter 15s verify required ca-file /etc/ssl/certs/ca-bundle.crt


backend be_ex2013_mapi
  mode http
  balance roundrobin
  option httpchk GET /mapi/healthcheck.htm
  option log-health-checks
  http-check expect status 200
  server exchange1 1.1.1.1:443 check ssl inter 15s verify required ca-file /etc/ssl/certs/ca-bundle.crt
  server exchange2 2.2.2.2:443 check ssl inter 15s verify required ca-file /etc/ssl/certs/ca-bundle.crt


backend be_ex2013_rpc
  mode http
  balance roundrobin
  option httpchk GET /rpc/healthcheck.htm
  option log-health-checks
  http-check expect status 200
  server exchange1 1.1.1.1:443 check ssl inter 15s verify required ca-file /etc/ssl/certs/ca-bundle.crt
  server exchange2 2.2.2.2:443 check ssl inter 15s verify required ca-file /etc/ssl/certs/ca-bundle.crt


backend be_ex2013_owa
  mode http
  balance roundrobin
  option httpchk GET /owa/healthcheck.htm
  option log-health-checks
  http-check expect status 200
  server exchange1 1.1.1.1:443 check ssl inter 15s verify required ca-file /etc/ssl/certs/ca-bundle.crt
  server exchange2 2.2.2.2:443 check ssl inter 15s verify required ca-file /etc/ssl/certs/ca-bundle.crt


backend be_ex2013_eas
  mode http
  balance roundrobin
  option httpchk GET /microsoft-server-activesync/healthcheck.htm
  option log-health-checks
  http-check expect status 200
  server exchange1 1.1.1.1:443 check ssl inter 15s verify required ca-file /etc/ssl/certs/ca-bundle.crt
  server exchange2 2.2.2.2:443 check ssl inter 15s verify required ca-file /etc/ssl/certs/ca-bundle.crt


backend be_ex2013_ecp
  mode http
  balance roundrobin
  option httpchk GET /ecp/healthcheck.htm
  option log-health-checks
  http-check expect status 200
  server exchange1 1.1.1.1:443 check ssl inter 15s verify required ca-file /etc/ssl/certs/ca-bundle.crt
  server exchange2 2.2.2.2:443 check ssl inter 15s verify required ca-file /etc/ssl/certs/ca-bundle.crt


backend be_ex2013_ews
  mode http
  balance roundrobin
  option httpchk GET /ews/healthcheck.htm
  option log-health-checks
  http-check expect status 200
  server exchange1 1.1.1.1:443 check ssl inter 15s verify required ca-file /etc/ssl/certs/ca-bundle.crt
  server exchange2 2.2.2.2:443 check ssl inter 15s verify required ca-file /etc/ssl/certs/ca-bundle.crt


backend be_ex2013_oab
  mode http
  balance roundrobin
  option httpchk GET /oab/healthcheck.htm
  option log-health-checks
  http-check expect status 200
  server exchange1 1.1.1.1:443 check ssl inter 15s verify required ca-file /etc/ssl/certs/ca-bundle.crt
  server exchange2 2.2.2.2:443 check ssl inter 15s verify required ca-file /etc/ssl/certs/ca-bundle.crt


backend be_ex2013
  mode http
  balance roundrobin
  server exchange1 1.1.1.1:443 check ssl inter 15s verify required ca-file /etc/ssl/certs/ca-bundle.crt
  server exchange2 2.2.2.2:443 check ssl inter 15s verify required ca-file /etc/ssl/certs/ca-bundle.crt

 

#######################################
# End of Exchange's own protocols,
# SMTP and IMAP next
########################################


frontend fe_exchange_smtp
    mode tcp
    option tcplog
    bind x.x.x.x:25 name smtp # VIP
    default_backend be_exchange_smtp
 
backend be_exchange_smtp
    mode tcp
    option tcplog
    balance roundrobin
    option log-health-checks
    server exchange1 1.1.1.1:25 weight 10 check
    server exchange2 2.2.2.2:25 weight 20 check

#only port 25 needed in our case. The port is open (only) against our Postfix server, which handles the outgoing mail traffic (MTA). In other words, we're using an external send connector in Exchange.

 
frontend fe_exchange_imaps
    mode tcp
    option tcplog
#   bind x.x.x.x:143 name imap  # NO unencrypted imap traffic allowed...
    bind x.x.x.x:993 name imaps # ssl crt /etc/ssl/certs/exchange_certificate_and_key_nopassword.pem  <-- No need, certificate is read straight from the Exchange servers.
    default_backend be_exchange_imaps
 
backend be_exchange_imaps
    mode tcp
    option tcplog
#   balance roundrobin
    balance leastconn
    option log-health-checks
#   stick store-request src
#   stick-table type ip size 200k expire 30m
#   option tcp-check
#   tcp-check connect port 143
#   tcp-check expect string * OK
#   tcp-check connect port 993 ssl
#   tcp-check expect string * OK
    server exchange1 1.1.1.1:993 weight 10 check
    server exchange2 2.2.2.2:993 weight 20 check

 

 

 

Explanations:

For obvious reasons, all IP addresses are censored.

I added the SSL tuning / hardening section for additional security. As I’ve said in many earlier blog posts, I care about server security. More information can be found at, for example, http://www.mattzuba.com/2015/07/hardening-haproxy-for-an-a-rating/. At the same time I added the http-response set-header X-Frame-Options SAMEORIGIN and http-response set-header X-Content-Type-Options nosniff options. I still left the line # http-response set-header Strict-Transport-Security max-age=31536000;\ includeSubdomains;\ preload commented out, though. The Strict-Transport-Security (HSTS) header is the one required when aiming for an A+ score in the Qualys SSL Labs SSL Server Test.

By the way, HAProxy can also protect you from DDoS and other attacks. Have a look at

https://www.haproxy.com/blog/use-a-load-balancer-as-a-first-row-of-defense-against-ddos/
https://www.loadbalancer.org/blog/simple-denial-of-service-dos-attack-mitigation-using-haproxy-2/

for more information.

 

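The stats section was added to get some nice statistics. The page refreshes itself every 300 seconds (stats refresh 300s). You can also uncomment the line # option contstats in the defaults section to get continuous statistics: by default HAProxy updates the traffic counters only when a session ends, so long-lived Exchange connections show up late in the stats.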
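HAProxy can also export the same statistics as CSV if you append ;csv to the stats URI, which is handy for scripts and monitoring. The Python sketch below is an illustration rather than something from my setup: fetch_stats_csv and server_status are hypothetical helpers, the URL and credentials are placeholders matching the censored stats section, and the sample CSV keeps only a few of the many real columns.

```python
import base64
import csv
import io
import urllib.request


def fetch_stats_csv(stats_url, user, password, timeout=10):
    """Download the CSV export of the stats page (the stats URI with ';csv' appended)."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(
        stats_url + ";csv", headers={"Authorization": "Basic " + token}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


def server_status(csv_text):
    """Map (proxy, server) -> status, skipping the FRONTEND/BACKEND summary rows."""
    # The header row starts with '# pxname,svname,...'; drop the '# ' so csv sees plain names.
    rows = csv.DictReader(io.StringIO(csv_text.lstrip("# ")))
    return {
        (row["pxname"], row["svname"]): row["status"]
        for row in rows
        if row["svname"] not in ("FRONTEND", "BACKEND")
    }


if __name__ == "__main__":
    # Placeholder VIP/credentials, matching 'listen stats x.x.x.x:444' and 'stats uri /stats':
    # text = fetch_stats_csv("http://x.x.x.x:444/stats", "admin", "xxxxxx")
    text = "# pxname,svname,status,\nbe_ex2013_owa,exchange1,UP,\nbe_ex2013_owa,exchange2,DOWN,\n"
    for (proxy, server), status in sorted(server_status(text).items()):
        print(proxy, server, status)
```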

I added http to https redirect on the main front-end with the line: redirect scheme https code 301 if !{ ssl_fc }

I added new ACLs so the back-ends would also catch the upper/lower case variants of Autodiscover, OWA and EWS:

acl autodiscover url_beg /Autodiscover
acl autodiscover url_beg /autodiscover

acl owa url_beg /owa
acl owa url_beg /OWA

acl ews url_beg /EWS
acl ews url_beg /ews
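Because url_beg is case-sensitive, each spelling needs its own ACL line. An alternative (not what I used) is HAProxy’s -i flag, which makes a single ACL case-insensitive, e.g. acl ews url_beg -i /ews. The following Python sketch is a simplified model of the frontend’s routing, not the real HAProxy matcher: it only does prefix matching, the first matching use_backend wins, and pick_backend is a hypothetical helper. It compares both behaviours.

```python
# Path prefixes per back-end, copied from the acl/use_backend lines in fe_ex2013
# (same order as the use_backend rules; the first match wins).
RULES = [
    ("be_ex2013_autodiscover", ["/Autodiscover", "/autodiscover"]),
    ("be_ex2013_mapi", ["/mapi"]),
    ("be_ex2013_rpc", ["/rpc"]),
    ("be_ex2013_owa", ["/owa", "/OWA"]),
    ("be_ex2013_eas", ["/Microsoft-Server-ActiveSync"]),
    ("be_ex2013_ecp", ["/ecp"]),
    ("be_ex2013_ews", ["/EWS", "/ews"]),
    ("be_ex2013_oab", ["/OAB"]),
]
DEFAULT_BACKEND = "be_ex2013"


def pick_backend(path, ignore_case=False):
    """Model of 'acl <name> url_beg [-i] <prefix>' followed by 'use_backend ... if <name>'."""
    candidate = path.lower() if ignore_case else path
    for backend, prefixes in RULES:
        for prefix in prefixes:
            if candidate.startswith(prefix.lower() if ignore_case else prefix):
                return backend
    return DEFAULT_BACKEND


if __name__ == "__main__":
    for path in ("/EWS/Exchange.asmx", "/Ews/Exchange.asmx", "/AutoDiscover/AutoDiscover.xml"):
        print(path, pick_backend(path), pick_backend(path, ignore_case=True))
```

With the case-sensitive rules, /Ews/Exchange.asmx falls through to be_ex2013, which is exactly the kind of traffic that showed up on the “unspecified protocol” back-end during testing.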

 

I added an SMTP and an IMAP section. We only need port 25 (SMTP) to be open, as Postfix takes care of sending the emails; Exchange only hands them over to Postfix (send connector). IMAPS, on the other hand, is open to the world, even though we don’t have many users accessing Exchange over IMAP. (Unencrypted IMAP isn’t allowed at all.) You can use different techniques for configuring IMAPS (available in the comments), but I’ve chosen the easiest one. Also note that SMTP and IMAP use Layer 4 (tcp) load balancing, as opposed to Exchange’s own protocols. That said, you can still get the IMAP client IPs straight from HAProxy’s log file (instead of from IIS on Exchange).

 

 

Problem solving

Apart from mega problems with Mac (sigh), we also had a hiccup with non-domain-joined Windows clients. I had to scratch my head quite a bit before finding out why a client would prompt for credentials even though the certificate was correctly applied in HAProxy. Well, it turned out to be a certificate problem after all, and Zoltan even briefly described it in Part 5 of his blog:

“By the way, even if you acquire a certificate from a commercial CA, there are no guarantees that its intermediate and root CA certificates will come pre-canned with CentOS 7 (or any Linux flavour as a matter of fact), so it is good to know the procedure anyway, just in case.”

I happened to miss this part, though, so I was very confused. I don’t even remember what pointed me in the right direction, but the solution was indeed to append the intermediate certificate from our certificate provider to the .pem file created earlier:

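cat /etc/pki/ca-trust/source/DigiCertCA.crt >> /etc/ssl/certs/exchange_certificate_and_key_nopassword.pem

(Note the >>, which appends. A single > would overwrite the file and wipe out the certificate and private key already in it.)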

DigiCertCA is automatically deployed to our Linux servers with Puppet; the “only” problem was that HAProxy didn’t send it to clients until it was included in the .pem file referenced by the crt parameter. More information: https://www.happyassassin.net/2015/01/14/trusting-additional-cas-in-fedora-rhel-centos-dont-append-to-etcpkitlscertsca-bundle-crt-or-etcpkitlscert-pem/. Well, it now makes much more sense why it worked on domain-joined Windows clients but not on other clients: the domain-joined Windows clients were automatically receiving the intermediate certificate via group policies. Oh well. Live and learn 🙂
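Mistakes in the bundle (a missing intermediate, a key that is still encrypted, or a file accidentally overwritten with > instead of appended to with >>) are easy to miss, because the file still looks like valid PEM. A quick sanity check before reloading HAProxy can help. The Python sketch below is a hypothetical helper, not an HAProxy tool: it only looks at the PEM block labels (it doesn’t parse or validate the certificates), counts certificates and private keys, and flags a missing intermediate or a key that is still password protected.

```python
import re

# Matches the label of every "-----BEGIN <label>-----" line in a PEM file.
PEM_BEGIN = re.compile(r"-----BEGIN ([A-Z0-9 ]+)-----")


def inspect_bundle(pem_text):
    """Summarise an HAProxy 'crt' file: certificate count, private keys and obvious problems."""
    labels = PEM_BEGIN.findall(pem_text)
    certificates = labels.count("CERTIFICATE")
    keys = [label for label in labels if label.endswith("PRIVATE KEY")]
    warnings = []
    if certificates == 0:
        warnings.append("no certificate found")
    elif certificates == 1:
        warnings.append("only one certificate: is the intermediate CA missing?")
    if len(keys) != 1:
        warnings.append(f"expected 1 private key, found {len(keys)}")
    # Encrypted keys show up either as PKCS#8 'ENCRYPTED PRIVATE KEY' or with a
    # legacy 'Proc-Type: 4,ENCRYPTED' header inside an RSA key block.
    if "ENCRYPTED PRIVATE KEY" in keys or "Proc-Type: 4,ENCRYPTED" in pem_text:
        warnings.append("private key is still password protected")
    return {"certificates": certificates, "private_keys": len(keys), "warnings": warnings}
```

For example, print(inspect_bundle(open("/etc/ssl/certs/exchange_certificate_and_key_nopassword.pem").read())). A bundle built as described here would typically report two certificates (server + intermediate), one private key and no warnings.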

Finally, the BIGGEST problem of them all was Outlook for Mac. What a pain in the ass. Even with the above intermediate certificate in place it wouldn’t work. I was completely clueless, and it seemed the whole Internet was equally confused. (I asked in forums for a solution to my Mac dilemma, without success.) What helped me in the end was a colleague noticing that Outlook for Mac connects to Exchange using old protocols. Even the newest Outlook for Mac 2016. SHAME ON YOU MICROSOFT! This opened up a new world of googling for me, though. Here’s some information regarding the dilemma (which ISN’T an HAProxy problem, btw):

https://support.microsoft.com/en-ph/help/2955530/outlook-for-mac-clients-cannot-connect-to-exchange-server
https://outlook.uservoice.com/forums/293343-outlook-for-mac/suggestions/15120408-add-support-for-tls-1-0-1-1-and-1-2
https://www.quora.com/Is-there-a-way-to-connect-Mac-Office-Outlook-2016-with-Microsoft-Exchange-Server-2007
https://answers.microsoft.com/en-us/mac/forum/macoffice2011-macoutlook/outlook-2011-to-use-sslv3/7e777e6b-9e92-4a89-8874-d357c4bdf6ef?auth=1

and the solution:

https://support.microsoft.com/en-us/help/980436/ms10-049-vulnerabilities-in-schannel-could-allow-remote-code-execution

haproxy_allowInsecureRenegotiation

In other words, edit the registry on the Exchange servers to enable Compatible mode. And there you have it, all Mac clients can finally connect through HAProxy 🙂

 

 

Final thoughts

As said before, HAProxy’s logging is amazing. It does produce quite large logs, though: hundreds of MB per day. Luckily the logs compress very nicely, and each rotated daily log takes up less than 100 MB of disk space. That said, be sure to allocate enough disk space for the logs! Analyzing the logs in depth can be a little cryptic, but luckily there’s documentation available. For example, check out

https://cdn.haproxy.com/wp-content/uploads/2017/07/aloha_load_balancer_memo_log.pdf for an amazing chart with all the log value codes and explanations.

The termination flags can be found in HAProxy’s own documentation: https://cbonte.github.io/haproxy-dconv/1.5/configuration.html#4-option%20dontlognull. Search for the text “The most common termination flags combinations are indicated below. They are alphabetically sorted, with the lowercase set just after the upper case for easier finding and understanding” and you’re good to go.
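To make the log fields less cryptic, here is a minimal Python sketch that splits a default option httplog line into its fields and translates the first termination-state character. It assumes IPv4 clients and no captured cookies (they appear as “-”), and it only covers a subset of the termination codes; the helper names and the sample line are hypothetical. The default_backend_paths helper counts requests that fell through to be_ex2013, which is handy for the ACL hunting described in the Testing chapter. (With SSL, HAProxy appends “~” to the frontend name, hence fe_ex2013~.)

```python
import re
from collections import Counter

# HAProxy 'option httplog' line, matched after the syslog prefix.
HTTPLOG = re.compile(
    r"haproxy\[\d+\]: "
    r"(?P<client_ip>[\d.]+):(?P<client_port>\d+) "
    r"\[(?P<accept_date>[^\]]+)\] "
    r"(?P<frontend>\S+) "
    r"(?P<backend>[^/\s]+)/(?P<server>\S+) "
    r"(?P<timers>\S+) "
    r"(?P<status>-?\d+) "
    r"(?P<bytes_read>\+?\d+) "
    r"\S+ \S+ "                 # captured request/response cookies ('-' when unused)
    r"(?P<termination_state>\S{4}) "
    r"(?P<conn_counts>\S+) "
    r"(?P<queues>\S+) "
    r"(?:\{[^}]*\} )*"          # captured headers, if any are configured
    r'"(?P<request>[^"]*)"'
)

# Subset of the first termination-state character (see the HAProxy docs for the full list).
FIRST_EVENT = {
    "-": "normal completion",
    "C": "client aborted the connection",
    "S": "server aborted or refused the connection",
    "c": "client-side timeout",
    "s": "server-side timeout",
    "P": "aborted by the proxy",
    "R": "proxy resource exhausted",
    "I": "internal error",
    "K": "killed by an administrator",
}


def parse(line):
    """Return the httplog fields as a dict, or None if the line doesn't match."""
    match = HTTPLOG.search(line)
    if not match:
        return None
    entry = match.groupdict()
    entry["first_event"] = FIRST_EVENT.get(entry["termination_state"][0], "see docs")
    return entry


def default_backend_paths(lines, backend="be_ex2013"):
    """Count request paths that ended up on the default ('unspecified protocol') back-end."""
    paths = Counter()
    for line in lines:
        entry = parse(line)
        if entry and entry["backend"] == backend:
            parts = entry["request"].split()
            paths[parts[1] if len(parts) > 1 else entry["request"]] += 1
    return paths
```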

Here’s a random screenshot from haproxy.log:

haproxy_log_example_output

 

You’ll now get all the client IPs from these logs. For obvious reasons they’re censored in the screenshot, though. Furthermore (and perhaps even more importantly), you can now also check the IIS logs on the Exchange servers and notice that a column (the last one) has been added for X-Forwarded-For:

2018-06-30 23:59:42 x.x.x.x POST /mapi/emsmdb/ MailboxId=ce9fb341-8f0b-4315-b3d9-3e77591e0a18@abo.fi&CorrelationID=<empty>;&ClientId=I9XPTDBZEATAAVKDG&cafeReqId=d3cdb726-44ae-4237-9bbc-2c5e5f434e13; 443 - x.x.x.x Microsoft+Office/16.0+(Windows+NT+10.0;+Microsoft+Outlook+16.0.4639;+Pro) - 401 2 5 15 1.2.3.148
2018-06-30 23:59:42 x.x.x.x POST /mapi/emsmdb/ MailboxId=57c56614-6647-470d-a620-f3b1f5e2dc8f@abo.fi&CorrelationID=<empty>;&ClientId=SHDGSCWGKIOLIHQCGSKA&cafeReqId=259f647e-646b-4a62-88c9-5de42df747e7; 443 - x.x.x.x Microsoft+Office/16.0+(Windows+NT+10.0;+Microsoft+Outlook+16.0.4639;+Pro) - 401 2 5 15 1.2.3.149
2018-06-30 23:59:42 x.x.x.x POST /mapi/emsmdb/ MailboxId=43aacf1f-5def-4d3d-9fdb-899ba3c49ec3@abo.fi&CorrelationID=<empty>;&ClientId=IQWDWWOIDUSFPCVQTPW&cafeReqId=4d9bc40c-3d2d-4771-a0bb-024d32ea509c; 443 - x.x.x.x Microsoft+Office/16.0+(Windows+NT+10.0;+Microsoft+Outlook+16.0.4639;+Pro) - 401 2 5 0 1.2.3.30
2018-06-30 23:59:42 x.x.x.x POST /mapi/emsmdb/ MailboxId=57c56614-6647-470d-a620-f3b1f5e2dc8f@abo.fi&CorrelationID=<empty>;&ClientId=SHDGSCWGKIOLIHQCGSKA&cafeReqId=c064ab51-5e49-4c86-9e22-ba7a33ca7223; 443 - x.x.x.x Microsoft+Office/16.0+(Windows+NT+10.0;+Microsoft+Outlook+16.0.4639;+Pro) - 401 1 2148074254 0 1.2.3.14

The last column (“1.2.3.x”) in the above log snippet is the real client IP (obviously censored yet again).
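If you want the real client IPs out of the IIS logs programmatically, the W3C #Fields: header tells you which column holds the X-Forwarded-For value. Here is a minimal Python sketch. The column name X-Forwarded-For is an assumption (it’s whatever name you gave the custom field in IIS), and when no usable #Fields: header is available it falls back to the last column, as in the snippet above.

```python
def client_ips(log_lines, xff_field="X-Forwarded-For"):
    """Yield the original client IP for each entry in an IIS W3C log."""
    fields = None
    for line in log_lines:
        if line.startswith("#Fields:"):
            fields = line.split()[1:]  # column names, in order
            continue
        if line.startswith("#") or not line.strip():
            continue  # other directives and blank lines
        values = line.split()  # IIS encodes spaces inside values as '+'
        if fields and xff_field in fields and len(values) == len(fields):
            value = values[fields.index(xff_field)]
        else:
            value = values[-1]  # no usable header: assume XFF is the last column
        # A forwarded chain looks like 'client,+proxy'; the first entry is the client.
        yield value.split(",")[0]


if __name__ == "__main__":
    sample = [
        "#Fields: date time s-ip cs-method cs-uri-stem cs-uri-query s-port cs-username "
        "c-ip cs(User-Agent) cs(Referer) sc-status sc-substatus sc-win32-status time-taken "
        "X-Forwarded-For",
        "2018-06-30 23:59:42 x.x.x.x POST /mapi/emsmdb/ MailboxId=... 443 - x.x.x.x "
        "Microsoft+Office/16.0 - 401 2 5 15 1.2.3.149",
    ]
    print(list(client_ips(sample)))
```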

And there we have it. Logging problem solved! 🙂 Now we can finally start tracking down the bad guys…

Test Lab Guide: Windows Server 2016 with Integrated Exchange 2016, SfB Server 2015 and SharePoint 2016

WARNING! This is a pretty long and detailed blog post 🙂

I decided to upgrade (or actually reinstall) my test lab with the most recent version of Windows Server (including the most recent versions of Exchange, SfB Server and SharePoint). All my server virtual machines are built from a cloned “Golden Windows Server Image” in VMware Workstation, and I use the same principle for my clients. This way you can deploy new servers/clients very fast, and they take up much less disk space than installations from scratch.

This “custom” TLG is based on:

Windows Server 2012 R2 Test Lab Guide (including the Basic PKI add-on from here) and
Test Lab Guide: Configure an Integrated Exchange, Lync, and SharePoint Test Lab

with my own Exchange add-ons including:

  • A script for configuring the virtual directories
  • Certificate from domain CA
  • Zevenet Load Balancer (formerly known as Zen)
  • A second server (EX2)
  • Another script for copying the virtual directories from an existing server to a new one
  • Database Availability Group (DAG) between EX1 and EX2
  • Moving a user from one database to another

More about these later on.

I’ll start with an overview of the whole TLG, including my own add-ons:

tlg2016_overview

Fig 1. Test Lab overview – a modified picture from the TLG. (I also configured the Internet subnet (131.107.0.0/24), even though not visible in this picture).

 

The whole project started by following the Windows Server 2012 R2 Test Lab Guide. I then added the Basic PKI. These Test Lab Guides were actually “translatable” straight from Windows Server 2012 R2 to Windows Server 2016. I got a “Duplicate IP address” error on one of the servers, however, but it was easily solved by following http://support.huawei.com/enterprise/en/knowledge/KB1000068724. (I have no idea why I got this error, as it hasn’t happened before, but it doesn’t matter now that it’s solved.)

I then moved on to the Test Lab Guide: Configure an Integrated Exchange, Lync, and SharePoint Test Lab. Step 1 was already done, so I moved on to Steps 2 and 3: installing and configuring a new server named SQL1. Step 3 includes a link to a separate SQL Server 2012 Test Lab Guide, and this TLG also happens to be more or less translatable straight to SQL Server 2016, so I have no further comments about the installation. Step 4 guides you through the Client2 installation, but there’s really nothing to comment on here either (pretty basic stuff).

 

Exchange 2016, EX1

It was now time for the Exchange server, EX1. Note to self: give the VM at least 6 GB of RAM or it will run out of memory. This installation also has a separate guide:

https://social.technet.microsoft.com/wiki/contents/articles/24277.test-lab-guide-install-exchange-server-2013-on-the-windows-2012-r2-base-configuration.aspx.

It’s fine for the most part, but instead of downloading the evaluation version of Exchange I suggest you download the newest Exchange Server 2016 CU. This way you’ll get the newest updates from the start. And yes, all setup files are included in the CU, so you can use it for a “clean install”. The prerequisites for Exchange 2016 (on Windows Server 2016) are also a bit different compared to Exchange 2013 (on Windows Server 2012 R2). The only thing you need to download and install “separately” is Microsoft Unified Communications Managed API 4.0, Core Runtime 64-bit. There’s no need for Microsoft Knowledge Base article KB3206632 if you have a recent/patched version of Windows Server 2016. After that, just copy/paste the PowerShell command from the prerequisites page:

Install-WindowsFeature NET-Framework-45-Features, RPC-over-HTTP-proxy, RSAT-Clustering, RSAT-Clustering-CmdInterface, RSAT-Clustering-Mgmt, RSAT-Clustering-PowerShell, Web-Mgmt-Console, WAS-Process-Model, Web-Asp-Net45, Web-Basic-Auth, Web-Client-Auth, Web-Digest-Auth, Web-Dir-Browsing, Web-Dyn-Compression, Web-Http-Errors, Web-Http-Logging, Web-Http-Redirect, Web-Http-Tracing, Web-ISAPI-Ext, Web-ISAPI-Filter, Web-Lgcy-Mgmt-Console, Web-Metabase, Web-Mgmt-Console, Web-Mgmt-Service, Web-Net-Ext45, Web-Request-Monitor, Web-Server, Web-Stat-Compression, Web-Static-Content, Web-Windows-Auth, Web-WMI, Windows-Identity-Foundation, RSAT-ADDS

Then run setup.exe and install Exchange using the default options. After completion, follow Step 6: Demonstrate EX1 as an email server in the Exchange TLG. I did NOT try to send an email message from Chris to Janet at this stage, though, as I wanted to try this after the Load Balancer was installed. But for now, happy days, Exchange is installed!

 

Script for configuring the virtual directories

Usually when installing an Exchange server you change the virtual directories/namespace to something other than the server hostname (the default). This namespace is the same name that should be included in the certificate. (I didn’t like the fact that this TLG uses self-signed certificates, so I added my own subchapter about getting a certificate from a domain CA; see the next chapter.) In a production environment you should plan the namespace and certificate prior to installation, but in this TLG it doesn’t matter that much. I decided to go with the namespace “exchange.corp.contoso.com”. (The Autodiscover DNS name should also be included in the certificate request, which it is by default.) Anyhow, I first added the A records mentioned above (exchange and autodiscover) to DNS. I pointed them to 10.0.0.11 at this stage (that will change after the Load Balancer installation). I then changed the virtual directories according to the above plan. For this I used a nice script found at:

https://gallery.technet.microsoft.com/office/Set-all-virtual-directories-f4ec71d3

This script is very nice. The only thing that worried me was that it tried to change the PowerShell virtual directory. AFAIK you shouldn’t change that. Anyway, no big deal; I just answered “no” (seen in the screenshot) when the script asked me to change it. Here are a couple of screenshots from the script in action:

tlg2016_set_allvdirs_script1

Fig 2. set-allvdirs.ps1 script

tlg2016_set_allvdirs_script2

Fig 3. set-allvdirs.ps1 script, continued
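For reference, here is a minimal EMS sketch of roughly what such a script sets for a single server, assuming Exchange 2016 cmdlet names and the “exchange.corp.contoso.com” namespace chosen above. It is not the script’s actual code, and it deliberately leaves the PowerShell virtual directory and Outlook Anywhere alone.

```powershell
# Hypothetical manual equivalent for one server; run in the Exchange Management Shell.
$server = 'EX1'
$base   = 'https://exchange.corp.contoso.com'

# Point each client-facing virtual directory at the shared namespace
Get-OwaVirtualDirectory -Server $server |
    Set-OwaVirtualDirectory -InternalUrl "$base/owa" -ExternalUrl "$base/owa"
Get-EcpVirtualDirectory -Server $server |
    Set-EcpVirtualDirectory -InternalUrl "$base/ecp" -ExternalUrl "$base/ecp"
Get-WebServicesVirtualDirectory -Server $server |
    Set-WebServicesVirtualDirectory -InternalUrl "$base/EWS/Exchange.asmx" -ExternalUrl "$base/EWS/Exchange.asmx"
Get-ActiveSyncVirtualDirectory -Server $server |
    Set-ActiveSyncVirtualDirectory -InternalUrl "$base/Microsoft-Server-ActiveSync" -ExternalUrl "$base/Microsoft-Server-ActiveSync"
Get-OabVirtualDirectory -Server $server |
    Set-OabVirtualDirectory -InternalUrl "$base/OAB" -ExternalUrl "$base/OAB"
Get-MapiVirtualDirectory -Server $server |
    Set-MapiVirtualDirectory -InternalUrl "$base/mapi" -ExternalUrl "$base/mapi"

# Autodiscover SCP published in AD for domain-joined clients (Exchange 2016 cmdlet)
Set-ClientAccessService -Identity $server `
    -AutoDiscoverServiceInternalUri 'https://autodiscover.corp.contoso.com/Autodiscover/Autodiscover.xml'

# The PowerShell virtual directory is intentionally not touched.
```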

 

Certificate from domain CA

After all the virtual directories were set, it was time to get a new certificate that reflects the above changes. I headed over to the trusty Practical 365 site to refresh my memory. This time I used EAC to request the new certificate. I changed the domains to reflect my newly configured environment: I added exchange.corp.contoso.com and autodiscover.corp.contoso.com and removed all the other hostnames. The other options were pretty basic, so nothing special there. I then saved the certificate request and copied it over to my domain CA. However, when I tried to process the certificate request on the CA, I was greeted with an error message:

tlg2016_cert_req_error_from_ca

Fig 4. Certificate Request Processor error

A bit of investigation led me to the following url: http://mytechweblog.blogspot.fi/2012/11/the-request-contains-no-certificate.html, which had a solution:

certreq -submit -attrib "CertificateTemplate:WebServer" WebServerCertReq.txt

tlg2016_cert_req_manual_submit

Fig 5. Certification request with manual submit.

This solution worked for me, nice! I then saved the .crt file and imported it into Exchange from the same place in EAC where I made the request. However, shortly after this I noticed that EAC and OWA still gave certificate errors. Strange, but then again nothing new. I had a look in IIS/Bindings, and sure enough, the wrong certificate had been assigned. I corrected this so that the newly requested certificate was in use:

tlg2016_cert_assignment_from_IIS_on_ex1

Fig 6. Exchange certificate from domain CA.
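The same two steps (completing the pending request and binding the certificate to IIS) can also be done from EMS. A minimal sketch, assuming the issued .crt was saved to a hypothetical C:\certs folder on EX1:

```powershell
# Complete the pending EAC request with the certificate issued by the domain CA.
# The file path is a placeholder.
$crt = Import-ExchangeCertificate -Server EX1 `
    -FileData ([System.IO.File]::ReadAllBytes('C:\certs\exchange.crt'))

# Assign it to IIS; Exchange updates the HTTPS binding on the Default Web Site,
# which avoids fixing the binding by hand in IIS Manager.
Enable-ExchangeCertificate -Server EX1 -Thumbprint $crt.Thumbprint -Services IIS
```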

 

Zevenet Load Balancer

It was now time to install the Zevenet Load Balancer. I installed it at this stage because I had already preconfigured all the Exchange virtual directories and the autodiscover record in DNS. This also meant that it would be very easy to point DNS at the Load Balancer instead of the Exchange server/CAS further down the road.

I headed over to https://www.zevenet.com/products/community/ and downloaded the newest version. I installed it following my own old blog post. The main difference this time was that I didn’t bother to use clustered servers. (I already know that clustering works, and we’re using clustered LBs in production.) After installation I did the initial configuration:

tlg2016_zen_new_virtual_network_interface

Fig 7. New virtual network interface. The new VIP was set to 10.0.0.61 (the server IP is 10.0.0.60).

 

I then created a new farm, which listens on port 443 on the newly created virtual network interface IP (VIP):

tlg2016_zen_new_farm

Fig 8. New Farm

 

After this I edited the farm and configured the “real IP”:

tlg2016_zen_edit_real_ip_servers_configuration

Fig 9. Real IPs. In my case, 10.0.0.11 is the “real” IP for EX1.

 

I then converted the Exchange certificate (in a Linux VM) for use within Zevenet LB:

openssl pkcs12 -in file.pfx -out file.pem -nodes

Source: https://stackoverflow.com/questions/15413646/converting-pfx-to-pem-using-openssl
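The .pfx that openssl converts has to be exported from Exchange first. EAC can do this, but a minimal EMS sketch is below, assuming the certificate’s private key is exportable (EAC requests are by default) and using a placeholder thumbprint and output path:

```powershell
# Prompt for the password that will protect the .pfx (and that openssl will ask for)
$pfxPassword = Read-Host -Prompt 'PFX password' -AsSecureString

# Export certificate plus private key; '<thumbprint>' is a placeholder
$export = Export-ExchangeCertificate -Server EX1 -Thumbprint '<thumbprint>' `
    -BinaryEncoded -Password $pfxPassword

# Write the binary data to disk so it can be copied to the Linux VM
[System.IO.File]::WriteAllBytes('C:\certs\file.pfx', $export.FileData)
```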

 

It was then time to import it into Zevenet LB:

tlg2016_zen_manage_certificates

Fig 10. Certificate imported

 

After this I made changes in DNS so that all traffic would go through the Load Balancer:

tlg2016_zen_dns_edited

Fig 11. Editing DNS.
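The DNS change can also be scripted on DC1. A minimal sketch, assuming the corp.contoso.com zone is hosted on DC1 with the DnsServer PowerShell module available, and that the exchange and autodiscover A records should point to the VIP 10.0.0.61:

```powershell
# Repoint the namespace records from EX1 (10.0.0.11) to the Zevenet VIP
$zone = 'corp.contoso.com'
foreach ($name in 'exchange', 'autodiscover') {
    # Remove the existing A record(s) for this name without a confirmation prompt
    Remove-DnsServerResourceRecord -ZoneName $zone -Name $name -RRType A -Force
    # Recreate it pointing at the load balancer VIP
    Add-DnsServerResourceRecordA -ZoneName $zone -Name $name -IPv4Address '10.0.0.61'
}
```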

 

Now it was finally time to check that Outlook was working correctly from client1 (or 2):

tlg2016_outlook_connection_status_through_zen

Fig 12. Outlook Connection Status

Well yes, it was. Perfect! 🙂

That pretty much summarizes the Load Balancer part. Now on to the installation of the second Exchange server, EX2.

 

 

Exchange 2016 – second server, EX2

Now that everything was working as it should with EX1, it was time to add another Exchange server to the environment. There was nothing special about this installation; I just followed the same guide as with the first one. One thing that was different, however, was the script. This time I used a script that can automatically copy the virtual directory settings from an existing Exchange server during deployment. The script can be found at:

http://www.expta.com/2016/07/new-set-autodiscoverscp-v2-script-is-on.html

I’ll copy/paste some information:

The script is designed to be run during installation. Normally, you would run this script from an existing Exchange server of the same version while the new server is being installed.

That sounded almost too good to be true, and I had to try it. So I did a test run from EX1 while EX2 was installing:

tlg2016_set_autodiscover_scp_script1

Fig 13. Set-Autodiscover.ps1 script. Looks promising…

…but it wasn’t:

tlg2016_set_autodiscover_scp_script2

Fig 14. Can’t set the virtual directories.

I had the script running during the whole installation of EX2, but no luck. I suspected it would work better if run immediately after the installation instead, so I had another go just after the EX2 installation finished:

tlg2016_set_autodiscover_scp_script3

Fig 15. Running the script immediately after the EX2 installation.

Yes, much better this time. All the virtual directories were set within a couple of seconds, and I’d say this “lag” would be fine for a production environment as well. I would also like to “thank” the script for reminding me to install a certificate on this second server. So I opened up EAC and chose the new EX2 server from the pull-down menu under certificates. I then chose “import” and used the same certificate I made for EX1. It got imported nicely:

tlg2016_imported_cert_ex2

Fig 16. Imported domain certificate

 

Be sure to enable all needed services on the newly imported certificate also:

tlg2016_setting_services_on_imported_certificate_ex2

Fig 17. Overwriting existing SMTP certificate. At the same time I also chose to enable the IIS, POP and IMAP services.
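The EAC import and service assignment above roughly correspond to the following EMS sketch. It assumes the certificate was exported from EX1 as a .pfx to a hypothetical share path:

```powershell
# Import the EX1 certificate (with private key) on EX2; path is a placeholder
$pfxPassword = Read-Host -Prompt 'PFX password' -AsSecureString
$cert = Import-ExchangeCertificate -Server EX2 -Password $pfxPassword `
    -FileData ([System.IO.File]::ReadAllBytes('\\EX1\c$\certs\file.pfx'))

# Enable the same services as in the screenshot; replacing the existing
# SMTP certificate asks for confirmation, just like in EAC.
Enable-ExchangeCertificate -Server EX2 -Thumbprint $cert.Thumbprint -Services IIS,SMTP,POP,IMAP
```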

 

Checking certificates from EMS:

tlg2016_checking_certs_from_ems

Fig 18. Checking active certificates. Looks good!
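A check like the one in the screenshot can be run against both servers in one go; this is a minimal sketch, and the selected columns are just one reasonable choice:

```powershell
# List each server's certificates with the services they are enabled for
foreach ($server in 'EX1', 'EX2') {
    "== $server =="
    Get-ExchangeCertificate -Server $server |
        Format-Table -AutoSize Thumbprint, Services, NotAfter, Subject
}
```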

 

…and while you’re at it, check that OWA won’t give you certificate errors:

tlg2016_checking_that_cert_ís_ok_from_owa

Fig 19. OWA

It doesn’t. All good! (The new certificate wasn’t active yet in Fig 16, which is why the URL bar is red there.)

The only thing left to do now was to add this second IP (10.0.0.12) to the “real servers” in Zevenet Load Balancer. After this change, the dual Exchange server setup was ready for use.

 

DAG

With both EX1 and EX2 up ‘n running, it was time to configure a Database Availability Group (DAG) between the servers. I’ve done this many times before, and I’ve always used the same guide whether it’s for Exchange 2013 or Exchange 2016. The guide I’ve used is:

https://practical365.com/exchange-server/installing-an-exchange-server-2013-database-availability-group/

It’s very straightforward, without any extra bs. Some notes:

  • I’m using the “SP1 (APP1)” server as the witness server.
  • I pre-staged a computer account in AD named “EXDAG”
  • I did not configure a dedicated replication network. (Overkill for a test lab).
  • I did not move the Default Mailbox Databases from the default folder path onto storage volumes dedicated to databases and transaction log files. (Again, a little overkill for a test lab).
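For those who prefer EMS, the DAG part of the guide boils down to something like the sketch below. The witness directory and the DAG IP address are hypothetical values for this lab; it also assumes the pre-staged EXDAG account is in place and that the Exchange Trusted Subsystem group is a local administrator on SP1, which the witness server requires.

```powershell
# Create the DAG with SP1 as the file share witness (values are lab assumptions)
New-DatabaseAvailabilityGroup -Name EXDAG `
    -WitnessServer SP1 `
    -WitnessDirectory 'C:\EXDAG' `
    -DatabaseAvailabilityGroupIPAddresses 10.0.0.20

# Add both mailbox servers as DAG members
Add-DatabaseAvailabilityGroupServer -Identity EXDAG -MailboxServer EX1
Add-DatabaseAvailabilityGroupServer -Identity EXDAG -MailboxServer EX2

# Confirm membership and witness (-Status is needed for OperationalServers)
Get-DatabaseAvailabilityGroup -Identity EXDAG -Status |
    Format-List Name, Servers, OperationalServers, WitnessServer
```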

tlg2016_manage_db_availability_group_membership

Fig 20. Manage Database Availability Membership

 

After this step was done I configured database copies following yet another good (and familiar) follow-up guide from the same series:

https://practical365.com/exchange-server/exchange-2013-dag-database-copies/

I’ve got no additional comments about the database copies, they work just as intended/written in the guide 🙂 Below you’ll find some related screenshots:

tlg2016_add_mailbox_db_copy

Fig 21. Add Mailbox Database Copy

 

tlg2016_databases_overview_after_db_copy_and_dag

Fig 22. Database overview with database copies.
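The database copies can also be added and checked from EMS. A minimal sketch; the database names are placeholders, so check your own with Get-MailboxDatabase first:

```powershell
# Add a passive copy of each server's database on the other DAG member
Add-MailboxDatabaseCopy -Identity 'Mailbox Database EX1' -MailboxServer EX2 -ActivationPreference 2
Add-MailboxDatabaseCopy -Identity 'Mailbox Database EX2' -MailboxServer EX1 -ActivationPreference 2

# Copy health per server: status, queues and content index state
foreach ($server in 'EX1', 'EX2') {
    Get-MailboxDatabaseCopyStatus -Server $server |
        Format-Table -AutoSize Name, Status, CopyQueueLength, ReplayQueueLength, ContentIndexState
}
```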

 

Moving users

I moved the user “Janet” from the original database on EX1 over to the database on EX2. This way I “spread out” my (two 🙂 ) users so their mailboxes are situated on different servers. This is good for failover testing and so forth.

tlg2016_move_user_janet_to_ex2

Fig 23. Moving Janet to another database (server).

 

tlg2016_move_user_janet_to_ex2_2

Fig 24. Moving Janet to another database (server), continued.
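The same move can be started and followed from EMS. A minimal sketch, assuming the mailbox identity “Janet” resolves uniquely and using a placeholder name for the target database on EX2:

```powershell
# Start the move to the database hosted on EX2 (placeholder name)
New-MoveRequest -Identity Janet -TargetDatabase 'Mailbox Database EX2'

# Check progress; repeat until Status shows Completed
Get-MoveRequestStatistics -Identity Janet |
    Format-List DisplayName, Status, PercentComplete

# Clean up the finished request (asks for confirmation)
Remove-MoveRequest -Identity Janet
```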

 

The above steps complete the whole Exchange part of the TLG.

 

 

Skype for Business Server 2015, LYNC1

It was now time to move over to the Lync-part of the TLG. The first change was actually the software itself – I’m installing Skype for Business Server 2015 instead of Lync Server 2013. As with other software in this lab, the prerequisites are way different for SfB Server compared to Lync Server. I used a combination of

https://blogs.perficient.com/microsoft/2017/08/skype-for-business-how-to-install-on-windows-server-2016/ and
http://www.garethjones294.com/install-skype-for-business-server-2015-on-windows-server-2016-step-by-step/

as a base for my deployment. Some additional notes:

  • Note to self: use at least 3 GB of RAM for the VM.
  • (Newest) Cumulative Update has to be installed, otherwise SfB Server won’t work at all on Windows Server 2016.
  • As I installed SfB Server in an isolated network (no internet access), I also had to define the source (which is the Windows Server 2016 DVD) in the PowerShell prerequisite command:

          tlg2016_sfb_install_prereq_from_powershell

          Fig 25. Prerequisites installation for SfB Server 2015 on Windows Server 2016.
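For reference, this is the commonly published SfB Server 2015 prerequisite list for Windows Server 2016, installed from local media. Compare it with the two guides above before running it. D:\sources\sxs is an assumed path to the Windows Server 2016 DVD; .NET Framework 3.5 (NET-Framework-Core) is the feature whose payload has to come from -Source when there is no internet access.

```powershell
# A trailing comma lets the feature list continue on the next line
Install-WindowsFeature NET-Framework-Core, RSAT-ADDS, Windows-Identity-Foundation,
    Web-Server, Web-Static-Content, Web-Default-Doc, Web-Http-Errors,
    Web-Dir-Browsing, Web-Asp-Net, Web-Net-Ext, Web-ISAPI-Ext, Web-ISAPI-Filter,
    Web-Http-Logging, Web-Log-Libraries, Web-Request-Monitor, Web-Http-Tracing,
    Web-Basic-Auth, Web-Windows-Auth, Web-Client-Auth, Web-Filtering,
    Web-Stat-Compression, Web-Dyn-Compression, NET-WCF-HTTP-Activation45,
    Web-Asp-Net45, Web-Mgmt-Tools, Web-Scripting-Tools, Web-Mgmt-Compat,
    Server-Media-Foundation, BITS -Source D:\sources\sxs
```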

 

I then continued following the TLG guide again, and moved over to the chapter “To prepare Active Directory”. Some notes:

  • Installed newest version of offline-Silverlight manually.
  • Chose not to check for updates.
  • Added the DNS SRV records, but they didn’t work when I tested them (probably outdated info in the TLG). This was no big deal, as lyncdiscoverinternal can be used instead for example. You could also Google for updated information, but I didn’t feel it was necessary for this TLG.
  • Everything went fine until “21. From the Topology Builder Action menu, select Publish Topology.” I was greeted with:

          tlg2016_sfb_publishing_topology

          Fig 26. Publishing Topology error.

          tlg2016_sfb_publishing_topology_error

          Fig 27. Publishing Topology error, continued

Well, after some investigation (googling), it turned out this just had to do with UAC: http://terenceluk.blogspot.fi/2013/03/publishing-new-lync-server-2013.html. Sure enough, after running the deployment wizard again as an administrator (run as), it worked!

 

I now moved over to the “To install Lync Server 2013 core components” -part of the TLG. Notes:

  • I was only running step 1 and 2 at this stage.
  • The IIS URL Rewrite Module problem was well known, I’ve even blogged about it.
  • After step 2 was done, it was time to install the newest CU for SfB, otherwise SfB Server won’t run at all on Windows Server 2016.
    • Remember to run the SfB Management Shell as an Administrator.
    • Everything went smoothly with the CU installation!
  • I moved over to Step 3 – Deployment Wizard/SfB Server 2015 Core Components.
    • Everything went fine!
  • Step 4 was different for SfB compared to Lync. You can’t start services from the Deployment Wizard in SfB Server 2015.
    • Instead, you start them from the SfB Management Shell with the command “Start-CsWindowsService”.
    • The command didn’t run as planned though:

                tlg2016_sfb_start-cswindowsservices_error

                 Fig 28. SfB Server 2015 Deployment Log.

    • I tried to manually start the “Skype for Business Front-end” service from “Services” in Windows.
      • Did not work either; it got stuck in “starting…”
      • Tried the old-school method and rebooted the server.
        • Worked, all services were now up ‘n running after reboot 🙂
  • I moved over to the “To enable users in the Lync Server Control Panel”-part of the TLG and enabled the users.
  • Yep, all done, working! 🙂
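The service start and user enablement above can also be done from the SfB Management Shell (run as Administrator). A minimal sketch; the pool FQDN lync1.corp.contoso.com and the user identities Janet and Chris are assumptions based on the lab’s naming:

```powershell
# Start all SfB services on this server, then show their state
Start-CsWindowsService
Get-CsWindowsService | Format-Table -AutoSize Name, Status

# Hypothetical equivalent of enabling the TLG users in the Control Panel
foreach ($user in 'Janet', 'Chris') {
    Enable-CsUser -Identity $user -RegistrarPool 'lync1.corp.contoso.com' -SipAddressType EmailAddress
}
```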

 

 

SharePoint 2016, SP1

SharePoint was the last (and the “easiest”) software candidate on the list. Yet again the prerequisites were different compared to the 2013 version of the TLG. My notes:

  • SP1 is the server name, not Service Pack 1 🙂
  • I tried various offline methods for the prerequisite installation. What a headache. Spare yourself the pain and DO NOT try to install the prerequisites without an active internet connection. I repeat, DO NOT try it.
  • I then installed the prerequisites with “Install software prerequisites” from default.hta. Everything went smoothly.
  • I continued following the TLG and the “To prepare DC1 and SQL1” part. Nothing to add or comment here.
  • I continued following the TLG and the “To install SharePoint Server 2013” part. Nothing to add or comment here.
  • Happy days, SP1 installed!

 

 

Configure integration between EX1, LYNC1, and SP1

As a last step in this TLG, I configured integration between the servers. I would advise you to stay away from the TLG script and use newer information instead. The script has failed me before, and sure enough it failed this time too when I tried it. In other words, skip the script.

As a first step though, check the SP1/APP1 certificate. The TLG tells you to add a https site binding and select the certificate with the name sp1.corp.contoso.com. This won’t work, at least not for me (never has). Instead, when creating the new https binding, choose the certificate that has been issued to the SP1/APP1 server (never mind the confusing “friendly” name):

tlg2016_sp_checking_cert

Fig 29. Checking SSL certificate in SharePoint/IIS

I got a warning about the certificate already being used for the Default Web Site, but this can be ignored (at least in this TLG).

 

Now we’re ready to move over to some “fresh” information about integration. For starters, have a look at:

Exchange <-> SfB: http://lyncdude.com/2015/10/06/the-complete-skype-for-business-exchange-2016-integration-guide-part-i/index.html
Exchange –> SharePoint and SfB: https://technet.microsoft.com/en-us/library/jj649094(v=exchg.160).aspx
SfB –> SharePoint: https://technet.microsoft.com/en-us/library/jj204975.aspx
SharePoint –> Exchange: https://technet.microsoft.com/en-us/library/jj655399.aspx
SharePoint –> SfB: https://technet.microsoft.com/en-us/library/jj670179.aspx

All links also apply to the 2016 versions. Here are the results from my own environment:

 

Skype for Business:

tlg2016_oauth_sfb_check_current_cert

Fig 30. Checking current OAuth certificate and OAuth configuration.

tlg2016_oauth_sfb_setting_cs_auth_configuration

Fig 31. Setting OAuth configuration and checking the configuration.

 

tlg2016_oauth_sfb_to_ex

Fig 32. SfB –> Exchange integration

tlg2016_oauth_sfb_to_sp

Fig 33. SfB –> SharePoint integration

 

tlg2016_oauth_sfb_to_ex_and_sp_check

Fig 34. Checking partner applications. Both Exchange and SharePoint are integration partners.
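The SfB-side steps in the screenshots roughly correspond to the following SfB Management Shell sketch, based on the TechNet articles linked above. The metadata URLs use this lab’s namespaces and are assumptions:

```powershell
# Check that an OAuth token issuer certificate is assigned
Get-CsCertificate -Type OAuthTokenIssuer

# Service name used for SfB/Lync in server-to-server authentication
Set-CsOAuthConfiguration -ServiceName 00000004-0000-0ff1-ce00-000000000000

# Trust Exchange and SharePoint as partner applications
New-CsPartnerApplication -Identity Exchange -ApplicationTrustLevel Full `
    -MetadataUrl 'https://autodiscover.corp.contoso.com/autodiscover/metadata/json/1'
New-CsPartnerApplication -Identity SharePoint -ApplicationTrustLevel Full `
    -MetadataUrl 'https://sp1.corp.contoso.com/_layouts/15/metadata/json/1'

# Verify
Get-CsPartnerApplication | Format-Table -AutoSize Identity, ApplicationTrustLevel, MetadataUrl
```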

 

Exchange:

tlg2016_oauth_ex_to_sfb

Fig 35. Exchange –> SfB integration

tlg2016_oauth_ex_to_sp

Fig 36. Exchange –> SharePoint integration

 

tlg2016_oauth_ex_to_sfb_and_sp_checking

Fig 37. Checking partner applications. Both SfB (Lync) and SharePoint are integration partners.
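On the Exchange side, the partner applications are created with the Configure-EnterprisePartnerApplication.ps1 script that ships in the Exchange Scripts folder. A minimal EMS sketch; the SfB and SharePoint metadata URLs are assumptions for this lab:

```powershell
# Locate the Exchange Scripts folder via the install path environment variable
$scripts = Join-Path $env:ExchangeInstallPath 'Scripts'

# Exchange -> SfB
& "$scripts\Configure-EnterprisePartnerApplication.ps1" -ApplicationType Lync `
    -AuthMetadataUrl 'https://lync1.corp.contoso.com/metadata/json/1'

# Exchange -> SharePoint
& "$scripts\Configure-EnterprisePartnerApplication.ps1" -ApplicationType SharePoint `
    -AuthMetadataUrl 'https://sp1.corp.contoso.com/_layouts/15/metadata/json/1'

# Verify
Get-PartnerApplication | Format-Table -AutoSize Name, Enabled, AuthMetadataUrl
```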

 

SharePoint:

tlg2016_oauth_sp_to_ex

Fig 38. SharePoint –> Exchange integration

tlg2016_oauth_sp_to_sfb

Fig 39. SharePoint –> SfB integration

 

tlg2016_oauth_sp_to_ex_and_sfb_checking

Fig 40. Checking partner applications. Both Exchange and SfB are integration partners.
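On SharePoint, the trusts are token issuers rather than partner applications. A minimal SharePoint 2016 Management Shell sketch, assuming this lab’s metadata URLs. The TechNet articles add further steps (for example app principal permissions for Exchange, and possibly extra switches on the Lync issuer) that are not shown here, so compare before running:

```powershell
# Load the SharePoint cmdlets if not already loaded
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# Trust Exchange and SfB as security token issuers (URLs are lab assumptions)
New-SPTrustedSecurityTokenIssuer -Name 'Exchange' `
    -MetadataEndPoint 'https://autodiscover.corp.contoso.com/autodiscover/metadata/json/1'
New-SPTrustedSecurityTokenIssuer -Name 'Lync' `
    -MetadataEndPoint 'https://lync1.corp.contoso.com/metadata/json/1'

# Verify
Get-SPTrustedSecurityTokenIssuer | Format-Table -AutoSize Name, RegisteredIssuerName
```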

 

The integration chapter above now finalizes this whole TLG. It was a fun project and I hope someone will find this information useful.

Changing Outlook Connectivity towards MAPI over HTTP

We had a big Microsoft Office 2016 upgrade project this fall. The main reason for it was an inconsistent Office environment, with versions dating all the way back to MS Office 2007. The whole upgrade was done using a System Center Configuration Manager application package, which I also happen to be the author of. The upgrade went mostly fine, even though it was quite a long process involving many (stubborn) users and computers.

Anyhow, now that a new MS Office version was deployed, it was finally time to think about switching Outlook to a more modern protocol and authentication method. RPC over HTTP (with Basic/NTLM authentication) was getting dated, and the main reason for not using MAPI over HTTP had been the old clients. That problem is now gone 🙂

So, what’s up with this MAPI-thing then? I’ll just copy/paste a few things and you can read more about MAPI in the provided links.

“MAPI over HTTP is a new transport used to connect Outlook and Exchange. MAPI/HTTP was first delivered with Exchange 2013 SP1 and Outlook 2013 SP1 and begins gradually rolling out in Office 365 in May. It is the long term replacement for RPC over HTTP connectivity (commonly referred to as Outlook Anywhere). MAPI/HTTP removes the complexity of Outlook Anywhere’s dependency on the legacy RPC technology”.

“The primary goal of MAPI/HTTP is provide a better user experience across all types of connections by providing faster connection times to Exchange – yes, getting email to users faster. Additionally MAPI/HTTP will improve the connection resiliency when the network drops packets in transit. Let’s quantify a few of these improvements your users can expect. These results represent what we have seen in our own internal Microsoft user testing.”

Source: https://blogs.technet.microsoft.com/exchange/2014/05/09/outlook-connectivity-with-mapi-over-http/

MAPI over HTTP is also the default protocol in Exchange 2016 which clearly shows the way Microsoft is going. More information about MAPI:

https://technet.microsoft.com/en-us/library/dn635177(v=exchg.150).aspx
http://markgossa.blogspot.fi/2015/11/exchange-2013-and-exchange-2016-mapi.html
http://searchexchange.techtarget.com/definition/MAPI-over-HTTP-Messaging-Application-Programming-Interface-over-HTTP

What are the requirements for MAPI then?

 

Main Prerequisites

Complete the following steps to prepare the clients and servers to support MAPI over HTTP.

  1. Upgrade Outlook clients to Outlook 2013 SP1 or Outlook 2010 SP2 and updates KB2956191 and KB2965295 (April 14, 2015).

  2. Upgrade Client Access and Mailbox servers to the latest Exchange 2013 cumulative update (CU). For information about how to upgrade, see Upgrade Exchange 2013 to the latest cumulative update or service pack.

Source: https://technet.microsoft.com/en-us/library/dn635177(v=exchg.150).aspx

No problem, these prerequisites were now in order for us.

 

Other Prerequisites

Check/Set the MAPI Virtual Directories

Clients can’t connect if you don’t have working MAPI Virtual Directories. Just follow the above TechNet article and you’ll be fine. Our original MAPI virtual directories looked like this:

exchange_mapi_virt_dirs_before_change

Sorry for the blur. All I can say is that the InternalUrl had the Exchange server’s hostname specified instead of the (single) namespace. Changing and verifying the new URLs is shown in the following screenshot:

exchange_mapi_virt_dirs_changing_urls

I changed the ExternalUrl on one of our Exchange servers. I then used the same command for the InternalUrl, replacing the word “External” with “Internal”. I also made the same change on our second Exchange server. The end result has both external and internal URLs listed, like so:

exchange_mapi_virt_dirs_after_change

Again, sorry for the blur. We’re using a single namespace so all url’s are basically identical.
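The same change for both servers in one go could look like the sketch below. The server names EXSRV1/EXSRV2 and the namespace mail.contoso.com are hypothetical stand-ins for the blurred values:

```powershell
# Single namespace used for both internal and external MAPI URLs (placeholder)
$mapiUrl = 'https://mail.contoso.com/mapi'

foreach ($server in 'EXSRV1', 'EXSRV2') {
    Get-MapiVirtualDirectory -Server $server |
        Set-MapiVirtualDirectory -InternalUrl $mapiUrl -ExternalUrl $mapiUrl
}

# Verify the result on all servers
Get-MapiVirtualDirectory | Format-List Server, InternalUrl, ExternalUrl, IISAuthenticationMethods
```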

 

Enable MAPI over HTTP in your Exchange Organization

This is easily done with one command:

Set-OrganizationConfig -MapiHttpEnabled $true

In theory, this isn’t needed if the user mailboxes already have MAPI enabled:

“To enable or disable MAPI over HTTP at the mailbox level, use the set-Casmailbox cmdlet with the MapiHttpEnabled parameter. The default value is Null, which means the mailbox will follow organization-level settings. The other options are True to enable MAPI over HTTP and False to disable. In either case, the setting would override any organization-level settings. If MAPI over HTTP is enabled at the organization level but disabled for a mailbox, that mailbox will use Outlook Anywhere connections”.

Source: https://technet.microsoft.com/en-us/library/mt634322(v=exchg.160).aspx

Checking if enabled on user level (we had it enabled already):

exchange_checking_mapi_user_mailbox

That said, it’s still a good idea to enable it organization wide (if you have old migrated room mailboxes or public folders). See the following link:

https://social.technet.microsoft.com/Forums/ie/en-US/f4df02af-20cd-423c-8c59-1ea563b7b940/exchange-2013-mapihttp-public-folder-and-room-mailboxes-still-using-rpchttp?forum=exchangesvrdeploy

We enabled it organization-wide, and it was now time to do some tests.
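The organization and mailbox settings can be checked side by side with a short EMS sketch; “someuser” is a placeholder identity:

```powershell
# Organization-wide setting
Get-OrganizationConfig | Format-List MapiHttpEnabled

# A single mailbox ($null means it follows the organization setting)
Get-CASMailbox -Identity 'someuser' | Format-List Name, MapiHttpEnabled

# Mailboxes that explicitly override the organization setting
Get-CASMailbox -ResultSize Unlimited |
    Where-Object { $null -ne $_.MapiHttpEnabled } |
    Format-Table -AutoSize Name, MapiHttpEnabled
```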

 

Testing

Still following the TechNet article:

exchange_checking_mapi_selftest

All good 🙂
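Both steps can be run from PowerShell: the self-test from the TechNet article, and the MSExchangeAutodiscoverAppPool recycle discussed in the next paragraph. A minimal sketch; EXSRV1 is a placeholder server name, and the recycle uses the IIS WebAdministration module on each Exchange server:

```powershell
# MAPI over HTTP self-test probe (repeat per server)
Test-OutlookConnectivity -RunFromServerId EXSRV1 -ProbeIdentity OutlookMapiHttpSelfTestProbe

# Recycle the Autodiscover app pool so clients pick up the change sooner
Import-Module WebAdministration
Restart-WebAppPool -Name MSExchangeAutodiscoverAppPool
```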

The change isn’t instant. Recycling the MSExchangeAutodiscoverAppPool makes things happen faster, however. After an MSExchangeAutodiscoverAppPool recycle, a coffee break and a restart of my own Outlook client, I had a look at the Outlook Connection Status:

exchange_outlook_connection_status_with_mapi

Well, well, well. All connections are now using MAPI over HTTP instead of the old RPC over HTTP.

You’ll see several changes:

  • First change is in the “Server name” column. It’s now a real server name instead of a mailbox GUID.
    • Notice also that the server path includes /mapi/.
  • The protocol (column) has changed from RPC/HTTP to just HTTP.
  • The “Authn” column has changed from NTLM to Nego* (Kerberos).

 

Further check-ups

Now that we’re using up-to-date clients, there’s no need to use old authentication methods for Outlook Anywhere either. (Outlook Anywhere authentication and MAPI authentication can be configured separately, btw.) The main reason for sticking with the old Basic/NTLM authentication (at least externally) is an Exchange 2010/2013 coexistence environment (and old “best practices”). That hasn’t been the case for us for a long time. Another reason for still sticking with Basic is, well, old clients. A third reason would be a mix of both old clients and an Exchange 2010/2013 coexistence environment. Luckily for us, we now have a “pure” Exchange 2013 environment, and (almost) all Windows clients are using Outlook 2016.

Even though Outlook clients are now configured to connect via MAPI over HTTP, some clients will still connect via RPC over HTTP. This will probably be the case for quite some time, as clients gradually move over to MAPI (as already stated, the change isn’t instant). Many users don’t restart Outlook very often either, which in turn means that they keep using RPC over HTTP until they restart (at least in my experience). Correct me if I’m wrong.

As a side note, MAPI over HTTP is only used by Windows Outlook clients. The Mac version of Outlook uses EWS, and mobile phones use ActiveSync. So yes, you’ll still have many different active protocols in your organization. Don’t disable them 🙂

If you want to read more about the Outlook Anywhere authentication types, have a look at http://msexchangeguru.com/2013/01/10/e2013-outlook-anywhere/ for example.

A checkup of our old settings for Outlook Anywhere:

exchange_checking_outlook_anywhere_auth_methods

As you can see, the servers were still using Basic as the external authentication method. Time to change that! This time I’ll use EAC:

exchange_outlook_anywhere_changing_external_auth_from_eac

You’ll get a warning that you shouldn’t enable Negotiate if you have Exchange servers older than version 2013 in your environment. Again, no problem!

I did the same for the internal URL (this must be done from EMS):

exchange_outlook_anywhere_changing_internal_auth_from_ems

Like so. All authentication methods pimped to modern standards 🙂

 

This should cause no problems at all (and it still hasn’t). A good reference:

For Exchange 2007/2010

Set-OutlookAnywhere -Identity 'SERVER\Rpc (Default Web Site)' -SSLOffloading $true -ClientAuthenticationMethod NTLM -IISAuthenticationMethods Basic,NTLM

For Exchange 2013+ with backwards compatibility with Outlook 2010 and 2007

Set-OutlookAnywhere -Identity 'SERVER\Rpc (Default Web Site)' -SSLOffloading $true -ExternalClientAuthenticationMethod NTLM -InternalClientAuthenticationMethod NTLM -IISAuthenticationMethods Basic,NTLM,Negotiate

For Exchange 2013+ with Outlook 2013+

Set-OutlookAnywhere -Identity 'SERVER\Rpc (Default Web Site)' -SSLOffloading $true -ExternalClientAuthenticationMethod Negotiate -InternalClientAuthenticationMethod Negotiate -IISAuthenticationMethods Basic,NTLM,Negotiate

Source: https://community.spiceworks.com/topic/1979514-outlook-auto-discover-issue

 

Lastly, if you’re having trouble with the famous and dreadful Outlook credential-popup, changing the authentication methods should help.

“Basic authentication: If you select this authentication type, Outlook will prompt for username and password while attempting a connection with Exchange”.

Source: http://msexchangeguru.com/2013/01/10/e2013-outlook-anywhere/ 

Similar information can be found at:

http://www.sysadminlab.net/exchange/outlook-anywhere-basic-vs-ntlm-authentication-explained
https://community.spiceworks.com/topic/1979514-outlook-auto-discover-issue

for example. I’ve seen this mentioned on other sites as well; I just can’t remember them right now. We’ve also seen the “popup dilemma” here, but luckily we have gotten rid of it by now.

 

Get-ActiveExchangeUsers script

If you want to get some nice details about how your users connect (per protocol), I’d suggest you grab the script from

https://gallery.technet.microsoft.com/office/Get-ActiveExchangeUsersps1-f1642024

As seen in my screenshots below, not everyone is connected over MAPI (yet). Old RPC connections are still in use, and that will probably be the case for a while. If someone wiser than me cares to explain why, please do.

Get-ActiveExchangeUsers_script_server1

Get-ActiveExchangeUsers_script_server2

The user activity was fairly low when these screenshots were taken.
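If you only want a quick RPC vs. MAPI head count without the script, performance counters may be enough. A minimal sketch; EXSRV1 is a placeholder, and the counter names are assumptions to verify first with Get-Counter -ListSet 'MSExchange*' on your own servers:

```powershell
# Per-protocol user counts on one server (counter names to be verified)
Get-Counter -ComputerName EXSRV1 -Counter @(
    '\MSExchange RpcClientAccess\User Count',
    '\MSExchange MapiHttp Emsmdb\User Count'
)
```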

 

Checking logs

Finally, if you feel like deep diving into the logs, the MAPI-stuff gets logged in:

C:\Program Files\Microsoft\Exchange Server\V15\Logging\MAPI Client Access for the Mailbox and
C:\Program Files\Microsoft\Exchange Server\V15\Logging\HttpProxy\Mapi for the CAS.

If you see any suspicious things in here, do Google (or use common sense 🙂 ).
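To peek at the most recent entries without hunting for the newest file, a small sketch like this works on a server with the default install path (it reads the path from the ExchangeInstallPath environment variable):

```powershell
# Show the last 20 lines of the newest HttpProxy MAPI log on this server
$logDir = Join-Path $env:ExchangeInstallPath 'Logging\HttpProxy\Mapi'
Get-ChildItem -Path $logDir -Filter '*.log' |
    Sort-Object LastWriteTime -Descending |
    Select-Object -First 1 |
    Get-Content -Tail 20
```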

Moving SpamAssassin’s Spam-flagged Mail Automatically into Outlook’s Junk E-mail Folder

More and more of our users are moving to a pure Exchange environment, meaning that both email and calendar functions are used from Exchange. This is the default for most companies nowadays, but we’ve been using a third-party IMAP server for email. The (slow) migration towards Exchange also brings challenges. One of them is spam management on the client: should it be an automatic or a manual process for the users?

For IMAP, we have a Postfix server in front of the IMAP server. The Postfix server is configured to handle spam and antivirus using SpamAssassin+Amavis. (Postfix routes email via the SpamAssassin/Amavis server before it “lands” on the IMAP (and Exchange) back ends.) This is still true for Exchange, which is a good thing. There’s no need to install (or buy a license for) a separate Exchange Edge Transport server to handle spam and antivirus when we already have a VERY efficient spam/AV solution deployed.

However, Exchange has no idea what to do with these spam-flagged emails by default; it delivers them straight into Outlook’s inbox. Here’s an example of a spam-flagged email (without Exchange server-side filters/transport rules):

Exchange_spamexample_without_filters

Fig 1. A Spam-flagged email (by SpamAssassin) from a user inbox. No server-side configuration done yet.

Even though this email is already flagged as spam (***Spam***), it’s not very “nice” having such emails arrive in your inbox. What you want is an automated process/feature that moves spam into Outlook’s “Junk E-mail” folder.

On a side note, not all spam in our organization reaches the users. If an email has a spam score over 12, it won’t be delivered to a user mailbox at all. A spam score between 5 and 11 is classified as spam, but these emails are still delivered to the users. Emails with a score below 5 are of course also delivered, but they aren’t flagged as spam 🙂 If you want to get technical and read about spam scores in SpamAssassin, go ahead:

https://en.wikipedia.org/wiki/SpamAssassin
https://litmus.com/help/testing/spam-scoring/
https://help.aweber.com/hc/en-us/articles/204032316-What-is-a-good-Spam-Assassin-Score

 

Well, how do you automatically sort the spam emails to the Junk E-mail folder then? You can do this on the client-side (Outlook), but the more logical solution is to filter these on the server-side (Exchange). I followed a good guide from http://exchangepedia.com/blog/2008/01/assigning-scl-to-messages-scanned-by.html and got it working. Some info regarding SCL:

https://technet.microsoft.com/en-us/library/aa995744(v=exchg.150).aspx
https://technet.microsoft.com/en-us/library/dn798345(v=exchg.150).aspx

I hadn’t touched the default values (no need) so they looked like this:

Exchange_contentfilterSCLconfig

Fig 2. Content filtering default configuration.

 

I then followed the guide to make a new Transport Rule. I made it from EAC, but it can easily be created from EMS as well:

Exchange_spam-status-rule

Fig 3. Creating a new Transport Rule.

I made sure that the rule sets the SCL to a value HIGHER than the SCLJunkThreshold; in our case the rule sets it to 9. It won’t work otherwise, even though some other guides tell you to use a lower value.
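An EMS version of the threshold check and the rule could look like the sketch below. The rule name is hypothetical, and the header test assumes SpamAssassin stamps flagged mail with an “X-Spam-Status: Yes, score=...” header; adjust it to whatever header your rule in the screenshot matches on.

```powershell
# The mailbox Junk E-mail rule moves messages whose SCL exceeds this value
Get-OrganizationConfig | Format-List SCLJunkThreshold
# Content filter thresholds, for comparison with Fig 2
Get-ContentFilterConfig | Format-List SCL*

# Stamp SCL 9 on mail that SpamAssassin has flagged (header match is an assumption)
New-TransportRule -Name 'SpamAssassin flagged: set SCL 9' `
    -HeaderMatchesMessageHeader 'X-Spam-Status' `
    -HeaderMatchesPatterns '^Yes' `
    -SetSCL 9
```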

 

It’s also a good idea to check that Junk E-mail configuration is enabled on the mailboxes. Here I do a check on my own mailbox:

Exchange_junkemailconfiguration

Fig 4. Check Junk E-mail configuration. It’s enabled.
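Besides checking a single mailbox, it can be worth listing mailboxes where the Junk E-mail rule is disabled, since those won’t move spam even with the transport rule in place. A minimal sketch; “someuser” is a placeholder, and the full scan can be slow in large organizations:

```powershell
# One mailbox
Get-MailboxJunkEmailConfiguration -Identity 'someuser' | Format-List Identity, Enabled

# All mailboxes with the Junk E-mail rule disabled
Get-Mailbox -ResultSize Unlimited |
    Get-MailboxJunkEmailConfiguration |
    Where-Object { -not $_.Enabled } |
    Format-Table -AutoSize Identity, Enabled
```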

Good. Everything is in order. Now we should test and see if a spam email arrives in my Junk E-mail folder instead of the Inbox. I spam-bombed myself to test, and yes, the spam arrives correctly in the Junk E-mail folder:

Exchange_spamexample_with_filter

Fig 5. Spam in the Junk.

 

Let’s see what the spam looks like in more detail:

Exchange_spamexample_with_filters

Fig 6. A Spam-flagged email, now with the server-side configuration done.

As you can see, Outlook now provides more information. When a message arrives/is situated in the Junk E-mail folder, it has disabled links and is also converted into plain text.

Happy days – spam is now automatically moved into the Junk E-mail folder for each user 🙂

 

(If you for whatever reason would want to configure this whole procedure on the client side instead of Exchange, you could follow https://www.jyu.fi/itp/en/email/how-to/exchange-spam-filtering for example).

Alternative Witness Server for Exchange 2013 DAG

As stated in my previous blog posts, we’re using a two-node DAG spanning two datacenters/two AD sites. The problem with this scenario is the witness server, or should I say the location of the witness server. I learned this the hard way, as we had minor problems with one of our datacenters. It also happened to be the datacenter where the witness server is located. This resulted in unresponsive/non-working email for some users, even though the HA aspect of Exchange via the Load Balancer was working fine.

I’ll borrow some pictures and text from http://searchexchange.techtarget.com/tip/How-Dynamic-Quorum-keeps-Exchange-2013-clusters-running-during-failure as I’m too lazy to rewrite everything. (By the way, we’re using Dynamic Quorum by default, as our servers run Windows Server 2012 R2.)

“In a planned data center shutdown — where power to the data center and a host is often cut cleanly — we would have the opportunity to change the FSW to a host in the secondary data center. This allows for maintenance, but it does not help the inevitable event where an air conditioning unit overheats one weekend, servers begin to shut down, email stops working — and somebody has to get everything up and running again.

Dynamic Quorum with Windows Server 2012 and Exchange 2013 protects not only against this scenario above, but also against scenarios where the majority of nodes in a cluster fail. In another example, we see that in the primary site, we’ve lost both one Exchange node and the FSW (Fig 1). (This happened to us).

In our example, Dynamic Quorum can protect against a data center failure while the Exchange DAG remains online. This means that when the circumstances are right (we’ll come to that in a moment), a power failure in your primary data center can occur and Exchange can continue to stay up and running. This can even happen for smaller environments without the need to place the FSW in a third site.”

exchange_dag_witness1

Fig 1. Loss of the first node and FSW with Dynamic Quorum.

 

“The key caveat is that the cluster must shut down cleanly. In the previous example, where the first data center failed, we relied on a mechanism to coordinate data center shutdown. This doesn’t need to be complicated, and a well-designed data center often will have this built in.

This can also protect against another scenario where there are three-node Exchange DAGs in a similar configuration — with two Exchange nodes present in the first data center and a single node present in a second data center. As the two nodes in the first data center shut down cleanly, Dynamic Quorum will ensure the remaining node keeps the DAG online.”

Some similar information can also be found at:

https://practical365.com/exchange-server/windows-server-2012-dynamic-quorum/
http://techgenix.com/exchange-2013-dag-dynamic-quorum-part1/ for example.

 

Well, this would all be too good to be true if it wasn’t for the “the cluster must shut down cleanly” part. This got me thinking about alternatives. What about a third Exchange server and skipping the witness server altogether? As stated above, that doesn’t work any better: it’s the same dilemma if two of the nodes lose power. The solution, as I see it, is (briefly) explained in the articles below: DAC, Datacenter Activation Coordination mode. This, together with an alternative witness server, is the recipe for a better disaster plan. With DAC and an alternative witness server in place, you can force the Exchange servers in an AD site to connect to the local witness server. It requires some manual work if disaster strikes, but it’s doable.

 

DAC

So, what’s up with DAC mode and the alternative witness server? Let’s have a look. First, some homework on DAC:

https://technet.microsoft.com/en-us/library/dd979790.aspx
https://practical365.com/exchange-server/exchange-best-practices-datacenter-activation-coordination-mode/
https://blogs.technet.microsoft.com/exchange/2011/05/31/exchange-2010-high-availability-misconceptions-addressed/

“DAC mode is used to control the database mount on startup behavior of a DAG. This control is designed to prevent split brain from occurring at the database level during a datacenter switchback. Split brain, also known as split brain syndrome, is a condition that results in a database being mounted as an active copy on two members of the same DAG that are unable to communicate with one another. Split brain is prevented using DAC mode, because DAC mode requires DAG members to obtain permission to mount databases before they can be mounted”.

Source: https://technet.microsoft.com/en-us/library/dd979790.aspx

“Datacenter Activation Coordination (DAC) mode has nothing whatsoever to do with failover. DAC mode is a property of the DAG that, when enabled, forces starting DAG members to acquire permission from other DAG members in order to mount mailbox databases. DAC mode was created to handle the following basic scenario:

  • You have a DAG extended to two datacenters.
  • You lose the power to your primary datacenter, which also takes out WAN connectivity between your primary and secondary datacenters.
  • Because primary datacenter power will be down for a while, you decide to activate your secondary datacenter and you perform a datacenter switchover.
  • Eventually, power is restored to your primary datacenter, but WAN connectivity between the two datacenters is not yet functional.
  • The DAG members starting up in the primary datacenter cannot communicate with any of the running DAG members in the secondary datacenter”.

Source: https://blogs.technet.microsoft.com/exchange/2011/05/31/exchange-2010-high-availability-misconceptions-addressed/

In short: Enable DAC mode on your DAG if it has two or more members.
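Enabling DAC mode is a single property on the DAG. Here is a minimal sketch from the EMS, using DAG1 as the DAG name (the same example name used in the switchover commands further down):

```powershell
# Enable Datacenter Activation Coordination mode on the DAG
Set-DatabaseAvailabilityGroup -Identity DAG1 -DatacenterActivationMode DagOnly

# Verify the setting
Get-DatabaseAvailabilityGroup -Identity DAG1 | Format-List Name, DatacenterActivationMode
```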

 

Alternative witness server

Now that we have some basic understanding about DAC, let’s look at the Alternative witness server (AWS):

https://www.rutter-net.com/blog/news/alternate-file-share-witness-correcting-the-confusion
https://blogs.technet.microsoft.com/exchange/2011/05/31/exchange-2010-high-availability-misconceptions-addressed/

I think it’s quite well summarized in the first article:

“The confusion lies in the event of datacenter activation; that the alternate file share witness would automatically come online as a means to provide quorum to the surviving DAG members and keep the databases mounted. So in many ways, some people view it as redundancy to the file share witness for an even numbered DAG.

In reality, the alternate file share witness is only invoked when an admin goes through procedures of activating the mailbox servers who lost quorum. DAC mode dramatically simplifies the process and when the “Restore-DatabaseAvailabilityGroup” cmdlet is executed during a datacenter activation, the alternate file share witness will be activated.”

The second article also has some nice overall information about High Availability Misconceptions. I suggest you read it.

In short: Manual labor is required even though you have configured an alternative witness server.
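Configuring the alternate witness only sets two properties on the DAG. As the quote above explains, those properties are only used when Restore-DatabaseAvailabilityGroup runs during a datacenter activation. In this sketch, FS2 and the directory path are hypothetical values for a file server in the second datacenter:

```powershell
# Hypothetical file server and directory in the second datacenter
Set-DatabaseAvailabilityGroup -Identity DAG1 `
    -AlternateWitnessServer "FS2" `
    -AlternateWitnessDirectory "C:\DAG1AltWitness"

# Show both the primary and the alternate witness settings
Get-DatabaseAvailabilityGroup -Identity DAG1 |
    Format-List Name, WitnessServer, WitnessDirectory, AlternateWitnessServer, AlternateWitnessDirectory
```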

 

Datacenter switchover

So, what to do when disaster strikes? First, have a look at the TechNet article “Datacenter switchovers”:

https://technet.microsoft.com/en-us/library/dd351049.aspx

Then have a look at:

https://smtpport25.wordpress.com/2010/12/10/exchange-2010-dag-local-and-site-drfailover-and-fail-back/ for some serious deep diving into the subject. This has to be one of the most comprehensive articles about DAG/Failover/DAC/you name it on the Internet.

I’ll summarize the TechNet and the smtpport25 articles into actions:

From TechNet:

“There are four basic steps that you complete to perform a datacenter switchover, after making the initial decision to activate the second datacenter:

  1. Terminate a partially running datacenter: This step involves terminating Exchange services in the primary datacenter, if any services are still running. This is particularly important for the Mailbox server role because it uses an active/passive high availability model. If services in a partially failed datacenter aren’t stopped, it’s possible for problems from the partially failed datacenter to negatively affect the services during a switchover back to the primary datacenter”.

The sub-chapter Terminating a Partially Failed Datacenter has details on how to do this, and smtpport25 has even more information. If you start reading from “Figure 19” onwards in the smtpport25 article you’ll find this:

“In figure 20. Marked in red has the details about started mailbox servers and Stopped Mailbox Servers. Started mailbox servers are the servers which are available for DAG for bringing the Database online. Stopped mailbox Servers are no longer participating in the DAG. There may be servers which are offline or down because of Datacenter failures. When we are restoring the service on secondary site, ideally all the servers which are in primary should be marked as stopped and they should not use when the services are brought online”.

So, in other words, we should move the primary servers into the Stopped state. To do that, use the following PowerShell commands:

Stop-DatabaseAvailabilityGroup -Identity DAG1 -MailboxServer AMBX1 -ConfigurationOnly

Stop-DatabaseAvailabilityGroup -Identity DAG1 -MailboxServer AMBX2 -ConfigurationOnly

Source: https://smtpport25.wordpress.com/2010/12/10/exchange-2010-dag-local-and-site-drfailover-and-fail-back/

Then, TechNet and smtpport25 have different information:

TechNet tells you to:

“2. The second datacenter must now be updated to represent which primary datacenter servers are stopped. This is done by running the same Stop-DatabaseAvailabilityGroup command with the ConfigurationOnly parameter using the same ActiveDirectorySite parameter and specifying the name of the Active Directory site in the failed primary datacenter. The purpose of this step is to inform the servers in the second datacenter about which mailbox servers are available to use when restoring service”.

The above should be enough if the DAG is in DAC mode (which it is).
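In PowerShell, TechNet’s step 2 looks roughly like the sketch below. ASite is a hypothetical name for the AD site of the failed primary datacenter, and DAG1 matches the earlier examples. Checking StartedMailboxServers and StoppedMailboxServers afterwards shows the same lists that the smtpport25 article highlights in its figure 20.

```powershell
# Run in the second (surviving) datacenter.
# "ASite" is a hypothetical name for the AD site of the failed primary datacenter.
Stop-DatabaseAvailabilityGroup -Identity DAG1 -ActiveDirectorySite ASite -ConfigurationOnly

# Check which servers the DAG now considers started and stopped
Get-DatabaseAvailabilityGroup -Identity DAG1 -Status |
    Format-List Name, StartedMailboxServers, StoppedMailboxServers
```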

Smtpport25, however, doesn’t mention DAC mode at all in this case. Instead, they use the non-DAC mode approach from TechNet, with a little twist:

  • First, stop the Cluster service on the secondary site/datacenter: Net stop Clussvc
  • Then, restore the DAG on the secondary site: Restore-DatabaseAvailabilityGroup -Identity DAG01 -ActiveDirectorySite BSite

I honestly don’t know which of the solutions is correct, and I hope I won’t have to find out in our production environment anytime soon 🙂

 

The next step is to activate the Mailbox servers, again following different instructions depending on whether the DAG is in DAC mode or not. I won’t paste all the text here, as it is available in the TechNet article.

Then, following on to the chapter Activating Client Access Services:

  • Activate Client Access services: This involves using the URL mapping information and the Domain Name System (DNS) change methodology to perform all required DNS updates. The mapping information describes what DNS changes to perform. The amount of time required to complete the update depends on the methodology used and the Time to Live (TTL) settings on the DNS record (and whether the deployment’s infrastructure honors the TTL).

We do not need to perform this step as we’re using Zen Load Balancer 🙂

And lastly, I won’t copy/paste information regarding Restoring Service to the Primary Datacenter, it’s already nicely written in the TechNet or smtpport25 article. I sure do hope I won’t have to use the commands though 🙂

Health Checking / Monitoring Exchange Server 2013/2016

I’ve never written about monitoring / health checking before, so here we go. There are maaaaany different ways of monitoring servers, so I’ll just present my way of monitoring things (in the Exchange environment). If you’re using SCOM or Nagios, you’re already halfway there. The basic checks in SCOM or Nagios will warn you about low disk space, high CPU load and so forth. But what if an Exchange service or a DAG reports errors, for example? Exchange is a massive beast to master, so in the end you’ll need decent monitoring tools to make your life easier (before disaster strikes).

We’re using Nagios for basic monitoring, which is working great. That said, from time to time I’ve noticed small problems that Nagios won’t report. In those cases I’ve resorted to PowerShell commands or the Windows event log. These problems would probably have been noticed (in time) if we had decent/additional Exchange-specific monitoring in place. There are Exchange plugins available for Nagios (I’ve tried a few), but they aren’t as sophisticated as custom PowerShell scripts made by “Exchange experts”. It’s also much easier to run a script from Task Scheduler than to configure Nagios. At least that’s my opinion.

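Anyhow, our monitoring/health checking consists of three scripts, namely:

  • Test-ExchangeServerHealth.ps1 (Paul’s script, see Sources below)
  • Get-ExchangeEnvironmentReport.ps1 (Steve’s script, see Sources below)
  • Get-HealthSetReport.ps1 (my own script), which is simply: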

add-pssnapin *exchange* -erroraction SilentlyContinue
# Report only the health sets that are neither Healthy nor Disabled, then mail the table
$body = Get-HealthReport -Server "yourserver" | Where-Object {$_.AlertValue -ne "Healthy" -and $_.AlertValue -ne "Disabled"} | Format-Table -Wrap -AutoSize | Out-String
Send-MailMessage -To "me@ourdomain.com" -From "HealthSetReport@yourserver" -Subject "HealthSetReport, yourserver" -Body $body -SmtpServer yoursmtpserver.domain.com
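One possible refinement, sketched here as an assumption rather than what I actually run, is to only send a mail when something is unhealthy. The filter is split into its own function so it can be tried out without an Exchange server. The function names are hypothetical, and the server, addresses and SMTP host are the same placeholders as above.

```powershell
# Keep only health sets whose AlertValue is neither Healthy nor Disabled
function Select-UnhealthyHealthSet {
    param([Parameter(ValueFromPipeline = $true)] $HealthSet)
    process {
        if ($HealthSet.AlertValue -notin @('Healthy', 'Disabled')) { $HealthSet }
    }
}

# Mail the report only when at least one health set needs attention
function Send-HealthSetReport {
    param(
        [string] $Server = "yourserver",
        [string] $To = "me@ourdomain.com",
        [string] $SmtpServer = "yoursmtpserver.domain.com"
    )
    # Server name passed positionally to Get-HealthReport
    $unhealthy = @(Get-HealthReport $Server | Select-UnhealthyHealthSet)
    if ($unhealthy.Count -eq 0) { return }   # everything Healthy/Disabled: no mail
    $body = $unhealthy | Format-Table -Wrap -AutoSize | Out-String
    Send-MailMessage -To $To -From "HealthSetReport@$Server" `
        -Subject "HealthSetReport, $Server" -Body $body -SmtpServer $SmtpServer
}
```

After loading the Exchange snap-in as above, the scheduled script would then just call Send-HealthSetReport.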

 

I have all of these scripts set up as scheduled tasks. You’d think that setting up a scheduled task is easy. Well, not in my opinion. I had to try many different techniques but at least it’s working now.

For Paul’s script I’m using the following settings:

ExServerHealth_schedtask_general

  • For “Triggers” I’m using 02:00 daily.
  • For “Actions” I’m using:
    • Start a program: powershell.exe
    • Add arguments (optional): -NoProfile -ExecutionPolicy Bypass -File "G:\software\scripts\Test-ExchangeServerHealth.ps1" -SendEmail
    • Start in (optional): G:\software\scripts
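The same task can also be created from PowerShell with the ScheduledTasks module (Windows Server 2012 and later). In this sketch, the task name is made up, and the run-as account is an assumption: use whichever account has the Exchange permissions the script needs. You’ll be prompted for its credentials.

```powershell
# Action: same program, arguments and start-in folder as in the GUI settings above
$action = New-ScheduledTaskAction -Execute 'powershell.exe' `
    -Argument '-NoProfile -ExecutionPolicy Bypass -File "G:\software\scripts\Test-ExchangeServerHealth.ps1" -SendEmail' `
    -WorkingDirectory 'G:\software\scripts'

# Trigger: daily at 02:00
$trigger = New-ScheduledTaskTrigger -Daily -At '02:00'

# Hypothetical task name; run with highest privileges as the account you enter
$cred = Get-Credential -Message 'Account to run the health check task'
Register-ScheduledTask -TaskName 'Exchange Server Health Check' `
    -Action $action -Trigger $trigger -RunLevel Highest `
    -User $cred.UserName -Password $cred.GetNetworkCredential().Password
```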

 

The same method wouldn’t work for Steve’s script, though. I used the same “Run with highest privileges” setting, but a PowerShell command similar to the one above wouldn’t work. (This was easy to test by running the command manually from a cmd prompt instead of PowerShell.) My solution:

  • Triggers: 01:00 every Saturday (yeah, I don’t feel the need to run this every night. Paul’s script will report the most important things anyway).
  • Actions:
    • Start a program: C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe
    • Add arguments (optional): -NonInteractive -WindowStyle Hidden -Command ". 'C:\Program Files\Microsoft\Exchange Server\V15\bin\RemoteExchange.ps1'; Connect-ExchangeServer -auto; G:\software\scripts\Get-ExchangeEnvironmentReport.ps1 -HTMLReport G:\software\scripts\environreport.html -SendMail:$true -MailFrom:environreport@yourserver -MailTo:me@ourdomain.com -MailServer:yoursmtpserver.domain.com"
    • Start in (optional): (empty)

 

My own script:

  • Triggers: 00:00 every Sunday (same reasoning as above: no need to run this every night).
  • Actions:
    • Start a program: powershell.exe
    • Add arguments (optional): -NoProfile -ExecutionPolicy Bypass -File "G:\software\scripts\Get-HealthSetReport.ps1"
    • Start in (optional): G:\software\scripts

 

Sources:

http://practical365.com/exchange-server/powershell-script-exchange-server-health-check-report/ (Paul’s script)
http://www.stevieg.org/2011/06/exchange-environment-report/ (Steve’s script)

https://technet.microsoft.com/en-us/library/jj218724%28v=exchg.160%29.aspx
http://www.msexchange.org/kbase/ExchangeServerTips/ExchangeServer2013/Powershell/scheduling-exchange-powershell-task.html
http://blog.enowsoftware.com/solutions-engine/bid/186014/Introduction-to-Managed-Availability-How-to-Check-Recover-and-Maintain-Your-Exchange-Organization-Part-II

 

Exchange also checks its own health. Let me copy/paste some information:

One of the interesting features of Exchange Server 2013 is the way that Managed Availability communicates the health of individual Client Access protocols (eg OWA, ActiveSync, EWS) by rendering a healthcheck.htm file in each CAS virtual directory. When the protocol is healthy you can see it yourself by navigating to a URL such as https://mail.exchangeserverpro.net/owa/healthcheck.htm.
When the protocol is unhealthy the page is unavailable, and instead of the HTTP 200 result above you will see a “Page not found” or HTTP 404 result instead.

Source: http://practical365.com/exchange-server/testing-exchange-server-2013-client-access-server-health-with-powershell/

Further reading: https://blogs.technet.microsoft.com/exchange/2014/03/05/load-balancing-in-exchange-2013/ and the chapter about Health Probe Checking
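If you want to run these probes by hand, a minimal sketch could look like the one below. It treats HTTP 200 as healthy and anything else (404, timeout, TLS error) as unhealthy. The function names and the default list of virtual directories are hypothetical, so trim the list to the protocols you actually publish. Use a host name that matches the server certificate, otherwise the TLS handshake fails and the probe reports the directory as unhealthy.

```powershell
# Build the healthcheck.htm URL for one virtual directory
function Get-HealthCheckUrl {
    param([string] $Server, [string] $VirtualDirectory)
    "https://$Server/$VirtualDirectory/healthcheck.htm"
}

# $true if the page answers with HTTP 200; $false on 404, timeouts, TLS errors etc.
function Test-HealthCheckPage {
    param([string] $Url)
    try {
        $response = Invoke-WebRequest -Uri $Url -UseBasicParsing -TimeoutSec 10
        return ($response.StatusCode -eq 200)
    } catch {
        return $false
    }
}

# Probe each virtual directory on one server; one result object per directory
function Test-ExchangeHealthCheck {
    param(
        [string] $Server,
        [string[]] $VirtualDirectory = @('owa', 'ecp', 'ews', 'Microsoft-Server-ActiveSync',
                                         'oab', 'mapi', 'rpc', 'Autodiscover')
    )
    foreach ($vdir in $VirtualDirectory) {
        $url = Get-HealthCheckUrl -Server $Server -VirtualDirectory $vdir
        [pscustomobject]@{
            VirtualDirectory = $vdir
            Url              = $url
            Healthy          = (Test-HealthCheckPage -Url $url)
        }
    }
}
```

For example, Test-ExchangeHealthCheck -Server mail.ourdomain.com | Format-Table would list one line per virtual directory (the host name is a placeholder).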

We have no need to implement this at the “Exchange server-level” though, as these checks are already done in our Zen Load Balancer (described in this blog post). I guess you could call this the “fourth script” for checking/monitoring server health.

 

Reports

So, what all this adds up to is some nice reports. I get a daily mail (generated at 02:00) that looks like this (and hopefully always will 🙂 ):

ExServerHealth_screenshot_from_report

Daily report generated from Test-ExchangeServerHealth.ps1 script

 

ExEnvironmentreport_screenshot_from_report

Weekly report generated from Get-ExchangeEnvironmentReport.ps1 script

 

ExServerHealthSet_screenshot_from_report

Weekly report generated from Get-HealthSetReport.ps1 script (The problem was already taken care of 🙂 )

 

As you can see from the screenshots above, these checks are all reporting different aspects of the Exchange server health.

In my opinion this pretty much covers all the necessary monitoring, and with these checks in place you can probably also sleep better at night 🙂 Even though these checks are comprehensive, I’m still planning even more checks. My next step was going to be an attempt at real-time event log monitoring using NSClient++ / Nagios, but the Nagios idea was buried, and I will instead focus on the ELK stack, which is a VERY comprehensive logging solution.

Using FarmGuardian to enable HA on Back-ends in Zen Load Balancer

UPDATE 13.8.2018: I published a new blog post about switching from Zen Load Balancer to HAProxy. The time has come to retire Zen for us…

We’ve been using the Zen Load Balancer Community Edition in production for almost a year now, and it has been working great. I previously wrote a blog post about installing and configuring Zen, and now it was time to look at the HA aspect of the back-end servers defined in the various Zen farms. Zen itself is quite easy to set up in HA mode: you just configure two separate Zen servers in HA mode according to Zen’s own documentation. This is very nice, and it’s also working as it should. What confused me the most (until now), however, is the HA aspect of the back-ends. I somehow thought that if you specify two back-ends in Zen and one of them fails, Zen automatically uses the back-end that is working and marked green (status dot). Well, this isn’t the case. I don’t know if I should blame myself or the poor documentation – or both. Anyway, an example is probably better. Here’s an example of L4xNAT farms for Exchange (with two back-ends):

zen_farms_table_overview_2017

I guess it’s quite self-explanatory: we’re load balancing the “normal” port 443 + IMAP and SMTP. (The SMTP ports aren’t all open to the Internet though, just to our 3rd party SMTP server.) The http farm is used for http to https redirection for OWA.

Furthermore, expanding the Exchange-OWAandAutodiscover-farm:

zen_owa_and_autodiscover_farm2017

 

and the monitoring part of the same farm:

zen_owa_and_autodiscover_farm_monitoring2017

 

This clearly shows that the load balancing part of Zen is working: the load is evenly distributed. You can also see that the status is green on both back-ends. Fine. Now one would THINK that the status turns RED if a back-end goes down, and that all traffic would then flow through the other server. Nope. Not happening. I was living under this illusion though 😦 As I said before, this is probably a combination of my own lack of knowledge and poor documentation. Also, afaik there are no clear “rules” for which farm type you should use when building farms. Zen’s own documentation seems to favor l4xnat for almost “everything”. However, if you’re using HTTP farms, you get HA on the back-ends out of the box. (You can specify back-end response timeouts and checks for resurrected back-ends, for example.) Then again, you’ll also have to use SSL offloading with the HTTP farm, which is a whole different chapter/challenge when used with Exchange. If you’re using l4xnat, you will NOT have HA enabled on the back-ends out of the box, and you’ll have to use FarmGuardian instead. Yet another not-so-well-documented feature of Zen.

FarmGuardian “documentation” is available at https://www.zenloadbalancer.com/farmguardian-quick-start/. Have a look for yourself and tell me if it’s obvious how to use FarmGuardian after reading.

Luckily I found a few hits on Google (not that many) that were trying to achieve something similar:

https://sourceforge.net/p/zenloadbalancer/mailman/message/29228868/
https://sourceforge.net/p/zenloadbalancer/mailman/message/32339595/
https://sourceforge.net/p/zenloadbalancer/mailman/message/27781778/
https://sourceforge.net/p/zenloadbalancer/mailman/zenloadbalancer-support/thread/BLU164-W39A7180399A764E10E6183C7280@phx.gbl/

These gave me some ideas. Well, I’ll spare you the pain of googling and instead I’ll present our (working) solution:

zen_owa_and_autodiscover_farm_with_farmguardian_enabled2017

First off, you’ll NEED a working script or command for the check part. Our solution is a script that checks that every virtual directory is up and running on each Exchange back-end. If NOT, the “broken” back-end is put in down mode and all traffic instead flows through the other (working) one. I chose 60 seconds for the check time, as Outlook times out after one minute by default (if a connection to the Exchange server can’t be established). Here’s the script, which is based on a script found at https://gist.github.com/phunehehe/5564090:

zen_farmguardian_script2017

Big thanks to the original script writer and to my workmate who helped me modify the script. Sorry, it’s only available in “screenshot form”.

You can manually test the script by running ./check_multi_utl.sh "yourexchangeserverIP" from a Zen terminal:

zen_farmguardian_script_manual_testing_from_terminal2017

The (default) scripts in Zen are located in /usr/local/zenloadbalancer/app/libexec btw. This is a good place to stash your own scripts also.

 

You can find the logs in /usr/local/zenloadbalancer/logs. Here’s a screenshot from our log (with everything working):

zen_farmguardian_log2017

 

And lastly I’ll present a couple of screenshots illustrating how it looks when something is NOT OK:

(These screenshots are from my own virtual test environment, I don’t like taking down production servers just for fun 🙂 )

zen_owa_and_autodiscover_farm_monitoring_host_down2017

FarmGuardian will react and present a red status-symbol. In this test, I took down the owa virtual directory on ex2. When the problem is fixed, status will return to normal (green dot).

 

and in the log:

zen_farmguardian_log_when_failing2017

The log will tell you that the host is down.

 

Oh, as a bonus for those of you wondering how to do a http to https redirect in Zen:

zen_http_to_https_redirect2017

Create new HTTP-farm and leave everything as default. Add a new service (name it whatever you want) and then just add the rules for redirection. Yes, it’s actually this simple. At least after you find the documentation 🙂

And there you have it. Both the Zen servers AND the back-ends working in HA-mode. Yay 🙂