Alternative Witness Server for Exchange 2013 DAG

As stated in my previous blog posts, we’re using a two node DAG spanning across two datacenters/two AD Sites. The problem with this scenario is the witness server, or should I say the location of the witness server. I learned this the hard way, as we had minor problems with one of our datacenters. It also happened to be the datacenter where the witness server is located. This resulted in unresponsive/non-working email for some users, even though the HA aspect of Exchange via the Load Balancer was working fine.

I’ll borrow some pictures and text from http://searchexchange.techtarget.com/tip/How-Dynamic-Quorum-keeps-Exchange-2013-clusters-running-during-failure as I’m too lazy to rewrite everything. (We’re using dynamic quorum by default btw, as our servers are Windows Server 2012 R2).

“In a planned data center shutdown — where power to the data center and a host is often cut cleanly — we would have the opportunity to change the FSW to a host in the secondary data center. This allows for maintenance, but it does not help the inevitable event where an air conditioning unit overheats one weekend, servers begin to shut down, email stops working — and somebody has to get everything up and running again.

Dynamic Quorum with Windows Server 2012 and Exchange 2013 protects not only against this scenario above, but also against scenarios where the majority of nodes in a cluster fail. In another example, we see that in the primary site, we’ve lost both one Exchange node and the FSW (Fig 1). (This happened to us).

In our example, Dynamic Quorum can protect against a data center failure while the Exchange DAG remains online. This means that when the circumstances are right (we’ll come to that in a moment), a power failure in your primary data center can occur and Exchange can continue to stay up and running. This can even happen for smaller environments without the need to place the FSW in a third site.”

exchange_dag_witness1

Fig 1. Loss of the first node and FSW with Dynamic Quorum.

 

“The key caveat is that the cluster must shut down cleanly. In the previous example, where the first data center failed, we relied on a mechanism to coordinate data center shutdown. This doesn’t need to be complicated, and a well-designed data center often will have this built in.

This can also protect against another scenario where there are three-node Exchange DAGs in a similar configuration — with two Exchange nodes present in the first data center and a single node present in a second data center. As the two nodes in the first data center shut down cleanly, Dynamic Quorum will ensure the remaining node keeps the DAG online.”

Some similar information can also be found at:

https://practical365.com/exchange-server/windows-server-2012-dynamic-quorum/
http://techgenix.com/exchange-2013-dag-dynamic-quorum-part1/ for example.

 

Well, this would all be too good to be true if it wasn’t for the “the cluster must shut down cleanly” part. This got me thinking about alternatives. What about adding a third Exchange server and skipping the witness server altogether? As noted above, that doesn’t work any better: it’s the same dilemma if two of the nodes lose power without a clean shutdown. The solution, as I see it, is (briefly) explained in the articles below: DAC (Datacenter Activation Coordination) mode. This, together with an alternative witness server, is the recipe for a better disaster plan. With DAC and an alternative witness server in place, you can force the Exchange servers in an AD site to connect to the local witness server. It requires some manual work (in case disaster strikes), but it’s doable.

 

DAC

So, what’s up with the DAC mode and the alternative witness server? Let’s have a look. First, let’s do some homework and have a look at DAC:

https://technet.microsoft.com/en-us/library/dd979790.aspx
https://practical365.com/exchange-server/exchange-best-practices-datacenter-activation-coordination-mode/
https://blogs.technet.microsoft.com/exchange/2011/05/31/exchange-2010-high-availability-misconceptions-addressed/

“DAC mode is used to control the database mount on startup behavior of a DAG. This control is designed to prevent split brain from occurring at the database level during a datacenter switchback. Split brain, also known as split brain syndrome, is a condition that results in a database being mounted as an active copy on two members of the same DAG that are unable to communicate with one another. Split brain is prevented using DAC mode, because DAC mode requires DAG members to obtain permission to mount databases before they can be mounted”.

Source: https://technet.microsoft.com/en-us/library/dd979790.aspx

“Datacenter Activation Coordination (DAC) mode has nothing whatsoever to do with failover. DAC mode is a property of the DAG that, when enabled, forces starting DAG members to acquire permission from other DAG members in order to mount mailbox databases. DAC mode was created to handle the following basic scenario:

  • You have a DAG extended to two datacenters.
  • You lose the power to your primary datacenter, which also takes out WAN connectivity between your primary and secondary datacenters.
  • Because primary datacenter power will be down for a while, you decide to activate your secondary datacenter and you perform a datacenter switchover.
  • Eventually, power is restored to your primary datacenter, but WAN connectivity between the two datacenters is not yet functional.
  • The DAG members starting up in the primary datacenter cannot communicate with any of the running DAG members in the secondary datacenter”.

Source: https://blogs.technet.microsoft.com/exchange/2011/05/31/exchange-2010-high-availability-misconceptions-addressed/

In short: Enable DAC mode on any DAG with two or more members that uses continuous replication, which includes a two-node DAG like ours.
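
Checking and enabling DAC mode is a one-liner each. This is a minimal sketch run from the Exchange Management Shell; the DAG name DAG1 is an assumption, so substitute your own.

```powershell
# Assumption: the DAG is called DAG1 (placeholder name).
# Show the current mode: "Off" means DAC is disabled, "DagOnly" means it is enabled.
Get-DatabaseAvailabilityGroup -Identity DAG1 |
    Format-List Name, DatacenterActivationMode

# Enable DAC mode on the DAG.
Set-DatabaseAvailabilityGroup -Identity DAG1 -DatacenterActivationMode DagOnly
```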

 

Alternative witness server

Now that we have some basic understanding about DAC, let’s look at the Alternative witness server (AWS):

https://www.rutter-net.com/blog/news/alternate-file-share-witness-correcting-the-confusion
https://blogs.technet.microsoft.com/exchange/2011/05/31/exchange-2010-high-availability-misconceptions-addressed/

I think it’s quite well summarized in the first article:

“The confusion lies in the event of datacenter activation; that the alternate file share witness would automatically come online as a means to provide quorum to the surviving DAG members and keep the databases mounted. So in many ways, some people view it as redundancy to the file share witness for an even numbered DAG.

In reality, the alternate file share witness is only invoked when an admin goes through procedures of activating the mailbox servers who lost quorum. DAC mode dramatically simplifies the process and when the “Restore-DatabaseAvailabilityGroup” cmdlet is executed during a datacenter activation, the alternate file share witness will be activated.”

The second article also has some nice overall information about High Availability Misconceptions. I suggest you read it.

In short: Manual labor is required even though you have configured an alternative witness server.
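
Pre-configuring the alternative witness server on the DAG could look roughly like the sketch below. The DAG name, the file server name FS2.example.com and the directory path are all placeholders I made up for illustration. Remember that this only stores the setting; nothing uses the alternate witness until an admin activates it.

```powershell
# Assumptions: DAG1, FS2.example.com and the directory are placeholders.
# The alternate witness should live in the secondary datacenter.
Set-DatabaseAvailabilityGroup -Identity DAG1 `
    -AlternateWitnessServer FS2.example.com `
    -AlternateWitnessDirectory 'C:\DAGFileShareWitnesses\DAG1'

# Verify which witness is configured and which one is currently in use.
Get-DatabaseAvailabilityGroup -Identity DAG1 -Status |
    Format-List Name, WitnessServer, AlternateWitnessServer, AlternateWitnessDirectory, WitnessShareInUse
```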

 

Datacenter switchover

So, what to do when disaster strikes? First, have a look at the TechNet article “Datacenter switchovers”:

https://technet.microsoft.com/en-us/library/dd351049.aspx

Then have a look at:

https://smtpport25.wordpress.com/2010/12/10/exchange-2010-dag-local-and-site-drfailover-and-fail-back/ for some serious deep diving into the subject. This has to be one of the most comprehensive articles about DAG/Failover/DAC/you name it on the Internet.

I’ll summarize the TechNet and the smtpport25 articles into actions:

From TechNet:

“There are four basic steps that you complete to perform a datacenter switchover, after making the initial decision to activate the second datacenter:

  1. Terminate a partially running datacenter   This step involves terminating Exchange services in the primary datacenter, if any services are still running. This is particularly important for the Mailbox server role because it uses an active/passive high availability model. If services in a partially failed datacenter aren’t stopped, it’s possible for problems from the partially failed datacenter to negatively affect the services during a switchover back to the primary datacenter”.

The sub-chapter Terminating a Partially Failed Datacenter has details on how to do this, and smtpport25 has even more information. If you start reading from “Figure 19” onwards in the smtpport25 article you’ll find this:

“In figure 20. Marked in red has the details about started mailbox servers and Stopped Mailbox Servers. Started mailbox servers are the servers which are available for DAG for bringing the Database online. Stopped mailbox Servers are no longer participating in the DAG. There may be servers which are offline or down because of Datacenter failures. When we are restoring the service on secondary site, ideally all the servers which are in primary should be marked as stopped and they should not use when the services are brought online”.

So, in other words we should move the primary servers into Stopped State. To do that, use the PowerShell command:

Stop-DatabaseAvailabilityGroup -Identity DAG1 -MailboxServer AMBX1 -ConfigurationOnly

Stop-DatabaseAvailabilityGroup -Identity DAG1 -MailboxServer AMBX2 -ConfigurationOnly

Source: https://smtpport25.wordpress.com/2010/12/10/exchange-2010-dag-local-and-site-drfailover-and-fail-back/

Then, TechNet and smtpport25 have different information:

TechNet tells you to:

“2.The second datacenter must now be updated to represent which primary datacenter servers are stopped. This is done by running the same Stop-DatabaseAvailabilityGroup command with the ConfigurationOnly parameter using the same ActiveDirectorySite parameter and specifying the name of the Active Directory site in the failed primary datacenter. The purpose of this step is to inform the servers in the second datacenter about which mailbox servers are available to use when restoring service”.

The above should be enough if the DAG is in DAC mode (which it is).

Smtpport25, however, doesn’t mention DAC mode at all in this case. Instead, they use the non-DAC mode approach from TechNet, with a little twist:

  • First, stop the cluster service on the secondary site/datacenter, Net stop Clussvc
  • Then, restore DAG on the secondary site, Restore-DatabaseAvailabilityGroup -Identity DAG01 -ActiveDirectorySite BSite

I honestly don’t know which of the solutions is correct, and I hope I won’t have to find out in our production environment anytime soon 🙂
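
For orientation, here is the DAC-mode sequence as I read the TechNet article, written out as one sketch. It is untested in production. The DAG name and the site names ASite (failed primary) and BSite (surviving secondary) are placeholders, and everything is run in the surviving datacenter.

```powershell
# Assumptions: DAG1, ASite and BSite are placeholder names.

# 1. Tell the surviving site that the primary site's members are stopped.
#    -ConfigurationOnly only updates Active Directory; the failed servers are not contacted.
Stop-DatabaseAvailabilityGroup -Identity DAG1 -ActiveDirectorySite ASite -ConfigurationOnly

# 2. Stop the Cluster service (run this on each DAG member in the surviving site).
Stop-Service -Name ClusSvc

# 3. Activate the surviving members; this is the step that brings the
#    alternative witness server into play if one is configured.
Restore-DatabaseAvailabilityGroup -Identity DAG1 -ActiveDirectorySite BSite

# 4. Check the result.
Get-DatabaseAvailabilityGroup -Identity DAG1 -Status |
    Format-List Name, StartedMailboxServers, StoppedMailboxServers, WitnessShareInUse
```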

 

The next step would be to activate the Mailbox servers, again following different instructions depending on whether the DAG is in DAC mode or not. I won’t paste all the text here as it is available in the TechNet article.

Then, following on to the chapter Activating Client Access Services:

  • Activate Client Access services   This involves using the URL mapping information and the Domain Name System (DNS) change methodology to perform all required DNS updates. The mapping information describes what DNS changes to perform. The amount of time required to complete the update depends on the methodology used and the Time to Live (TTL) settings on the DNS record (and whether the deployment’s infrastructure honors the TTL).

We do not need to perform this step as we’re using Zen Load Balancer 🙂

And lastly, I won’t copy/paste information regarding Restoring Service to the Primary Datacenter, it’s already nicely written in the TechNet or smtpport25 article. I sure do hope I won’t have to use the commands though 🙂


Health Checking / Monitoring Exchange Server 2013/2016

I’ve never written about monitoring / health checking before so here we go. There are maaaaany different ways of monitoring servers, so I’ll just present my way of monitoring things (in the Exchange environment). If you’re using SCOM or Nagios, you’re already halfway there. The basic checks in SCOM or Nagios will warn you about low disk space and high CPU load and so forth. But what if an Exchange service or a DAG starts throwing errors, for example? Exchange is a massive beast to master, so in the end you’ll need decent monitoring tools to make your life easier (before disaster strikes).

We’re using Nagios for basic monitoring, which is working great. That said, from time to time I’ve noticed some small problems that Nagios won’t report. I have then resorted to PowerShell commands or the Windows event log. These problems would probably have been noticed (in time) if we had decent/additional Exchange-specific monitoring in place. There are Exchange plugins available for Nagios (I’ve tried a few), but they aren’t as sophisticated as custom PowerShell scripts made by “Exchange experts”. It’s also much easier running a script from Task Scheduler than configuring Nagios. At least that’s my opinion.

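Anyhow, our monitoring/health checking consists of three scripts: Paul’s Test-ExchangeServerHealth.ps1, Steve’s Get-ExchangeEnvironmentReport.ps1, and my own Get-HealthSetReport.ps1, which looks like this: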

Add-PSSnapin *exchange* -ErrorAction SilentlyContinue
# Collect every health set that is neither Healthy nor Disabled and mail it as a table.
$body = Get-HealthReport -Server "yourserver" |
    Where-Object { $_.AlertValue -ne "Healthy" -and $_.AlertValue -ne "Disabled" } |
    Format-Table -Wrap -AutoSize
Send-MailMessage -To "me@ourdomain.com" -From "HealthSetReport@yourserver" -Subject "HealthSetReport, yourserver" -Body ($body | Out-String) -SmtpServer yoursmtpserver.domain.com

 

I have all of these scripts set up as scheduled tasks. You’d think that setting up a scheduled task is easy. Well, not in my opinion. I had to try many different techniques but at least it’s working now.

For Paul’s script I’m using the following settings:

ExServerHealth_schedtask_general

  • For “Triggers” I’m using 02:00 daily.
  • For “Actions” I’m using:
    • Start a program: powershell.exe
    • Add arguments (optional): -NoProfile -ExecutionPolicy Bypass -File "G:\software\scripts\Test-ExchangeServerHealth.ps1" -SendEmail
    • Start in (optional): G:\software\scripts
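
If you’d rather not click through the Task Scheduler GUI, the same task can probably be registered from PowerShell with the ScheduledTasks module that ships with Windows Server 2012 and later. This is a sketch, not what I actually used: the task name is made up, and the account is whatever you enter at the credential prompt.

```powershell
# Assumptions: the task name is a placeholder; the script path is the one from the settings above.
$action = New-ScheduledTaskAction -Execute 'powershell.exe' `
    -Argument '-NoProfile -ExecutionPolicy Bypass -File "G:\software\scripts\Test-ExchangeServerHealth.ps1" -SendEmail' `
    -WorkingDirectory 'G:\software\scripts'

# Run every night at 02:00.
$trigger = New-ScheduledTaskTrigger -Daily -At '02:00'

# Prompt for the account the task should run as.
$cred = Get-Credential

# -RunLevel Highest corresponds to "Run with highest privileges" in the GUI.
Register-ScheduledTask -TaskName 'Test-ExchangeServerHealth' `
    -Action $action -Trigger $trigger `
    -User $cred.UserName -Password $cred.GetNetworkCredential().Password `
    -RunLevel Highest
```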

 

The same method wouldn’t work for Steve’s script though. I used the same “Run with highest privileges” setting, but running the PowerShell command similar to the above wouldn’t work. (This was easily tested by running it manually from a cmd prompt instead of PowerShell.) My solution:

  • Triggers: 01:00 every Saturday (yeah, I don’t feel the need to run this every night. Paul’s script will report the most important things anyways).
  • Actions:
    • Start a program: C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe
    • Add arguments (optional): -NonInteractive -WindowStyle Hidden -Command ". 'C:\Program Files\Microsoft\Exchange Server\V15\bin\RemoteExchange.ps1'; Connect-ExchangeServer -auto; G:\software\scripts\Get-ExchangeEnvironmentReport.ps1 -HTMLReport G:\software\scripts\environreport.html -SendMail:$true -MailFrom:environreport@yourserver -MailTo:me@ourdomain.com -MailServer:yoursmtpserver.domain.com"
    • Start in (optional): (empty)

 

My own script:

  • Triggers: 00:00 every Sunday (yeah, I don’t feel the need to run this every night. Paul’s script will report the most important things anyways).
  • Actions:
    • Start a program: powershell.exe
    • Add arguments (optional): -NoProfile -ExecutionPolicy Bypass -File "G:\software\scripts\Get-HealthSetReport.ps1"
    • Start in (optional): G:\software\scripts

 

Sources:

http://practical365.com/exchange-server/powershell-script-exchange-server-health-check-report/ (Paul’s script)
http://www.stevieg.org/2011/06/exchange-environment-report/ (Steve’s script)

https://technet.microsoft.com/en-us/library/jj218724%28v=exchg.160%29.aspx
http://www.msexchange.org/kbase/ExchangeServerTips/ExchangeServer2013/Powershell/scheduling-exchange-powershell-task.html
http://blog.enowsoftware.com/solutions-engine/bid/186014/Introduction-to-Managed-Availability-How-to-Check-Recover-and-Maintain-Your-Exchange-Organization-Part-II

 

Exchange also checks its own health. Let me copy/paste some information:

One of the interesting features of Exchange Server 2013 is the way that Managed Availability communicates the health of individual Client Access protocols (eg OWA, ActiveSync, EWS) by rendering a healthcheck.htm file in each CAS virtual directory. When the protocol is healthy you can see it yourself by navigating to a URL such as https://mail.exchangeserverpro.net/owa/healthcheck.htm.
When the protocol is unhealthy the page is unavailable, and instead of the HTTP 200 result above you will see a “Page not found” or HTTP 404 result instead.

Source: http://practical365.com/exchange-server/testing-exchange-server-2013-client-access-server-health-with-powershell/
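
To poke at these pages from PowerShell, something like the following sketch should do. The function name, the list of virtual directories and the host name mail.example.com are my own assumptions, not taken from the article above. The server’s certificate must be trusted by the machine running the check, otherwise every request will fail.

```powershell
# Hypothetical helper: requests healthcheck.htm for each protocol and reports the result.
function Test-ExchangeProtocolHealth {
    param(
        [Parameter(Mandatory)] [string] $Server,
        [string[]] $Protocols = @('owa', 'ecp', 'ews', 'Microsoft-Server-ActiveSync', 'oab', 'rpc', 'autodiscover')
    )
    foreach ($protocol in $Protocols) {
        $url = "https://$Server/$protocol/healthcheck.htm"
        try {
            # A healthy protocol answers with HTTP 200.
            $response = Invoke-WebRequest -Uri $url -UseBasicParsing -TimeoutSec 10
            $healthy = ($response.StatusCode -eq 200)
        } catch {
            # 404, timeouts and certificate errors all end up here.
            $healthy = $false
        }
        [pscustomobject]@{ Server = $Server; Protocol = $protocol; Url = $url; Healthy = $healthy }
    }
}

Test-ExchangeProtocolHealth -Server 'mail.example.com' | Format-Table -AutoSize
```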

Further reading: https://blogs.technet.microsoft.com/exchange/2014/03/05/load-balancing-in-exchange-2013/ and the chapter about Health Probe Checking

We have no need to implement this at the “Exchange server-level” though, as these checks are already done in our Zen Load Balancer (described in this blog post). I guess you could call this the “fourth script” for checking/monitoring server health.

 

Reports

So, what all this adds up to are some nice reports. I’ll get a daily mail (generated at 02:00) looking like this (and hopefully always will 🙂 ):

ExServerHealth_screenshot_from_report

Daily report generated from Test-ExchangeServerHealth.ps1 script

 

ExEnvironmentreport_screenshot_from_report

Weekly report generated from Get-ExchangeEnvironmentReport.ps1 script

 

ExServerHealthSet_screenshot_from_report

Weekly report generated from Get-HealthSetReport.ps1 script (The problem was already taken care of 🙂 )

 

As you can see from the screenshots above, these checks are all reporting different aspects of the Exchange server health.

This pretty much covers all the necessary monitoring in my opinion, and with these checks in place you can probably also sleep better at night 🙂 Even though these checks are comprehensive, I’m still planning on even more checks. My next step was going to be an attempt at real-time event log monitoring using NSClient++ / Nagios. Actually, the Nagios idea was buried and I will instead focus on the ELK stack, which is a VERY comprehensive logging solution.

Using FarmGuardian to enable HA on Back-ends in Zen Load Balancer

We’ve been using the Zen Load Balancer Community Edition in production for almost a year now and it has been working great. I previously wrote a blog post about installing and configuring Zen, and now it was time to look at the HA aspect of the back-end servers defined in various Zen farms. Zen itself is quite easy to set up in HA-mode. You just configure two separate Zen servers in HA-mode according to Zen’s own documentation. Well, this is very nice and all, and it’s also working as it should. The thing that confused me the most however (until now) is the HA aspect of the back-ends. I somehow thought that if you specify two back-ends in Zen and one of them fails, Zen automatically uses the back-end which is working and marked as green (status dot). Well, this isn’t the case. I don’t know if I should blame myself or the poor documentation – or both. Anyways, an example is probably better. Here’s an example of L4xNAT-farms for Exchange (with two back-ends):

zen_farm_table2017

I guess it’s quite self-explanatory; we’re Load Balancing the “normal” port 443 + imap and smtp. (All the smtp-ports aren’t open to the Internet though, just against our 3rd party smtp server). The http-farm is used for http to https redirection for OWA.

Furthermore, expanding the Exchange-OWAandAutodiscover-farm:

zen_owa_and_autodiscover_farm2017

 

and the monitoring part of the same farm:

zen_owa_and_autodiscover_farm_monitoring2017

 

This clearly shows that the “Load Balancing-part” of Zen is working – the load is evenly distributed. You can also see that the status is green on both back-ends. Fine. Now one would THINK that the status turns RED if a back-end is down and that all traffic would flow through the other server if this happens. Nope. Not happening. I was living in this illusion though 😦 As I said before, this is probably a combination of my own lack of knowledge and poor documentation. Also, afaik there are no clear “rules” for which farm type you should use when building farms. Zen itself (or rather its documentation) seems to like l4xnat for almost “everything”. However, if you’re using HTTP farms, you get HA on the back-ends out of the box. (You can specify back-end response timeouts and checks for resurrected back-ends, for example.) Then again, you’ll also have to use SSL offloading with the HTTP farm, which is a whole different chapter/challenge when used with Exchange. If you’re using l4xnat you will NOT have HA enabled on the back-ends out of the box and you’ll have to use FarmGuardian instead. Yet another not-so-well-documented feature of Zen.

FarmGuardian “documentation” is available at https://www.zenloadbalancer.com/farmguardian-quick-start/. Have a look for yourself and tell me if it’s obvious how to use FarmGuardian after reading.

Luckily I found a few hits on Google (not that many) that were trying to achieve something similar:

https://sourceforge.net/p/zenloadbalancer/mailman/message/29228868/
https://sourceforge.net/p/zenloadbalancer/mailman/message/32339595/
https://sourceforge.net/p/zenloadbalancer/mailman/message/27781778/
https://sourceforge.net/p/zenloadbalancer/mailman/zenloadbalancer-support/thread/BLU164-W39A7180399A764E10E6183C7280@phx.gbl/

These gave me some ideas. Well, I’ll spare you the pain of googling and instead I’ll present our (working) solution:

zen_owa_and_autodiscover_farm_with_farmguardian_enabled2017

First off, you’ll NEED a working script or command for the check-part. Our solution is actually a script that checks that every virtual directory is up and running on each Exchange back-end. If NOT, the “broken” back-end will be put in down-mode and all traffic will instead flow through the other (working) one. I chose 60 seconds for the check time, as Outlook times out after one minute by default (if a connection to the Exchange server can’t be established). Here’s the script, which is based on a script found at https://gist.github.com/phunehehe/5564090:

zen_farmguardian_script2017

Big thanks to the original script writer and to my workmate who helped me modify the script. Sorry, only available in “screenshot form”.

You can manually test the script by running ./check_multi_utl.sh “yourexchangeserverIP”  from a Zen terminal:

zen_farmguardian_script_manual_testing_from_terminal2017

The (default) scripts in Zen are located in /usr/local/zenloadbalancer/app/libexec btw. This is a good place to stash your own scripts also.

 

You can find the logs in /usr/local/zenloadbalancer/logs. Here’s a screenshot from our log (with everything working):

zen_farmguardian_log2017

 

And lastly I’ll present a couple of screenshots illustrating how it looks when something is NOT OK:

(These screenshots are from my own virtual test environment, I don’t like taking down production servers just for fun 🙂 )

zen_owa_and_autodiscover_farm_monitoring_host_down2017

FarmGuardian will react and present a red status-symbol. In this test, I took down the owa virtual directory on ex2. When the problem is fixed, status will return to normal (green dot).

 

and in the log:

zen_farmguardian_log_when_failing2017

The log will tell you that the host is down.

 

Oh, as a bonus for those of you wondering how to do a http to https redirect in Zen:

zen_http_to_https_redirect2017

Create a new HTTP-farm and leave everything as default. Add a new service (name it whatever you want) and then just add the rules for redirection. Yes, it’s actually this simple. At least after you find the documentation 🙂

And there you have it. Both the Zen servers AND the back-ends working in HA-mode. Yay 🙂

Exchange 2007 to 2013 migration

This time around I’ll write about all the steps I used for a successful Exchange 2007 to 2013 migration in a small company (about 40 users). I’m not quite sure (yet) if we will do the upgrade to 2016 directly afterwards, but I’ll write a new 2013 to 2016 migration blog post if we decide to do so. At this point I can also state that migrating directly from Exchange 2007 to Exchange 2016 is impossible/not supported, at least without 3rd party software. You can have a look at the “migration chart” at https://technet.microsoft.com/en-us/library/ms.exch.setupreadiness.e16e12coexistenceminversionrequirement%28v=exchg.160%29.aspx for example. If you want to pay extra money and go directly from 2007 to 2016 it should be possible with CodeTwo Exchange Migration, http://www.codetwo.com/blog/migrate-legacy-2003-or-2007-exchange-to-exchange-2016/ for example.

Anyways, onto the migration stuff itself. As always you should start with the homework. This time I don’t have that many sources for you – instead some quality ones which get the job done properly. Start off by reading:

The second link is awesome! Read it slowly and carefully. You’ll be a lot smarter in the end. Lots of stuff to think of, but very nicely written.

I didn’t follow the guide to the letter (as usual), but I couldn’t have done the job without it. Some changes for our environment:

  • We do not use TMG. All TMG steps were bypassed and were replaced by similar steps according to our own firewall policies.
  • We have a Linux postfix server that handles all incoming email. It also handles antivirus and spam checking of emails. After these checks are done, it forwards email to the Exchange server.
  • Storage configuration / Hard drive partitions / Databases weren’t created the same way as in the guide.
  • Certificates were renewed by our “certificate guy”. No need for complicated requests etc.
  • No stress tests and/or analyses were done. No need.
  • Configured recipient filtering (there’s a chapter about it).
  • A script which deletes old IIS and Exchange logs was introduced (there’s a part written about this also).

 

My own steps for the migration process:

On the old server:

  • Patched Exchange Server 2007. Also installed the latest Update Rollup (21). You should have a fully patched (old) server before installing/introducing a new Exchange Server in the domain.

          exchange2007_Update_rollup21

  • Took screenshots of all current configurations (just in case). Most of the settings will migrate however. What to back up is nicely documented in the above homework-link.
    • Namespace
    • Receive connectors
    • Send connectors
    • Quotas
    • Outlook Anywhere, OWA, OAB, EWS, ActiveSync settings
    • Accepted domains
    • Etc.. etc. that would be of use
  • Got a new certificate which included the new host legacy.domain.com
  • Installed the new certificate (on Exchange 2007 at first):

          exchange2007_install_new_cert

 

On the new server:

  • Installed a new server, Windows Server 2012 R2.

                   exchange2013_rsat-adds-installation

      • Moving on to the other prerequisites:

                   exchange2013_prerequisites

 

Moving on to the actual Exchange installation

  • Had a look at my partition table, just to check that everything looked OK. (it did):

          exchange2013_partitions

  • The partition layout should be quite self-explanatory so I won’t comment on that. I will however tell setup to use the existing partitions. I actually resized the partitions a bit after this screenshot…
  • Once again following information from the excellent guide, I used the latest CU as installation source (NOT the installation DVD/ISO).

               exchange2013_prepare_schema

               exchange2013_prepare_AD_and_domain

 

  • Actual installation (note paths for DB and Logs):

          exchange2013_installation_from_powershell

  • Done. Moving over to post-installation steps
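
Since the commands above only survive as screenshots, here is a rough sketch of the preparation and installation steps as they are commonly run from an elevated PowerShell prompt. The folder the CU was extracted to, the database name and the database/log paths are placeholders, not necessarily what I used. The arguments go in an array so that PowerShell passes the comma-separated role list through untouched.

```powershell
# Assumptions: C:\Install\Exchange2013-CU, DB1 and the D:\ and E:\ paths are placeholders.

# AD management tools, needed before the schema/AD preparation.
Install-WindowsFeature RSAT-ADDS

Set-Location 'C:\Install\Exchange2013-CU'

# Prepare the schema, Active Directory and the domain.
& .\setup.exe '/PrepareSchema' '/IAcceptExchangeServerLicenseTerms'
& .\setup.exe '/PrepareAD' '/IAcceptExchangeServerLicenseTerms'
& .\setup.exe '/PrepareDomain' '/IAcceptExchangeServerLicenseTerms'

# Install both roles and point the first database and its logs at their own partitions.
$installArgs = @(
    '/Mode:Install'
    '/Roles:Mailbox,ClientAccess'
    '/MdbName:DB1'
    '/DbFilePath:D:\Databases\DB1\DB1.edb'
    '/LogFolderPath:E:\Databases\DB1'
    '/IAcceptExchangeServerLicenseTerms'
)
& .\setup.exe $installArgs
```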

 

Post-installation steps

  • Checking and changing the SCP. This should be done asap after the installation.

          exchange_checking_scp

          Checking SCP.

          exchange_changing_scp

          Changing SCP.

  • Everything looks good!
  • Next, we’ll install the new certificate on the Exchange 2013 server:

          exchange2013_install_new_cert

           A simple “import” will do the job.

  • Also have a look at the certificate in IIS (and change to the new one if necessary):

          exchange2013_install_new_cert_in_IIS

           exchange2013_outlook_anywhere

Following the guide you should change the authentication to NTLM:

“As Outlook Anywhere is the protocol Outlook clients will use to communicate with Exchange Server 2013, replacing MAPI/RPC within the LAN, it’s important that these settings are correct – even if you are not publishing Outlook Anywhere externally. During co-existence it’s also important to ensure that the default Authentication Method, Negotiate, is updated to NTLM to ensure client compatibility when Exchange 2013 proxies Outlook Anywhere connections to the Exchange 2007 server”.
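
From the Exchange Management Shell the authentication change could look something like this. It’s a sketch under assumptions: EX2013 is a placeholder for the new server’s name, and you should check the current values first so you know what you are changing.

```powershell
# Assumption: EX2013 is a placeholder for the new Exchange 2013 server.
# Show the current Outlook Anywhere host names and authentication methods.
Get-OutlookAnywhere -Server EX2013 |
    Format-List Identity, *Hostname, *AuthenticationMethod*

# Switch client authentication from Negotiate to NTLM for the co-existence period.
Get-OutlookAnywhere -Server EX2013 |
    Set-OutlookAnywhere -InternalClientAuthenticationMethod Ntlm -ExternalClientAuthenticationMethod Ntlm
```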

  • Moving over to the send and receive connectors.
    • The send connector automatically “migrated” from the old server.
    • The receive connector did NOT migrate from the old server. This is because Exchange 2013 uses different transport roles compared to 2007. 2007 included only Hub Transport, but Exchange 2013 uses both Hub Transport and Frontend Transport. For those of you interested in this change, read http://exchangeserverpro.com/exchange-2013-mail-flow/ and http://exchangeserverpro.com/exchange-2013-configure-smtp-relay-connector/ for example.
    • The CAS receives mail on port 25 and forwards it to the “backend” Mailbox role, which listens on port 2525.
    • I left the “Default Frontend servername” with its default settings:

               exchange2013_default_frontend_recieve_connector

    • …and configured a new SMTP relay-connector which has “our settings”. This connector has to be “Frontend Transport”. You cannot create a new connector as Hub Transport. You’ll be greeted by an error message if you try:

              exchange2013_recieve_connector_error

Information about this can be found at:

http://markgossa.blogspot.fi/2016/01/bindings-and-remoteipranges-parameters-conflict-exchange-2013-2016.html
http://exchangeserverpro.com/exchange-server-2013-upgrade-fails-due-to-receive-connector-conflicts/

“If you want to create a new receive connector that listen on port 25, you can do this but you have to create it using the Frontend Transport role if you have either an Exchange 2016 server or an Exchange 2013 server with both the CAS and MBX roles installed on the same server”.

All our University email (and this specific company’s email) is received via a Linux postfix server. This server handles all spam filtering and antivirus. After these checks are done, the mail is delivered/forwarded to Exchange.
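
In the Exchange Management Shell, a relay connector of this kind can be sketched as below. The connector name, the server name EX2013 and the postfix address 192.0.2.25 are placeholders. The security settings (who is allowed to relay) are not included here, as they depend on your environment.

```powershell
# Assumptions: connector name, EX2013 and 192.0.2.25 are placeholders.
# The connector must use the FrontendTransport role when it listens on port 25
# on a server that has both the CAS and Mailbox roles installed.
New-ReceiveConnector -Name 'Relay from postfix' -Server EX2013 `
    -TransportRole FrontendTransport -Custom `
    -Bindings '0.0.0.0:25' -RemoteIPRanges '192.0.2.25'

# List the connectors to confirm role, bindings and scoping.
Get-ReceiveConnector -Server EX2013 |
    Format-Table Name, TransportRole, Bindings, RemoteIPRanges -AutoSize
```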

exchange2013_aasmtp_relay_security

exchange2013_aasmtp_relay_scoping

 

After these steps were done, I continued with:

  • Configuring mailbox quotas to match those on the old server.
  • Configuring the Offline Address Book to be stored on the new server.
  • Checking the log locations – should the transport logs be moved to another location or left at the default location? I changed them so they go to the log partition. In the end, this is just a small percentage of all logs generated. All other non-transport logs end up under C:\Program Files\Microsoft\Exchange Server\V15\Logging. I’m using a PowerShell script to delete all logs older than 30 days, and the same goes for the IIS logs in C:\inetpub\logs. The script looks like this, and is run daily via Task Scheduler:

# Delete Exchange and IIS log files older than $DaysToKeep days.
$DaysToKeep = 30
$LogFolders = @(
    "C:\Program Files\Microsoft\Exchange Server\V15\Logging",
    "E:\Logs",
    "C:\inetpub\logs"
)
$Cutoff = (Get-Date).AddDays(-$DaysToKeep)
# Every deleted file is noted in a per-year log file in the current working directory.
$DeleteLog = "Delete Log $((Get-Date).Year).log"

foreach ($Folder in $LogFolders) {
    Get-ChildItem $Folder -Recurse -Force -ErrorAction SilentlyContinue |
        Where-Object { -not $_.PSIsContainer -and $_.LastWriteTime -lt $Cutoff } |
        ForEach-Object {
            # $($_.FullName) is needed inside the string; a plain $_.FullName would not expand the property.
            Add-Content -Path $DeleteLog -Value " $($_.FullName)"
            Remove-Item -Path $_.FullName
        }
}
exit

And the command to run from task scheduler:

  • PowerShell.exe -NoProfile -ExecutionPolicy Bypass -Command "& 'D:\pathtoyour\scripts\clearlogging.ps1'"
    • Runs daily at 03:00
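If you prefer not to click through the Task Scheduler GUI, the same daily job could probably be registered from PowerShell. A minimal sketch, assuming Windows Server 2012 or later (where the ScheduledTasks module exists); the task name is my own invention and the script path is the same placeholder as above.

```powershell
# Same command line as above, split into executable and arguments.
$Action = New-ScheduledTaskAction -Execute "PowerShell.exe" `
    -Argument "-NoProfile -ExecutionPolicy Bypass -Command `"& 'D:\pathtoyour\scripts\clearlogging.ps1'`""

# Run every day at 03:00.
$Trigger = New-ScheduledTaskTrigger -Daily -At "03:00"

# "Clear Exchange logs" is a hypothetical task name; running as SYSTEM is an assumption.
Register-ScheduledTask -TaskName "Clear Exchange logs" `
    -Action $Action -Trigger $Trigger `
    -User "SYSTEM" -RunLevel Highest
```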

As you’ve probably noticed from my Exchange installation screenshots, I already pointed the transaction logs to a different partition in the installation phase (E:\Databases\DB1). These logs don’t need manual deletion, however; they get deleted automatically by the backup solution (Veeam). The key here is that the backup software has to be Exchange-aware. The other logs on E:\ are the transport logs (E:\Logs), which are only a tiny part of the whole logging structure (C:\Program Files\Microsoft\Exchange Server\V15\Logging) in Exchange. You could leave the transport logs in their default location though, as the above script will go through that directory also…
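For completeness, moving the transport logs can be done with Set-TransportService. The sketch below is not a copy of what I ran: the server name and the subfolder names under E:\Logs are assumptions, so adjust them to your own layout.

```powershell
# Hypothetical server name.
$Server = "EX2013"

# Point the most common transport logs at the log partition.
# The subfolder names are made up for this example.
Set-TransportService -Identity $Server `
    -MessageTrackingLogPath "E:\Logs\MessageTracking" `
    -ConnectivityLogPath    "E:\Logs\Connectivity" `
    -SendProtocolLogPath    "E:\Logs\ProtocolLog\SmtpSend" `
    -ReceiveProtocolLogPath "E:\Logs\ProtocolLog\SmtpReceive"

# Show every log path configured on the transport service.
Get-TransportService -Identity $Server | Format-List *LogPath
```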

 

Recipient filtering / Stopping backscatter

As a nice bonus, Exchange 2013 can now handle recipient filtering (filter out non-existent users) properly. For more information about recipient filtering read:

https://technet.microsoft.com/en-us/library/bb125187%28v=exchg.160%29.aspx
http://exchange.sembee.info/2013/mbx/filter-unknown.asp
https://www.roaringpenguin.com/recipient-verification-exchange-2013

The filtering CAN be done without an Exchange Edge server even though the Internet will tell you otherwise. We enabled it on our postfix server following tips found on https://www.roaringpenguin.com/recipient-verification-exchange-2013. The installation on the Exchange side, on the other hand, looked like this:

exchange2013_recipient_filtering1
exchange2013_recipient_filtering2

exchange2013_recipient_filtering3
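In text form, the steps from the screenshots boil down to roughly the following. Treat it as a sketch based on the linked articles rather than an exact transcript of my session; the accepted domain name is a placeholder.

```powershell
# Install the anti-spam agents on the Mailbox server and restart transport.
& (Join-Path $env:ExchangeInstallPath "Scripts\Install-AntiSpamAgents.ps1")
Restart-Service MSExchangeTransport

# Reject recipients that don't exist in the directory.
Set-RecipientFilterConfig -RecipientValidationEnabled $true

# Recipient lookup must be enabled on the accepted domain ("domain.com" is hypothetical).
Set-AcceptedDomain -Identity "domain.com" -AddressBookEnabled $true

# Verify the result.
Get-RecipientFilterConfig | Format-List Enabled, RecipientValidationEnabled
```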

I also enabled Anonymous users on the “Default receive connector”:

exchange2013_default_recieve_connector

Happy days! We can now filter out non-existent users on Exchange rather than manually on the postfix server.

I also checked that recipient filtering was active and working:

exchange2013_recipient_filtering_test_telnet

Yes, it was 🙂
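If you don’t feel like typing SMTP commands into telnet every time, the same check can be scripted. The two functions below are my own sketch (the names Read-SmtpReply and Test-SmtpRecipient are made up); they simply open a TCP connection, send HELO, MAIL FROM and RCPT TO, and return whatever the server answers to RCPT TO.

```powershell
# Read one SMTP reply (possibly multi-line) and return its last line.
function Read-SmtpReply {
    param([System.IO.TextReader]$Reader)
    do {
        $line = $Reader.ReadLine()
        # Continuation lines look like "250-..."; the final line has a space after the code.
    } while ($null -ne $line -and $line.Length -gt 3 -and $line[3] -eq '-')
    return $line
}

# Ask the server whether it would accept mail for $Recipient.
function Test-SmtpRecipient {
    param(
        [string]$Server,
        [string]$Sender,
        [string]$Recipient,
        [int]$Port = 25
    )
    $client = New-Object System.Net.Sockets.TcpClient($Server, $Port)
    try {
        $stream = $client.GetStream()
        $reader = New-Object System.IO.StreamReader($stream)
        $writer = New-Object System.IO.StreamWriter($stream)
        $writer.NewLine = "`r`n"
        $writer.AutoFlush = $true

        Read-SmtpReply $reader | Out-Null            # greeting banner
        foreach ($command in "HELO test.local", "MAIL FROM:<$Sender>") {
            $writer.WriteLine($command)
            Read-SmtpReply $reader | Out-Null
        }
        $writer.WriteLine("RCPT TO:<$Recipient>")
        $reply = Read-SmtpReply $reader              # the answer we care about
        $writer.WriteLine("QUIT")
        return $reply
    }
    finally {
        $client.Close()
    }
}
```

Usage would be something like Test-SmtpRecipient -Server ex2013.domain.com -Sender test@domain.com -Recipient nosuchuser@domain.com (all placeholder values). A reply starting with 550 for a made-up address, and 250 for a real one, suggests the filtering is active. Which port you need to test against depends on your setup, so check the linked articles for that part.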

With all this done I now moved forward with the configuration. Again, following http://www.msexchange.org/articles-tutorials/exchange-server-2013/migration-deployment/planning-and-migrating-small-organization-exchange-2007-2013-part13.html

 

Getting ready for coexistence

I’ll start off by copy/pasting some text.

“With our database settings in place and ready to go, we can start thinking about co-existence – before we do though, it’s time to make sure things work within Exchange 2013! So far we’ve got our new server up and running, but we’ve still not logged in and checked everything works as expected”. Source: http://www.msexchange.org/articles-tutorials/exchange-server-2013/migration-deployment/planning-and-migrating-small-organization-exchange-2007-2013-part13.html

With this information in mind, I started testing according to the above link. The chapter of interest was “Testing base functionality”. All tests passed. Very nice 🙂

With all tests done, and all users aware of the migration, I did the following after work hours:

    • Asked the “DNS guy” to make a CNAME record for legacy.domain.com pointing to the old server.
    • Changed all virtual directories on the old server to use the name “legacy”.
      • Things to remember:
        • No external url for Microsoft-Server-ActiveSync.
        • Autodiscover Internal URL / SCP record on both Exchange 2007 and Exchange 2013 server should point to the new server.
    • Changed the DNS records to point to the new server.
      • The autodiscover record and the namespace record.
    • Had a look at the send connector. Everything seemed OK. (Settings were migrated from the old server). However, minor change:
      • Removed the old server from the “source servers” and added the new server. New mail should be sent from the new server (and not from the old one anymore):

               exchange2013_send_connector

    • postfix was also configured to route mail to the new server instead of the old one.
    • Done. Next in line is moving/migrating mailboxes to the new server. Yay.
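Two of the steps in the list above (the Autodiscover SCP and the send connector source server) can be sketched in the Exchange Management Shell as follows. The server name, URL and connector name are placeholders, not our real values.

```powershell
# Hypothetical values.
$NewServer = "EX2013"
$ScpUrl    = "https://autodiscover.domain.com/Autodiscover/Autodiscover.xml"

# Point the SCP of the new server at the new namespace.
# The old Exchange 2007 server needs the same URL, set from its own management shell.
Set-ClientAccessServer -Identity $NewServer -AutoDiscoverServiceInternalUri $ScpUrl
Get-ClientAccessServer | Format-Table Name, AutoDiscoverServiceInternalUri -AutoSize

# Make the new server the only source server of the send connector ("Internet" is a made-up name).
Set-SendConnector -Identity "Internet" -SourceTransportServers $NewServer
Get-SendConnector | Format-Table Name, SourceTransportServers -AutoSize
```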

 

Migrating mailboxes

I started out by following the guide at http://www.msexchange.org/articles-tutorials/exchange-server-2013/migration-deployment/planning-and-migrating-small-organization-exchange-2007-2013-part15.html , more specifically the part about “Pre-Migration Test Migrations”. I moved a couple of test users and after that I sent and received mail to/from these users via Outlook and OWA. No errors were noticed, so I moved over to the real deal and started moving “real” mailboxes. Again, nothing special, I continued following the information at http://www.msexchange.org/articles-tutorials/exchange-server-2013/migration-deployment/planning-and-migrating-small-organization-exchange-2007-2013-part16.html. I did a batch of 10 users at first (users A to E) and all of them were successfully migrated:

exchange2013_migrate_users_a_to_e

(The remaining mailboxes were also successfully migrated).
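The guide does the moves through the web interface, but the same thing can be done with move requests from the shell. A small sketch, assuming a target database called DB1 and a couple of made-up test users:

```powershell
# Assumed database name and hypothetical test users.
$TargetDatabase = "DB1"
$TestUsers = "testuser1", "testuser2"

# Queue one move request per mailbox.
foreach ($User in $TestUsers) {
    New-MoveRequest -Identity $User -TargetDatabase $TargetDatabase -BadItemLimit 0
}

# Follow the progress.
Get-MoveRequest | Get-MoveRequestStatistics |
    Format-Table DisplayName, StatusDetail, PercentComplete -AutoSize

# Clean up the finished requests afterwards.
Get-MoveRequest -MoveStatus Completed | Remove-MoveRequest -Confirm:$false
```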

 

Upgrading Exchange AD Objects

Now it was time to upgrade the AD Objects following information from http://www.msexchange.org/articles-tutorials/exchange-server-2013/migration-deployment/planning-and-migrating-small-organization-exchange-2007-2013-part16.html.

exchange2013_ad_object_upgrade1

exchange2013_ad_object_upgrade2

The first two objects didn’t need an upgrade; apparently they had already been upgraded automatically during the migration process. The object in the screenshot that did need an upgrade is a mailing list (distribution group).
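If you have more than a handful of groups, checking and upgrading them from the shell is quicker than clicking through each one. A sketch, with a made-up group name:

```powershell
# List the groups together with the Exchange version stamped on each object,
# so the ones still carrying the old version stand out.
Get-DistributionGroup -ResultSize Unlimited |
    Format-Table Name, ExchangeVersion -AutoSize

# Upgrade a single group without the confirmation dialog ("mailinglist" is hypothetical).
Set-DistributionGroup -Identity "mailinglist" -ForceUpgrade
```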

 

Public Folders

The old environment didn’t use public folders, so luckily there was no need to migrate these. I did run into some problems with Public Folders however. More information in the chapter below.

 

Problems

  • Everything seemed fine, BUT after a couple of days one user didn’t see any new mail in a delegated mailbox she had. She also got the dreaded password prompt every time she started Outlook.
    • Later I heard that other users were also prompted for their password.
  • This got me thinking about authentication methods. I’ve seen this before. A couple hours of googling still had my thoughts in the same direction, authentication methods.
  • I still wonder why all of this happened though, knowing that ALL mailboxes were now hosted on the new Exchange 2013 server. Why on earth would someone’s Outlook even check for things on the old server? Maybe some old Public Folder references etc. perhaps? Don’t know, the only thing I do know is that it had to be fixed.

Some links about the same dilemma (almost, at least):

http://ilantz.com/2013/06/29/exchange-2013-outlook-anywhere-considerations/
https://gonjer.com/2016/07/02/outlook-prompts-for-credentials-with-exchange-2010-and-20132016-coexistence/
http://blogs.microsoft.co.il/yuval14/2014/08/09/the-ultimate-guide-exchange-2013-and-outlook-password-prompt-mystery/ (L. Authentication Issue)
http://silbers.net/blog/2014/01/22/exchange-20072013-coexistence-urls/

The thing is, I had authentication set to “NTLM” on the new Exchange 2013 server during the coexistence, following the very same guide as with almost everything else in this post. The NTLM setting should be “enough” afaik. One thing that wasn’t mentioned in the guide however, was how the old server was/should be configured. I’m quite sure there are many best practices for Exchange 2007 also, but I myself hadn’t installed that server in the past. Well, hours later, comparing different authentication methods, I finally think I got it right. Here’s the before and after:

exchange2013_get-outlookanywhere_auth_methods

Before: old server IISAuthenticationMethods were only Basic.

exchange2013_set-outlookanywhere_iis_auth_methods

Solution: Adding NTLM to IISAuthenticationMethods (on the legacy server)

exchange2013_get-outlookanywhere_auth_methods_after_change

After: NTLM added
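In text form, the before/after check and the fix look roughly like this. The server name is a placeholder, and the Set command has to be run in the management shell of the legacy server.

```powershell
# Show the authentication settings of every Outlook Anywhere endpoint.
Get-OutlookAnywhere | Format-List ServerName, *AuthenticationMethod*

# On the legacy server: allow NTLM in addition to Basic ("EX2007" is hypothetical).
Set-OutlookAnywhere -Identity "EX2007\Rpc (Default Web Site)" `
    -IISAuthenticationMethods Basic, NTLM
```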

I also removed “Allow SSL offloading” from the new server for consistency. I don’t know whether it helped fix the problem or not.

exchange2013_remove_ssl_offloading

You get kinda tired from all the testing and googling, but hey, at least it’s working as it should and users aren’t complaining anymore! 🙂

 

  • Shared mailbox dilemma. When you send a message from the shared mailbox, the sent message goes into your own Sent Items folder instead of the shared mailbox sent items.

“If the shared mailbox is on Exchange 2010 and only has the Exchange 2010 sent items behavior configured, the settings are not converted to the equivalent Exchange 2013 settings during migration. You will need to manually apply the Exchange 2013 sent items configuration. It is probably best to do that before moving the mailbox. The Exchange 2010 settings are retained though”. Source: http://exchangeserverpro.com/managing-shared-mailbox-sent-items-behaviour-in-exchange-server-2013-and-office-365/

    • Well, no wonder my settings didn’t stick when migrating from 2007 to 2013. I configured the correct settings again:
      • Get-Mailbox mysharedmailbox | Set-Mailbox -MessageCopyForSentAsEnabled $true -MessageCopyForSendOnBehalfEnabled $true

 

Decommission Exchange 2007

Still following the guide, it was now time to decommission the old Exchange 2007 server. First off I left the server turned OFF for a week. No problems were encountered, so I decided to move on with the real decommissioning work.

  • Didn’t need to touch any TMG rules (obviously, since we don’t use TMG)
  • Removed unused Offline Address Books (OAB)
  • Removed old Databases
    • Mailbox Database removal was OK.
    • Public Folders were a whole different story. What a headache. I followed almost every guide/instruction out there. Did NOT WORK. I got the “nice” message: “The public folder database “ExchangeServer\Storage Group\Public Folder Database” contains folder replicas. Before deleting the public folder database, remove the folders or move the replica to another public folder database”. God dammit. We’ve never ever been using Public Folders. Well, luckily I found some useful “fixes” after a while. Some “fixes” that MS won’t mention. Solutions:
    • Removed CN=Configuration,CN=Services, CN=Microsoft Exchange, CN={organisation name i.e First Organisation}, CN=Administrative Groups, CN={Administrative Group name}, CN=Servers, CN={servername}, CN=Information Store, CN={Storage Group Name}, CN={Public Folder Database Name} with ADSIEdit (after I had backed up the key with help from http://www.mysysadmintips.com/windows/active-directory/266-export-active-directory-objects-with-ldifde-before-performing-changes-with-adsi-edit for example).
    • Ran the Get-MailboxDatabase | fl name,pub* command again, but to my surprise the damn Public Folder Database still wasn’t gone. Instead it was now in the AD “Deleted Objects”. FFS, it CAN’T be this hard removing the PF Database (reference).
    • Trying to get rid of the deleted object with ldp didn’t work either: “The specified object does not exist”. I was getting even more frustrated.
    • Well, at least now, according to EMC, I have no active Mailbox Databases. That’s good news, so I can now remove the Storage Groups even though this annoying PF DB reference still exists in AD. I can live with it for now, and hopefully when the Tombstone Lifetime expires, so will this PF DB reference. (That wasn’t the case however, continue reading)
  • Removed Storage Groups, FINALLY:

           exchange2007_storage_group_removal_success

                     exchange2013_arbitration_and_system_mailbox_check

      • System mailboxes are already on the new server. Good.
  • Uninstalled Exchange 2007 from control panel.
    • At least I tried. Of course there were problems. Again.

            exchange2007_uninstall_failiure

Got some tips from https://social.technet.microsoft.com/Forums/exchange/en-US/6469264a-dc33-4b07-8a7c-e681a0f9248f/exchange-setup-error-there-was-a-problem-accessing-the-registry-on-this-computer?forum=exchangesvradminlegacy. Solution was simply to start the Remote Registry service. It now uninstalled nicely.

          exchange2013_get-mailboxdatabase-with-pf

  • Removed legacy DNS entries
  • Firewall guy was informed that the server was decommissioned and all its firewall rules could be removed.
  • Turned off the server and archived it.
  • Happy days. No more Exchange 2007.
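The checks from the list above, in shell form. It is a sketch of what the screenshots show, not a full decommissioning script.

```powershell
# Run in the Exchange 2013 shell: the arbitration/system mailboxes should live on the new server.
Get-Mailbox -Arbitration | Format-Table Name, ServerName, Database -AutoSize

# Look for leftover public folder references on the mailbox databases.
Get-MailboxDatabase | Format-List Name, Pub*

# Run on the old server: the uninstaller needs the Remote Registry service.
Set-Service -Name RemoteRegistry -StartupType Manual
Start-Service -Name RemoteRegistry
```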

 

Security hardening

I always aim to keep my servers secure. This one was no exception, so I was aiming for at least a grade A on the Qualys SSL Labs test, https://www.ssllabs.com/ssltest/. I followed the guide from https://scotthelme.co.uk/getting-an-a-on-the-qualys-ssl-test-windows-edition/ and voilà, grade A was achieved 🙂 I left the HTTP Strict Transport Security policy alone for now, however, as it will need some more testing.

exchange2013_qualys_ssl_labs_test_grade_A
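Part of getting a good grade is switching off the old SSL protocols on the server side. The guide covers this (and the cipher suite order) in detail; the snippet below is only my own sketch of the protocol part, using the standard SCHANNEL registry keys. A reboot is needed before the change takes effect, and you should test it on a non-production box first.

```powershell
# Standard SCHANNEL location for protocol settings.
$Base = "HKLM:\SYSTEM\CurrentControlSet\Control\SecurityProviders\SCHANNEL\Protocols"

foreach ($Protocol in "SSL 2.0", "SSL 3.0") {
    $Key = Join-Path $Base "$Protocol\Server"

    # Create the key (and any missing parent keys) only if it isn't there yet.
    if (-not (Test-Path $Key)) {
        New-Item -Path $Key -Force | Out-Null
    }

    # Disable the protocol for incoming (server-side) connections.
    New-ItemProperty -Path $Key -Name "Enabled" -Value 0 -PropertyType DWord -Force | Out-Null
    New-ItemProperty -Path $Key -Name "DisabledByDefault" -Value 1 -PropertyType DWord -Force | Out-Null
}
```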

Exchange Server Connector (for SCCM)

I was “given” the task of finding an easy way for the IT supporters to check whether or not a user has configured his/her mobile phone (Nokia Lumia) against our Exchange server. We’re checking this mostly because the user agreement states that every user should have an Exchange account configured. With an Exchange account configured, it’s possible (for the Exchange/SCCM Admins) to remotely wipe the phone (among other things).

The Exchange Server Connector is by no means a full blown MDM solution (for SCCM), but it can handle the basic tasks. If you want a solution with all the bells and whistles, have a look at Microsoft Intune instead. On the positive side, Exchange Server Connector is free and Intune is not. Some differences between the MDM solutions can be found here for example:

http://myitforum.com/myitforumwp/2013/05/14/three-options-for-managing-mobile-devices-using-sccm-2012-without-windows-intune/
https://technet.microsoft.com/en-us/library/gg682022.aspx
http://configmgrblog.com/2011/02/09/cep-meeting-9-summary-sccm-2012-mobile-device-management/

The above links include tables which will help you decide which mobile device management methods support the mobile device platforms you have in your environment. They can also help you decide between in-depth and light management and so on. All in all, the links give you an idea of what you can and cannot do with the Exchange Connector.

The short version is that SCCM 2012 (R2) is outdated in terms of MDM. You only have support for a limited set of devices by default, check: https://technet.microsoft.com/en-us/library/gg682077.aspx#BKMK_SupConfigMobileClientReq (Mobile Devices Enrolled by Configuration Manager and Mobile Device Legacy Client). By adding the Exchange Server Connector you’ll get support for more devices (all Exchange ActiveSync devices), but the configuration on these devices is limited to the same things that can be configured on the Exchange Server (“light management”). The settings are listed in the table “Choose a mobile device management solution based on management functionality” on the page https://technet.microsoft.com/en-us/library/gg682022.aspx . As you can see, you can’t install software or make a software inventory, but things like remote wipe and settings management are possible. I’ll attach screenshots of the things you can configure:

exchange_mobile_device_access

Fig 1. Mobile device access (EAS settings)

exchange_mobile_device_mailbox_policies

Fig 2. Mobile device mailbox policies

These same settings apply to SCCM once you have the connector set up correctly. That said, let’s set it up!

 

Installation

First some reading for you all:

http://blogs.technet.com/b/system_center_in_action/archive/2011/09/02/configuration-manager-2012-exchange-connector-implementation-in-microsoft-it.aspx
http://configmgrblog.com/2011/09/16/exchange-connector-in-configuration-manager-2012-revealed/
http://configmgrblog.com/2012/01/04/managing-mobile-devices-in-configuration-manager-2012-via-exchange-online-1/

I used tips from the guides but overall it was an easy task. Here are my steps:

ex_server_connector_sccm_accounts

Fig 3. Accounts in SCCM

  • Started SCCM, then navigated to Administration –> Overview –> Hierarchy Configuration –> Exchange Server Connectors

ex_server_connector1

Fig 4. Exchange Server Connector.

  • Added a new connector with the default values. Properties from the newly created connector below:

ex_server_connector2

Fig 5. Properties, General

Note: There are problems with the URL if using load balancers. I had to change the URL to one of our CAS servers (and not pointing to the single namespace/autodiscover URL in DNS). Check the problems and gotchas-chapter below for more details.

 

ex_server_connector3

Fig 6. Properties, Account

 

ex_server_connector4

Fig 7. Properties, Discovery

 

ex_server_connector5

Fig 8. Properties, Settings

If you change a setting here, that setting will be changed from Configured by Exchange Server to Configured by Configuration Manager from now on. In other words, you are giving the SCCM server authority to handle these settings instead of Exchange. Also note the “Allow external mobile device management: xxxxx” option, and read the text above it. I changed mine to Allowed.

 

ex_server_connector6

Fig 9. Properties, Access Rules

 

Problems

Theoretically everything should now be set up and working. Unfortunately, that wasn’t the case for me. I immediately noticed that no devices showed up under “Devices/All Mobile Devices” in SCCM. I had configured all steps correctly, and SCCM didn’t complain. Luckily there are logs (EasDisc.log on the SCCM server), so you can get a better understanding of what’s going on behind the scenes. That said, I noticed some problems in the log straight away:

ex_server_connector_error

Fig 10. EasDisc.log: the problems

Some googling led me to https://social.technet.microsoft.com/Forums/en-US/e7ca3f0c-a793-4437-8050-2de4c9d9253c/exchange-connector?forum=configmanagergeneral. Someone had a similar setup and suggested using the FQDN of one of the CAS servers instead of the NLB URL. Tried that – success! 🙂 (almost…)

ex_server_connector_error_solved

Fig 11. EasDisc.log: problem solved, everything looks good. Log also reported INFO: Total number of devices discovered 357       SMS_EXCHANGE_CONNECTOR        x.x.2015 11:57:48 which is not visible in the screenshot.

 

View from SCCM

Let’s have a look at the whole thing in action from SCCM:

ex_server_connector_sccm_view_devices

Fig 12. All Mobile Devices.

 

ex_server_connector_sccm_all_mobile_devices

Fig 13. Another view

 

Gotchas

Everything APPEARED to be working fine now. After a while I noticed it wasn’t. I configured a test device with my own account, but it DIDN’T show up in Assets and Compliance –> Overview –> Devices –> All Mobile Devices in SCCM (Fig 13). However, the list with All Mobile Devices (Fig 12) got updated (correct number of devices). Very strange.

Some head scratching and googling later I ended up at https://social.technet.microsoft.com/Forums/en-US/6a6dae36-a84c-4f7b-8fd5-7e24d905ec6f/sccm-2012-exchange-connector-to-cas-through-load-balancer?forum=configmanagergeneral

Well, well, well. A problem with load balancers. Duh. My solution: I added another connector for our second CAS. Well, that didn’t work. It was still showing the same number of devices 😦 My test device wouldn’t show up either. Unfortunately, I had to conclude that the Exchange Connector won’t work properly in a setup like ours, with more than one CAS behind a load balancer. Too bad 😦

Update: Currently I’m using an EAS device report script on the Exchange server for collecting miscellaneous information about mobile devices. More on that in a blog post later on…
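To give an idea of what such a report can look like, here is a minimal sketch run from the Exchange 2013 shell. It is not the script I’m actually using, and the output path is just an example.

```powershell
# Collect basic facts about every ActiveSync device and dump them to a CSV file.
Get-MobileDevice -ResultSize Unlimited |
    Select-Object UserDisplayName, FriendlyName, DeviceModel, DeviceOS, DeviceType, FirstSyncTime |
    Sort-Object UserDisplayName |
    Export-Csv -Path "C:\Temp\eas-devices.csv" -NoTypeInformation -Encoding UTF8
```

The CSV opens directly in Excel, which is usually all the IT supporters need to see whether a user has a phone configured against Exchange.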

 

Search queries in SCCM

(Even though the connector didn’t work as expected, I had already made a couple of queries before noticing the problem…)

It’s always nice to get a list of devices, but in most cases you’ll want to have the list sorted in some way. I was requested to sort our list by the Windows Phone OS. I used a slightly modified query from: http://www.windows-noob.com/forums/topic/9618-unified-device-management-with-configuration-manager-2012-r2-part-4-configuring-compliance-on-ios-devices/

Query:

select SMS_R_SYSTEM.ResourceID,SMS_R_SYSTEM.ResourceType,SMS_R_SYSTEM.Name,SMS_R_SYSTEM.SMSUniqueIdentifier,SMS_R_SYSTEM.ResourceDomainORWorkgroup,SMS_R_SYSTEM.Client from SMS_R_System where SMS_R_System.OperatingSystemNameandVersion like "%Windows Phone%"

Using this query, I got all Windows Phones listed:

ex_server_connector_sccm_wp8_query

Fig 14. Query for Windows Phones

Instead of using Reporting, I find it much easier to just mark the whole list and copy/paste it into Excel (or another document). Some sort of “export to .csv” right-click plugin for SCCM would be awesome though.
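A poor man’s “export to .csv” can be had by running the same kind of WQL query against the SMS provider from PowerShell. A sketch, assuming you have rights on the site server; the server name, site code and output path are placeholders.

```powershell
# Hypothetical site server and site code.
$SiteServer = "sccm01"
$SiteCode   = "ABC"

# Same filter as the query above, limited to a few columns.
$Query = 'select Name, SMSUniqueIdentifier, OperatingSystemNameandVersion ' +
         'from SMS_R_System where OperatingSystemNameandVersion like "%Windows Phone%"'

Get-WmiObject -ComputerName $SiteServer -Namespace "root\sms\site_$SiteCode" -Query $Query |
    Select-Object Name, SMSUniqueIdentifier, OperatingSystemNameandVersion |
    Export-Csv -Path "C:\Temp\windows-phones.csv" -NoTypeInformation
```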

Load Balancing Exchange 2013 (CAS) with clustered (Zen) Load Balancers

I decided to skip my post about Exchange Database Availability Groups (DAGs), as all the information needed was already very well documented. All I did was follow the excellent guide from exchangeserverpro.com, Installing an Exchange Server 2013 Database Availability Group. I got my DAG up and running in no time. We already have a working DAG environment up and running as well.

The part which needed some extra attention was High Availability/Load Balancing, mainly load balancing the CAS. (DAGs are sort of highly available/failover tolerant by design, but if the CAS goes down they’re rather useless).

UPDATE 22.2.2017: I published a new blog post about Using FarmGuardian to enable HA on Back-ends in Zen Load Balancer. Use this as a complement to this guide.

I’ll start off by making illustrations of the current and soon-to-be situations.

Current situation

 

exchange_current_setup

Fig 1. Current setup

  • 1 Exchange server used as CAS proxy/redirect.
    • No user mailboxes
    • Not actually needed, we could also use dns round robin. The server was meant to replace the Exchange 2010 server at first… then change of plans, not going into any details here.
    • Will be taken out of production and replaced by Zen Load Balancing cluster
    • Single point of failure
  • 1 server running Exchange 2010. Existing users will be migrated from this server to the (two) Exchange 2013 servers real soon. (When certificate issues are fixed).
    • When this is done, the server will be taken out of production.
  • 2 Exchange 2013 servers, running in two different physical locations (even though in same domain).
    • DAG is used between the two servers
    • All users will be moved to these two servers
  • Single namespace

 

Soon-to-be situation

Goals:

  • No (CAS) single point of failure
  • Only Exchange 2013 servers in the environment

 

exchange_soon_to_be_setup

Fig 2. Soon-to-be setup

  • 2 Clustered Zen Load Balancers (in different physical locations)
  • 2 Exchange 2013 servers (the existing ones from Fig 1)
  • Single namespace (exzen)

This whole setup might seem a bit small, but at the moment it’s sufficient. We’ll expand when the need is there. Only calendars and contact information are currently stored on the Exchange servers; email is handled by our 3rd party IMAP server.

NOTE! If you are using VMware in your environment, be sure to check this information:

https://keepingitclassless.net/2013/04/virtual-routing-part-2-fhrp-issues-in-vmware-vsphere/

It will save you time and nerves when trying to figure out why replication isn’t working.

 

Installing Zen Load Balancing Cluster

Introduction

I was given the task of finding a good load balancing solution for Exchange. In its simplest configuration you could just use DNS round robin (http://exchangeserverpro.com/exchange-2013-client-access-server-high-availability/), but I wasn’t too convinced about this idea. Round robin seemed more like the poor man’s load balancer. That said, it will work. I even gave it a go in my test environment. The clients got a little bit (too) confused though. Not good. I decided to move along to a “real” load balancer.

Luckily a Layer 4 solution will work fine with Exchange 2013, so there’s no need for a more complex Layer 7 solution. Keeping it simple is the key. Some facts:

“In Exchange Server 2010 it’s often said that the main skill administrators needed to learn when was how to deploy and manage load balancing. The concept of the RPC Client Access Array, the method used to distribute MAPI traffic between Client Access Servers was a common area of pain. Modern advances in Layer 7 load balancing also allowed for SSL offload, service level monitoring and load balancing and intelligent affinity using cookies to mitigate against some of Exchange 2010’s shortcomings.”

Improvements to Load Balancing

“What we’re getting at is the two key improvements in Exchange 2013 that make load balancing suddenly quite simple. HTTPS-only access from clients means that we’ve only got one protocol to consider, and HTTP is a great choice because it’s failure states are well known, and clients typically respond in a uniform way.”

Source: http://www.msexchange.org/articles-tutorials/exchange-server-2013/high-availability-recovery/introducing-load-balancing-exchange-server-2013-part1.html

That said, there are many load balancing solutions available out there. Windows Network Load Balancing (WNLB) is one example that comes to mind, but it has limitations:

“WNLB can’t be used on Exchange servers where mailbox DAGs are also being used because WNLB is incompatible with Windows failover clustering. If you’re using an Exchange 2013 DAG and you want to use WNLB, you need to have the Client Access server role and the Mailbox server role running on separate servers. “

Source: https://technet.microsoft.com/en-us/library/jj898588%28v=exchg.150%29.aspx

That’s no good. After some investigation I found a webpage that compares open-source load balancers, http://wso2.com/library/articles/2014/03/wso2-elb-vs-other-open-source-load-balancers/. This was promising; Linux-based load balancers that not only use little resources in our data center – they’re free as well. Brilliant.

I started out by testing HAProxy. I had heard some good things about it. Well… short story: it turned out to be a bit of a pain to configure. I didn’t want to waste all of my energy on configuring.

Next candidate: Zen Load Balancer. Oh, what a difference. Very easy to install, very easy to configure. A nice web interface from which you can configure everything needed. There are many commercial alternatives available as well, but Zen seemed to come very close to these (on the open-source market).

 

Preparations

Be sure to use a single namespace in Exchange 2013. This is actually best practice even without a load balancer, and it makes your life easier. If you have no idea what I’m talking about, please have a look at the following links for example:

http://3techies.com/?p=194
http://www.msexchangegeek.com/inside-the-exchange-2013-single-namespace-part-1/
http://blogs.technet.com/b/exchange/archive/2014/02/28/namespace-planning-in-exchange-2013.aspx
http://blog.netwrix.com/2014/03/21/configuring-exchange-2013-for-site-resilience-2/

While you’re at it, have a look at managing certificates at the same time: http://www.msexchange.org/articles-tutorials/exchange-server-2013/management-administration/managing-certificates-exchange-server-2013-part1.html. We’re using a SAN certificate for autodiscover.ourdomain.com and exzen.ourdomain.com.

DNS: Instead of having clients point to the CAS, point your clients to the Load Balancer. (You then configure the load balancer with the real IPs of the back-end Exchange servers). More info in the next chapter.

 

Installation

 

Pictures say more than words, so here you go:

zen_global_view

Fig 3. Zen Load Balancer Global View

 

zen_farms

Fig 4. Zen Farms

 

zen_backend_status

Fig 5. Zen backend status

 

zen_interfaces

Fig 6. Zen Interfaces. Both Zen servers have dual NIC, one dedicated to the cluster service (eth1).

Zen 1:

  • Physical IP (eth0): 10.0.0.60
  • Virtual IP (virtual network interface): 10.0.0.61
  • Physical IP (eth1): 10.0.0.80
  • Virtual IP (used for cluster service): 10.0.0.85

Zen 2:

  • Physical IP (eth0): 10.0.0.70
  • Virtual IP (virtual network interface): 10.0.0.61
  • Physical IP (eth1): 10.0.0.90
  • Virtual IP (used for cluster service): 10.0.0.85

 

zen_rsa_between_hosts

Fig 7. RSA communication between cluster hosts

 

zen_cluster1

Fig 8. Zen Cluster configuration – success (with failover).

 

zen_master_node

Fig 9. Cluster status – master

 

zen_backup_node

Fig 10. Cluster status – backup

 

DNS settings:

exchange_zen_dns

Fig 11. DNS settings

 

Outlook connection status:

outlook_connection_status

Fig 12. Outlook connection status. Connected to exzen, (which is pointed to Zen Load Balancer in DNS).

 

Testing failover

Failover is luckily easy to test in a virtual environment as you can just suspend a virtual machine 🙂 I did a test run (suspended zen1) while constantly pinging the virtual IP and keeping an eye on the Outlook connection status window. Here’s the result:

http://youtu.be/DLCMVVw2tN4

  • At 00:11 the ping reply stops for a brief moment (when I hit suspend on zen1)
  • At 00:15 the client is noticing that the proxy server is down/suspended (ping request times out)
  • At 00:26 ping reply to the proxy server is active again (automatic failover to zen2/”backup”)
  • At 00:40 I resume/re-activate zen1 again
  • At 00:42 Outlook notices this (established/connecting changes state in Outlook Connection Status)
  • At 00:45 the ping reply stops for a brief moment (when zen1 is resuming/getting back online after the “fail”)
  • At 00:46 ping reply times out (zen1 is configuring itself to become the “master” again)
  • At 00:55 all connectivity from Outlook to the proxy is temporarily lost
  • At 00:58 the proxy server answers to ping requests again
  • At 01:05 the connection to the proxy server is active/established in Outlook
  • At 01:06 –> everything is back to normal.

To sum it up: Outlook is only offline for 10 seconds (00:55 – 01:05), and it never even complains about being offline (no yellow exclamation mark). Not bad. 10 seconds of downtime is something we can definitely live with.
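Reading timestamps off a video works, but the downtime can also be measured with a small script. The function below is my own sketch (the name Get-LongestOutageSeconds is made up): it takes a time-ordered list of ping samples and returns the longest stretch during which the target didn’t answer. An outage that is still ongoing at the last sample is not counted.

```powershell
# Given time-ordered samples with Time and Reachable properties,
# return the longest unreachable stretch in seconds.
function Get-LongestOutageSeconds {
    param([object[]]$Samples)
    $longest = 0.0
    $downSince = $null
    foreach ($s in $Samples) {
        if (-not $s.Reachable) {
            # Remember when the current outage started.
            if ($null -eq $downSince) { $downSince = $s.Time }
        }
        elseif ($null -ne $downSince) {
            # Target answers again: close the outage and keep the longest one.
            $longest = [Math]::Max($longest, ($s.Time - $downSince).TotalSeconds)
            $downSince = $null
        }
    }
    return $longest
}

# Usage sketch: ping the virtual IP once per second for two minutes.
# $Samples = 1..120 | ForEach-Object {
#     [pscustomobject]@{ Time = Get-Date; Reachable = (Test-Connection 10.0.0.61 -Count 1 -Quiet) }
#     Start-Sleep -Seconds 1
# }
# Get-LongestOutageSeconds -Samples $Samples
```

Keep in mind that this measures ICMP reachability of the virtual IP only. What Outlook experiences can differ by a few seconds, as the timeline above shows.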

 

Final words

As you can see, everything is working nicely in my test environment. Before going into production there are a couple of things that should be considered:

  • Certificate for the Zen Load Balancer Web Interface. Is it needed or blocked from the outside world?
    • other security considerations on the load balancer(s) such as open ports etc. Networking team will decide this.
  • After the cluster is up and running in production, test it only on a couple of clients at first.
  • Probably lots more I can’t think of right now…

Test Lab Guide (with modifications): Configure an Integrated Exchange 2013, Lync 2013 and SharePoint 2013 Test Lab

I recently left my old position and started working at our University’s Computing Centre. This also meant changes to my job assignments. I’m now deep diving into Exchange, Lync and SharePoint. All of this will of course take (a lot of) time and I decided to start from scratch with a Test Lab Guide (TLG) – Test Lab Guide: Configure an Integrated Exchange, Lync, and SharePoint Test Lab. (No need to break people’s calendars just yet 🙂 ) This TLG will be the base for all my testing from now on, so it’s important to get it working properly. I got the basics up ‘n running quite fast, but then more and more trouble arose. I followed the guide to the letter, but to no avail. Google was short on answers, so the problems needed to be split up into smaller chunks. The main problem was configuring cross-product integration between all the servers. In order for the Exchange, Lync, and SharePoint servers to participate in cross-product scenarios and solutions, they must be configured to trust each other through server-to-server authentication trusts (OAuth). There’s a script (https://technet.microsoft.com/en-us/library/jj204975.aspx) for this, but it didn’t work for me 😦 (Well, it might actually work better now that I have a better basic understanding of what the script does. It probably also works better now that the certificates are configured correctly).

I got lots of error 401 and/or SSL errors, for example:

Cannot acquire auth metadata document from ‘https://sp1.corp.contoso.com/_layouts/15/metadata/json/’. Error: The remote server returned an error: (401) Unauthorized.

Cannot acquire auth metadata document from ‘https://sp1.corp.contoso.com/_layouts/15/metadata/json/’. Error: The underlying connection was closed: Could not establish trust relationship for the SSL/TLS secure channel.
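
The two errors point at different layers, which helps when troubleshooting: a 401 means the TLS handshake succeeded and the server then refused the request, whereas the trust relationship error means the handshake itself failed on the certificate. The following is a minimal Python sketch of that triage, run from any machine that can reach the server. It is not part of the TLG or the Microsoft script. The function names, the category labels and the `ca_file` parameter are assumptions made for illustration.

```python
import ssl
import urllib.error
import urllib.request


def classify_failure(exc):
    """Map an exception raised by urlopen() to a rough failure category."""
    if isinstance(exc, urllib.error.HTTPError):
        # The TLS handshake worked; the server answered with an HTTP error.
        return "auth" if exc.code in (401, 403) else "http"
    # URLError wraps the underlying cause in its .reason attribute.
    cause = exc.reason if isinstance(exc, urllib.error.URLError) else exc
    if isinstance(cause, ssl.SSLCertVerificationError):
        # Untrusted issuer, expired certificate or host name mismatch.
        return "certificate"
    if isinstance(cause, ssl.SSLError):
        return "tls"
    return "network"


def probe_metadata(url, ca_file=None, timeout=10):
    """Request an auth metadata URL and return 'ok' or a failure category.

    ca_file is an optional PEM file with the CA certificates to trust;
    when it is None the system trust store is used.
    """
    context = ssl.create_default_context(cafile=ca_file)
    try:
        with urllib.request.urlopen(url, context=context, timeout=timeout):
            return "ok"
    except OSError as exc:  # URLError, HTTPError and SSLError all derive from OSError
        return classify_failure(exc)
```

For example, `probe_metadata("https://sp1.corp.contoso.com/_layouts/15/metadata/json/", ca_file="corp-root-ca.pem")` would separate the two cases above, where `corp-root-ca.pem` is a hypothetical export of the domain CA root certificate. Note that an "auth" result only says the server answered and demanded credentials; this sketch does not attempt Windows authentication.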

After much digging around I came to the conclusion that it had to do with the servers’ certificates. I had certificates set up for auto-enrolment, but for some reason they never got enrolled on the servers. A small detail, perhaps, but it took me forever to figure out. I had to manually request the AD-published certificates (https://technet.microsoft.com/en-us/library/cc730689.aspx). Playing around with self-signed certificates in an environment like this is begging for trouble, so I’m glad I got it sorted out.

The problems didn’t disappear even though I was now using certificates signed by the domain CA. The https binding in IIS defaults to whatever it feels like, so you have to change the https site binding to the certificate issued by your CA. Information about IIS site bindings can be found at http://www.orcsweb.com/blog/mark-newnam/how-to-set-up-site-bindings-in-internet-information-services-iis/ or http://blogs.technet.com/b/chrad/archive/2010/01/24/understanding-iis-bindings-websites-virtual-directories-and-lastly-application-pools.aspx, for example. With this done, things were already much better.
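
One way to confirm which certificate an https binding actually serves is to connect and read the issuer of the presented certificate. Below is a small Python sketch under stated assumptions: the function names and the `ca_file` parameter are hypothetical, and the check only trusts the CA file you give it (or the system store when none is given). If the binding still serves a self-signed certificate, the handshake fails with a verification error, which is itself the answer.

```python
import socket
import ssl


def common_name(rdn_sequence):
    """Pull commonName out of the nested tuples returned by getpeercert()."""
    for rdn in rdn_sequence:
        for key, value in rdn:
            if key == "commonName":
                return value
    return None


def presented_issuer(host, port=443, ca_file=None, timeout=10):
    """Return the issuer CN of the certificate bound to host:port.

    Raises ssl.SSLCertVerificationError when the certificate does not
    chain to a CA in ca_file (or the system store if ca_file is None).
    """
    context = ssl.create_default_context(cafile=ca_file)
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return common_name(tls.getpeercert().get("issuer", ()))
```

Running `presented_issuer("sp1.corp.contoso.com", ca_file="corp-root-ca.pem")` against each server should return the name of the domain CA once the binding has been changed; `corp-root-ca.pem` is again a hypothetical file name.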

Still, the script wouldn’t work, so I decided to try the steps from the script manually, one at a time. After much fiddling I got it working (I don’t even remember exactly how anymore, but it was a lot of trial and error). I did at least the following things (reconstructed from PowerShell scrollback and from memory):

On the Exchange server (the first server that got all the server-to-server trusts working):

ex_lync_partnership

Fig 1. Partner with Lync

ex_sp_partnership

Fig 2. Partner with SharePoint

Checking partnership with Get-PartnerApplication:

ex_get-partnerapplication

Fig 3. Get-PartnerApplication

Everything OK!

Source: https://technet.microsoft.com/en-us/library/jj649094%28v=exchg.150%29.aspx

On Lync server:

lync_ex_partnership

Fig 4. Partner with Exchange

lync_sp_partnership

Fig 5. Partner with SharePoint

Checking partnership with Get-CsPartnerApplication:

lync_get-cspartnerapplication

Fig 6. Get-CsPartnerApplication

Everything OK!

Source: https://technet.microsoft.com/en-us/library/jj205253.aspx and https://technet.microsoft.com/en-us/library/jj204975.aspx (This was the script that failed for me, so I did it in stages as in the screenshots above). There were many more sources that I can’t remember…

On SharePoint server:

sp_ex_partnership

Fig 7. Partner with Exchange (never mind the error, the partnership was already done).

sp_lync_partnership

Fig 8. Partner with Lync

Checking Get-SPTrustedSecurityTokenIssuer:

sp_get-sptrustedsecuritytokenissuer

Fig 9. Get-SPTrustedSecurityTokenIssuer

Seems OK! The permissions on the SharePoint 2013 server were already set up at an earlier stage:

At the Windows PowerShell command prompt, type the following commands:

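# Trusted token issuer registered for Exchange (with several issuers registered, select the Exchange one via -Identity)
$exchange = Get-SPTrustedSecurityTokenIssuer
# App principal that represents Exchange on this site
$app = Get-SPAppPrincipal -Site http://<HostName> -NameIdentifier $exchange.NameId
$site = Get-SPSite http://<HostName>
# Grant the principal full control, including app-only access
Set-SPAppPrincipalPermission -AppPrincipal $app -Site $site.RootWeb -Scope sitesubscription -Right fullcontrol -EnableAppOnlyPolicy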

Source: https://technet.microsoft.com/en-us/library/jj655399.aspx, https://technet.microsoft.com/en-us/library/jj670179.aspx

Well, I learned a lot yet again. I had to dig into a lot of other things as well, but at least those were easily solved with Google. The main problems were certificates and server-to-server trust issues. The TLG itself was very nicely written, although it didn’t work as expected for me. Nonetheless, everything is now set up and working, so I can continue doing all kinds of tests. This test environment will help me A LOT on my journey with Exchange, Lync and SharePoint. Wish me luck, I know I’m gonna need it 🙂

My next experiment will be to add another Exchange server (or two) and use Database Availability Groups (DAGs). (Actually, this is already done, using the excellent guide at http://exchangeserverpro.com/exchange-server-2013-database-availability-groups/)

I’ll also be looking at High Availability for the CAS. Stay tuned!

More useful sources (out of the millions I already found):

http://memphistech.net/?p=280
https://digitalbamboo.wordpress.com/2013/09/24/setting-up-exchange-unified-messaging-with-lync-2013-integration-for-voicemail/
http://blog.insidelync.com/2012/08/the-lync-2013-preview-unified-contact-store-ucs/
https://mchahla.wordpress.com/2013/01/12/integrating-lync-server-2013-exchange-server-2013-owa/