Category Archives: High Availability

Understanding Lync 2013 Branch Site Voice Mail Survivability

Recently I was working on a customer’s global roll-out of Lync 2013 and we were designing how their branch sites would work from a voice perspective. Each branch site would have an SBA for voice connectivity to the legacy PBX/ISDN and for survivability, but we also had to map out how voice mail would work in the event of WAN failure.

Here’s how things look in a normal scenario with the WAN up:

[Diagram: vm reroute]

With the Exchange Unified Messaging servers in the central data centre along with the main Front End pool, we needed to configure voice mail survivability. This is a feature Lync 2013 provides out of the box, but not one I knew much about, including how to configure it. The section at the bottom of the TechNet Library article Branch-Site Resiliency Requirements outlines how Lync still provides voice mail deposit and retrieval when the SBA cannot contact the central Exchange UM servers.

Video – Lync Server 2013 Database Mirroring and Pool Pairing

As part of my preparation for TechEd 2013, I recorded my demos using Camtasia as a backup in case the internet went down during my breakout sessions. Fortunately it didn’t, and I was able to do my demos live, but I did still have the videos on hand. So rather than deleting them, I thought it’d be a good idea to upload these for reference. Hope they’re helpful.

Database Mirroring in Lync Server 2013

This demo shows how database mirroring is configured, how to fail over databases manually and how mirroring works during an unplanned database failure. Apologies that the quality isn’t fantastic in this one.

Pool Pairing in Lync Server 2013

In this video, I demo how pool pairing is configured to provide disaster recovery, how you can fail over users from one pool to another and what the user experience looks like.

SQL Database Mirroring with Lync Server 2010 Series – Group Chat

In previous posts in this series on Lync Server 2010 and SQL Mirroring I’ve covered the prerequisites, how to fail over your SQL databases and how the backend Lync databases behave once you’ve failed over. In this final post, I’m going to cover how Group Chat behaves when its database is mirrored and subsequently failed over.

As with my other posts regarding SQL mirroring and Lync Server 2010, I must stress that this is completely unsupported by Microsoft. I attempted this deployment scenario purely as a “could it be done” exercise only. Do not take this as gospel and do not deploy it in a production environment.

Preparing the DR scenario

So the first thing we need to do is get everything set up to support Group Chat in the DR site. This involves mirroring the Group Chat database between the Principal and Mirror SQL server nodes in each site and ensuring the SQL mirror node is set up to allow a Group Chat server to connect to it.

Mirroring the Group Chat database and preparing the SQL Mirror node

I covered this in the prerequisites post, so we can assume that we’ve followed the steps to mirror the Group Chat database already.

But before we can fail over the Group Chat database, we need to make sure the mirror node is set up so the Group Chat Server can connect to it. To do this, we need to make sure security is set up on the SQL server mirror node to support Group Chat. TechNet covers the steps required to do this thoroughly in this article, so all you need to do is follow the steps listed on your SQL mirror node.

Setting up a standby Group Chat server

The first caveat to throw in the mix here is that to bring up Group Chat in the DR site, you mustn’t actually have the Group Chat server set up already. This is because Group Chat topologies come in either single or multiple server flavours, so if it were already part of the topology it would be used by users. In this scenario, we want to keep this server as a standby that is not used while the production site is online.

The server you’ll be using for Group Chat will need to be a cold standby machine in DR that already has a certificate issued to it for its FQDN and the Group Chat prerequisites installed. Once this is set up, it’s ready to have Group Chat installed on it when you need to activate your DR plan.

Activating Group Chat in the DR site

So once you’ve failed over your database and made the SQL mirror node the Principal, it’s time to get Group Chat up and running again in your DR site. To do this, you essentially need to remove the Group Chat server you had in the primary site because a 1:1 relationship exists between a Group Chat server/pool and a Lync Server 2010 registrar (Standard Edition server or Enterprise Edition pool). Additionally, the server we’ll have prepared in the DR site will have a different server name.

Cleaning the Lync Topology

Firstly we need to remove the old Group Chat server from the Lync topology. This involves running the following cmdlets:

Remove-CsTrustedApplicationPool -identity <FQDN of failed Group Chat server>
Remove-CsTrustedApplication -identity <FQDN of failed Group Chat server> 

These cmdlets will remove references to the failed Group Chat server from your Lync topology, allowing you to spin up a new Group Chat server in the DR site. If we leave the old Group Chat server in the topology, when we try to add a new server in DR it picks up this configuration automatically and we can’t define a new next hop pool.

Making a hosts file change

Next we need to make a local hosts file change so that the FQDN of the principal SQL server resolves to the IP address of the mirror SQL server in the DR site.

We need to do this because although the Group Chat services are somewhat SQL mirroring aware (I witnessed a connection from my Group Chat server to my SQL mirror node on TCP port 1433 after I failed over), the good news doesn’t last and eventually the Lookup and Channel services stop functioning. Modifying the hosts file overcomes this and the services stay started.

Install Group Chat in the DR site

Now that we’ve cleared references from the topology and our hosts file is modified, we can install Group Chat in the DR site using the Installation Wizard as normal:

  1. When prompted to provide a SQL server and instance name, provide the principal SQL server name and the Group Chat database name. Once we commit this configuration, it will write new rows to the tbl.Config table in the Group Chat database to allow this Group Chat Lookup/Channel server to work.
  2. When prompted at the next screen, assign the previously imported certificate to the new Group Chat Server.
  3. Following this screen you will see the next hop address and site. These will all be greyed out because the tbl.Config table has already been populated with them. We’ll fix this in step 5.
  4. Once deployment has completed, ensure all Group Chat services start. If some services fail to start, consult the local Lync Server event log and troubleshoot SQL database access.
  5. Next, using the Server Config Tool, set the next hop Lync pool that Group Chat is configured with to the FQDN of the DR Lync Standard Edition server or Enterprise Edition pool. This will modify the tbl.Config table for us in the SQL database.

Once you’ve got Group Chat installed, you will want to run Get-CsTrustedApplicationPool to ensure the Lync topology has been updated with the new Group Chat server details so your FE pool in DR can talk to it.

Validation and Summary

Once you’ve followed the steps above, you’ll want to fire up your Group Chat client and attempt to sign in. If you can’t sign in straight away, check that your Group Chat services have started and that a Trusted Application pool exists in Lync for your DR Group Chat server.

This post (and the series) has really been a “proof of concept” exercise only, but it does show that Group Chat can utilise a mirrored SQL database, albeit with a fair amount of configuration change. As I said, don’t use this blog post as verification when making a design decision around your DR requirements for Lync.

Thanks for following along. I hope this series has helped you understand the challenges around SQL mirroring in Lync in more detail and why Microsoft doesn’t support it.

SQL Database Mirroring with Lync Server 2010 Series – Backend Databases

In the previous part of this series, I talked about how we fail over our databases. In this part, I’ll cover what happens to Lync after the failover.

The objective of this exercise was to get our Lync Enterprise Edition Pool to function after we took the primary SQL server offline and pointed Lync to the SQL mirror node, to which our back end databases had been mirrored.

Trying to get it to work

I found that after failing over, the SQL Native Client on the Front End servers was Database Mirroring aware and had automatically redirected itself to the mirror node of SQL02. This was confirmed when I saw a successful automatic connection to SQL02 on port 1433 from the Front End server (FE03) that we’d failed over to.

However, the Lync Server Front End service is not Database Mirroring aware. The Lync Server event log filled with errors and warnings reporting that a connection to the SQL server could not be made, because the service kept attempting to connect to SQL01 (the failed server), which is the server defined in the Lync topology.

When I attempted to sign in using Lync, all I could get was the “Limited Functionality Mode” experience with no presence or contact list.

At this stage, I started to think about how we could get this to work, and changed the DNS record for SQL01 to resolve to the IP address of SQL02, which didn’t help. I even changed the IP address of SQL02 to that of SQL01, which still didn’t result in a successful connection. Everything I tried to get SQL02 to look like SQL01 wouldn’t work.

How to achieve basic client functionality

Having discovered that Lync doesn’t support Database Mirroring natively and that basic DNS changes don’t help, I worked out what we needed to do to get the Lync FE services to start properly and make a database connection to the SQL mirror node:

  1. Delete the computer account of the principal SQL server (SQL01) from Active Directory.
  2. Modify the hostname of the mirror SQL Server (SQL02) to that of the failed, principal SQL Server (SQL01).
  3. Restart the mirror SQL server and verify that all SQL services are started.
  4. Modify the DNS A record for the failed principal SQL server so that it resolves to the IP address of the mirror SQL server.
  5. Restart the Lync Front End server and confirm that the Front End service is started.
  6. Verify signing in with the Lync client and that you receive the full client experience and not “Limited Functionality Mode”.

As you can see, this is by no means a “High Availability” solution. It requires computer name and AD changes, which rules it out as an acceptable “quick failover” scenario and even as a Disaster Recovery solution.

Because Lync Server is secure by design, we cannot connect an FE server to a back end SQL server that isn’t defined in the topology.

Why is it not supported?

One reason why it’s not supported is that there is a highly dependent and integrated relationship between the rtc and rtcdyn databases.

Each database refers to the other to maintain a consistent presence state to the user of their contacts, and of their presence to their contacts. The rtc database is mostly static, holding publications of users, contact lists, privacy relationships etc whereas the rtcdyn database is dynamic, holding state about which front end in the pool holds a SIP endpoint and what active subscriptions are bound to the endpoint.

Because database mirroring is configured and managed at an individual database level, in the event of failure OCS/Lync cannot guarantee both databases are in the consistent state required to maintain the required quality user experience.

Summary

To summarise, when the failover process is executed, the following behaviour was observed:

  • Initially, the Lync client signs in but is in “limited functionality mode”. This is because the Lync Front End service cannot communicate with the back-end database.
  • Front End Service starts but events are logged advising that a back-end database connection cannot be made.
  • It was concluded that due to inherent product design, the Lync Front End service will not connect to a SQL server that is not defined in the Lync Topology.
  • Only after we deleted the computer account for the failed principal SQL server from Active Directory and renamed the mirror SQL server did errors stop appearing in the Application Log and the Lync client function properly.

What about Group Chat?

In the next instalment, I’ll talk more about what happens to Group Chat when we failover the databases to the SQL mirror node. Make sure you subscribe to get updated!

SQL Database Mirroring with Lync Server 2010 Series – Failover

In the first part of this series covering SQL Database Mirroring and Lync Server 2010, I covered a lot of the prerequisites required to establish this deployment scenario. I also ran through getting the Database Mirroring side of things set up.

In this next part, I’ll cover actually simulating a failure of our SQL Server. There are a few T-SQL queries we need to run first, followed by actually failing over the SQL database server node and verifying that the failover was successful.

Failing over to another SQL Server

The failover process was scoped so that it simulated a production data centre failure as closely as possible. The process consisted of the following steps for each database mirroring scenario.

Prerequisites to Failover

  1. First, open a Command Prompt and navigate to C:\Program Files\Common Files\Microsoft Lync Server 2010\Support. Run the following command to back up all user data (e.g. contact lists, privacy relationships, conference directories) using dbimpexp.exe:

     dbimpexp.exe /export /hrxmlfile:<location of backup file> /sqlserver:<name of SQL server> /restype:all
  2. Next, ensure the mirroring status is healthy and that the Principal server node (SQL01) reports each database as Principal, Synchronized:

     and that the mirror server node (SQL02) reports each database as Mirrored, Synchronizing:
  3. To track synchronous replication, timestamp tables should be created in each database using the following Transact SQL commands.

CREATE TABLE dbname1.dbo.tblDate (dtDate datetime)
CREATE TABLE dbname2.dbo.tblDate (dtDate datetime)
CREATE TABLE dbname3.dbo.tblDate (dtDate datetime)
CREATE TABLE dbname4.dbo.tblDate (dtDate datetime)
CREATE TABLE dbname5.dbo.tblDate (dtDate datetime)
CREATE TABLE dbname6.dbo.tblDate (dtDate datetime)

  4. Next, run the following command to insert timestamp data into the table every second so replication can be tracked:

SET NOCOUNT ON
-- Loop until the query is cancelled manually
WHILE 1 <> 2
BEGIN
    -- Write the current time into each mirrored database
    INSERT INTO dbname1.dbo.tblDate (dtDate) (select GETDATE())
    INSERT INTO dbname2.dbo.tblDate (dtDate) (select GETDATE())
    INSERT INTO dbname3.dbo.tblDate (dtDate) (select GETDATE())
    INSERT INTO dbname4.dbo.tblDate (dtDate) (select GETDATE())
    INSERT INTO dbname5.dbo.tblDate (dtDate) (select GETDATE())
    INSERT INTO dbname6.dbo.tblDate (dtDate) (select GETDATE())
    -- Pause for one second between rounds of inserts
    WAITFOR DELAY '00:00:01'
END

In each T-SQL command, replace the placeholders dbname1 through dbname6 with the names of the databases you wish to run the command against. Now we’re ready to fail over the databases.
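The mirroring status check in step 2 can also be done from a query window instead of the GUI. The following is a minimal sketch that reads the sys.database_mirroring catalog view; the column aliases are just illustrative names, and the query assumes it is run on each node in turn.

```sql
-- Run on each SQL Server node (SQL01 and SQL02 in this series).
-- Lists every mirrored database with its current role and state.
SELECT
    DB_NAME(database_id)        AS DatabaseName,
    mirroring_role_desc         AS MirroringRole,    -- PRINCIPAL or MIRROR
    mirroring_state_desc        AS MirroringState,   -- e.g. SYNCHRONIZED, SYNCHRONIZING
    mirroring_safety_level_desc AS SafetyLevel,      -- FULL = synchronous
    mirroring_partner_instance  AS PartnerInstance
FROM sys.database_mirroring
WHERE mirroring_guid IS NOT NULL   -- skip databases that are not mirrored
ORDER BY DatabaseName;
```

Every database you intend to fail over should appear in the result with the expected role on each node before you continue.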

Simulating a Server Failure and Failing Over the Databases

You can do this in two ways: either use the Failover button in the Database Mirroring GUI, or use the following method to simulate a complete server failure:

  1. Force-power off the Lync Front End and back-end SQL servers, or verify that these servers have failed.
  2. After the decision has been made to activate the disaster recovery process, log onto the SQL Server mirror node (SQL02) and run the following SQL queries to force failover of databases.
    1. Ensure the database is in synchronous (high-safety) mirroring mode:
      ALTER DATABASE dbname SET PARTNER SAFETY FULL;
      GO
    2. Failover the database (this is an unplanned database failover, performed when the Principal is offline, which may result in data loss, hence the command name):
      ALTER DATABASE dbname SET PARTNER FORCE_SERVICE_ALLOW_DATA_LOSS;
      GO

Replace the field dbname with the name of each database.

  3. After failing over the database, verify that the Lync databases are in “Principal, Synchronized” mode on the mirror node (SQL02).

At this point, we can check the Front End (FE) service on our Front End servers and test signing in with the Lync client.
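The timestamp tables created in the prerequisites give a rough way to see how much data reached the mirror before the failure. A minimal sketch, assuming the same dbname1 placeholder and the tblDate table from earlier:

```sql
-- Run on the node that now holds the principal role (SQL02 in this series).
-- dbname1 is the same placeholder used when the timestamp table was created.
SELECT
    MAX(dtDate) AS LastTimestampReceived,
    COUNT(*)    AS RowsReceived
FROM dbname1.dbo.tblDate;

-- Confirm the role this node now holds for the database
SELECT mirroring_role_desc, mirroring_state_desc
FROM sys.database_mirroring
WHERE database_id = DB_ID('dbname1');
```

Comparing LastTimestampReceived with the time the servers were powered off gives an idea of how close to the failure the last mirrored write was. Note that after a forced failover the state column may not read SYNCHRONIZED while the original principal is still unreachable.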

How Lync Server Behaves

In the next instalment, I’ll talk more about what weird behaviour was observed and how I tried to get Lync to use the SQL mirror node. I encountered loads of problems, but eventually got the Lync Front End service to use the SQL server we’d failed the databases over to. Make sure you subscribe to get updated!

SQL Database Mirroring with Lync Server 2010 Series – Prerequisites

A few weeks ago I gave you guys a heads up that I’d been looking in some detail at what happens when you attempt to fail over after deploying Lync Server 2010 backend and Group Chat databases using SQL Server Database Mirroring, and that I’d found some interesting behaviour. Now that I’ve completed this work, I can share the results with you in a multi-part blog series.

Questions have always been asked over the years as to why organisations can’t use SQL Database Mirroring as a lower cost alternative to SQL Failover Clustering, and the only advice we’ve been able to give is what the TechNet documentation provides: “Lync Server 2010 does not support native database mirroring”.

Based on the work I’ve done, I can tell you what happens when you attempt to failover, what you have to do to actually get to a “recovered” situation and reasons why you won’t want to use SQL Database Mirroring with Lync.

The Supported Scenario Today

So today the official supported scenario is to deploy your Lync Server 2010 Enterprise Edition backend and Group Chat databases in one of two ways:
  • On a standalone SQL Server with no resilience.
  • On a highly available SQL Server two node Failover Cluster.
    • This is either for a local, in-site SQL cluster instance for server resiliency only.
    • Or a cross-site, metro data centre SQL clustered instance with one node in each site (and SAN replication, low latency etc) for site resiliency.
The latter gives you a resiliency solution with failover that you know works for the entire instance.

Why SQL Database Mirroring?

Database mirroring was a new feature delivered as part of SQL Server 2005. The data replication occurs at an individual database level rather than at an instance level in clustering, providing greater flexibility at the expense of a higher management/configuration overhead. Unlike clustering, the system databases e.g. master db cannot be replicated using Database Mirroring. This means that you need to recreate logins, security etc manually on the mirrored node.
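Because server-level logins are not mirrored, one way to approach this is to list them on the principal and recreate any that are missing on the mirror. The sketch below assumes Windows logins and groups only; CONTOSO\ExampleLyncGroup is a hypothetical name used purely for illustration.

```sql
-- On the principal (SQL01): list the Windows logins and groups defined on the instance.
SELECT name, type_desc
FROM sys.server_principals
WHERE type IN ('U', 'G')      -- U = Windows login, G = Windows group
ORDER BY name;

-- On the mirror (SQL02): recreate a login that is missing.
-- CONTOSO\ExampleLyncGroup is a hypothetical name; substitute a name returned above.
IF NOT EXISTS (SELECT 1 FROM sys.server_principals
               WHERE name = N'CONTOSO\ExampleLyncGroup')
    CREATE LOGIN [CONTOSO\ExampleLyncGroup] FROM WINDOWS;
```

Windows logins keep the same SID on both servers because it comes from Active Directory, so the database users inside the mirrored databases line up once the login exists. SQL-authenticated logins would need their SIDs matched explicitly, which this sketch does not cover.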

The mirroring process transports logs over its own TCP/IP session on port 5022 and uses compression by default. An important point to note is that mirroring in SQL Server Standard Edition is single-threaded, whereas in SQL Server Enterprise Edition it is multi-threaded.
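To confirm which port a given server is actually using for mirroring traffic, the endpoint catalog views can be queried. This is a minimal sketch with illustrative column aliases; the port returned depends on how the endpoint was created on your server.

```sql
-- Run on each partner to see the database mirroring endpoint and its TCP port.
SELECT
    e.name            AS EndpointName,
    e.state_desc      AS EndpointState,   -- STARTED when the endpoint is listening
    e.role_desc       AS EndpointRole,    -- PARTNER, WITNESS or ALL
    t.port            AS TcpPort,
    e.is_encryption_enabled
FROM sys.database_mirroring_endpoints AS e
JOIN sys.tcp_endpoints AS t
    ON t.endpoint_id = e.endpoint_id;
```

This is handy when checking firewall rules between sites, as the port shown needs to be reachable from the partner server.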

One of the greatest benefits of database mirroring is its flexibility. An example of this is that an administrator can effortlessly switch principal and mirror roles back and forth, failing over between the two nodes. Plus there is no dependency on identical/near-identical hardware, disk, heartbeat networks etc that traditionally make clustering a hindrance. Think of it like SCR was to Exchange Server 2007.
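As an illustration of how little is involved in a planned role switch, here is a minimal sketch of a manual failover in T-SQL. It assumes the database is in high-safety (synchronous) mode and currently synchronized; dbname is a placeholder.

```sql
-- Run on the current principal, from the master database.
-- dbname is a placeholder for the mirrored database.
USE master;
GO
ALTER DATABASE dbname SET PARTNER FAILOVER;
GO
```

Running the same statement on the new principal switches the roles back again. This is the planned equivalent of the Failover button in the Database Mirroring GUI, and differs from a forced failover in that no data loss is accepted.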

Compelling Reasons for Database Mirroring

I can definitely see why an organisation would want to utilise mirroring rather than clustering, especially when attempting to design site resiliency. The top reasons are that mirroring is:

  • Cheaper.
  • Less complex in infrastructure terms.
  • Better for site resiliency as there are no tight network and storage requirements. <— No 1 reason.

Test Environment

As part of this investigation, I deployed the following machines to test this configuration:

  • One Lync Server 2010 Enterprise Edition pool consisting of two Front End servers (named FE01 and FE02).
  • Two Lync Server 2010 Group Chat servers (GC01 and GC02).
  • Two SQL Server 2008 R2 servers fulfilling the principal (named SQL01) and mirror (named SQL02) roles for Database Mirroring.

Lync and SQL Configuration

After deploying my servers, I got them ready to define the topology and install Lync. I’m not going to go into loads of detail on the configuration of SQL Database Mirroring here because it’s well covered on TechNet and it’s all GUI driven. I completed the database parts in the following way:

  1. Specified the location for the back end databases for the Front End pool as the principal SQL server (SQL01) in Topology Builder and published.
  2. Set up my Group Chat database and permissions on SQL01 and specified this server in the Group Chat Installation Wizard.
  3. Ensured the Lync backend databases were successfully deployed onto this server.
  4. Verified all sign-in functionality with both the Lync and Group Chat clients.
  5. Configured SQL Database Mirroring using the SQL Server 2008 R2 Management Studio GUI to mirror all databases to SQL02.
  6. Verified that all databases were in a Principal, Synchronized status on SQL01 and Mirrored, Synchronizing on SQL02.

The Next Chapter

Once we’re at this stage, we’re good to start failing the databases over to the mirror node to see how Lync behaves.

Be sure to subscribe and come back for the next part where it gets interesting. Services started failing all over the shop, TCP connections were being automagically redirected to the SQL mirror node and there’s lots of the Lync client being in “limited functionality mode”. Stay tuned. 🙂