In the previous part of this series, I talked about how we fail over our databases. In this part, I’ll cover what happens to Lync after the failover.
The objective of this exercise was to get our Lync Enterprise Edition Pool to function after we took the primary SQL server offline and pointed Lync at the SQL mirror node to which our back end databases had been mirrored.
Trying to get it to work
I found that after failing over, the SQL Native Client on the Front End servers was Database Mirroring aware and had automatically redirected itself to the mirror node, SQL02. This was confirmed when I saw a successful automatic connection to SQL02 on port 1433 from the Front End server (FE03) that we’d failed over to.
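If you want to repeat that reachability check from a Front End server, a minimal Python sketch along these lines could help. It only tests whether a TCP connection to port 1433 can be opened; it says nothing about what the SQL Native Client or the Front End service is actually doing. The function name `can_connect` and the default timeout are my own choices for illustration.

```python
import socket


def can_connect(host: str, port: int = 1433, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        # create_connection resolves the name and performs the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures
        return False


# Example usage (host names taken from this series):
# for name in ("SQL01", "SQL02"):
#     print(name, "reachable" if can_connect(name) else "unreachable")
```

After a failover you would expect SQL01 to be unreachable and SQL02 to be reachable.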
However, the Lync Server Front End service is not Database Mirroring aware. We saw loads of errors and warnings in the Lync Server event log telling us that a connection to the SQL server could not be made, because the service kept attempting to connect to SQL01 (the failed server), which is the server defined in the Lync topology.
When I attempted to sign in using Lync, all I could get was the “Limited Functionality Mode” experience with no presence or contact list.
At this stage, I started to think about how we could get this to work, and changed the DNS record for SQL01 to resolve to the IP address of SQL02, which didn’t help. I even changed the IP address of SQL02 to that of SQL01, which still didn’t result in a successful connection. Nothing I tried to make SQL02 look like SQL01 worked.
How to achieve basic client functionality
Now that we’ve discovered that Lync doesn’t support Database Mirroring natively and that basic DNS changes don’t help, here is what I worked out we needed to do to get the Lync FE services to start properly and make a database connection to the SQL mirror node:
- Delete the computer account of the principal SQL server (SQL01) from Active Directory.
- Modify the hostname of the mirror SQL Server (SQL02) to that of the failed, principal SQL Server (SQL01).
- Restart the mirror SQL server and verify that all SQL services are started.
- Modify the DNS A record for the failed principal SQL server so that it resolves to the IP address of the mirror SQL server.
- Restart the Lync Front End server and confirm that the Front End service is started.
- Verify signing in with the Lync client and that you receive the full client experience and not “Limited Functionality Mode”.
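Before restarting the Front End server, it can be worth confirming that the DNS change in the steps above has actually taken effect from the Front End server's point of view, since a cached record may still return the old address. The sketch below is one way to check this in Python. The function name `resolves_to` and the IP address in the usage comment are hypothetical; the `resolver` parameter exists only so the check can be exercised without a live DNS server.

```python
import socket
from typing import Callable, List, Tuple

# Same shape as the return value of socket.gethostbyname_ex
Resolver = Callable[[str], Tuple[str, List[str], List[str]]]


def resolves_to(hostname: str, expected_ip: str,
                resolver: Resolver = socket.gethostbyname_ex) -> bool:
    """Return True if hostname currently resolves to expected_ip (IPv4)."""
    try:
        _name, _aliases, addresses = resolver(hostname)
    except OSError:
        # Name could not be resolved at all
        return False
    return expected_ip in addresses


# Example usage (10.0.0.12 is a made-up address standing in for SQL02's IP):
# print(resolves_to("SQL01", "10.0.0.12"))
```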
As you can see, this is by no means a “High Availability” solution. It requires computer name and AD changes, which rules it out as an acceptable “quick failover” scenario and even as a Disaster Recovery solution.
Because Lync Server is inherently secure, we cannot connect an FE server to a back end SQL server that isn’t defined in the topology.
Why is it not supported?
One reason why it’s not supported is that there is a highly dependent and integrated relationship between the rtc and rtcdyn databases.
Each database refers to the other to present a consistent view of presence: the user’s view of their contacts, and their contacts’ view of the user. The rtc database is mostly static, holding user publications, contact lists, privacy relationships and so on, whereas the rtcdyn database is dynamic, holding state about which Front End in the pool holds a SIP endpoint and which active subscriptions are bound to that endpoint.
Because database mirroring is configured and managed at an individual database level, in the event of a failure OCS/Lync cannot guarantee that both databases are in the consistent state needed to maintain the expected user experience.
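To make the per-database point concrete, here is a rough Python sketch of how you might check that both databases have ended up in the same role on a given SQL instance. The query reads the `sys.database_mirroring` catalog view; how you run it (for example through a database driver of your choice) is left out, and the function name `all_principal` is my own. It only compares mirroring roles and does not prove that the data in the two databases is mutually consistent.

```python
from typing import Iterable, Sequence, Tuple

# Query to run against the SQL instance you have failed over to.
# Only databases that take part in mirroring have a non-NULL mirroring_guid.
MIRRORING_QUERY = """
SELECT DB_NAME(database_id) AS database_name,
       mirroring_role_desc
FROM sys.database_mirroring
WHERE mirroring_guid IS NOT NULL;
"""


def all_principal(rows: Iterable[Tuple[str, str]],
                  databases: Sequence[str] = ("rtc", "rtcdyn")) -> bool:
    """Return True only if every listed database is present and is PRINCIPAL."""
    roles = {name.lower(): role.upper() for name, role in rows}
    return all(roles.get(db) == "PRINCIPAL" for db in databases)


# Example usage with rows as (database_name, mirroring_role_desc) tuples:
# all_principal([("rtc", "PRINCIPAL"), ("rtcdyn", "MIRROR")])  # False
```

If one database has failed over and the other has not, the function returns False, which is exactly the split state that per-database mirroring makes possible.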
To summarise, when the failover process was executed, the following behaviour was observed:
- Initially, the Lync client signs in but is in “limited functionality mode”. This is because the Lync Front End service cannot communicate with the back-end database.
- The Front End service starts, but events are logged advising that a back-end database connection cannot be made.
- It was concluded that due to inherent product design, the Lync Front End service will not connect to a SQL server that is not defined in the Lync Topology.
- It was only after we deleted the computer account for the failed principal SQL server from Active Directory and renamed the mirror SQL server that errors stopped appearing in the Application Log and the Lync client functioned properly.
What about Group Chat?
In the next instalment, I’ll talk more about what happens to Group Chat when we fail over the databases to the SQL mirror node. Make sure you subscribe to get updated!