Author Archives: Justin Morris

UC Deployment Success Criteria Post Published on NextHop

Just a quick one to let you know that I had a post published on Microsoft’s TechNet NextHop blog today about UC deployment success criteria. You can check it out here.

It covers a lot of areas that are important to ensuring your deployment goes well, including securing business buy-in, managing the voice experience (device selection) and ensuring training and adoption materials are delivered.

I hope this allows you to understand a lot of the business challenges involved with deploying UC and helps your deployment to become a success.

– Justin

Achieving Lync Server 2010 Site Resiliency Without Breaking the Bank

The resiliency story in Lync today is great, but it is geared around providing voice resiliency only through the use of the Backup Registrar functionality and the deployment of prioritised DNS SRV records.

I see more and more organisations wanting to provide full resiliency for all Lync workloads (IM and Presence, Conferencing, the lot) as Lync becomes more of a business critical service. They want to ensure they can still provide IM and Presence in the event an entire site goes down. The supported and recommended Microsoft scenario to achieve this is the Metropolitan Site Resiliency Solution, which requires a significant amount of infrastructure to achieve full Lync resiliency in the event of site failure. It can be cost prohibitive and complex to deploy and manage, and the reality is that very few organisations are set up to support this.

I’ve come up with a model, architected for an organisation that cannot deploy full metro data centre resiliency (stretched VLAN, SAN replication etc) but still wants some level of site resiliency for all Lync functionality.
In this post, I’ll cover how you can provide site resiliency capabilities for not just voice but also IM and Presence, Conferencing and all other Lync workloads.

Introduction

I want to cover off a few things before I begin to set the scene. In this post, I’ll cover site level resiliency only and not data or server level resiliency. At a high level, you can achieve data level resiliency by deploying SQL clustering for your Lync databases, and server level resiliency by deploying an Enterprise Edition Front End pool, Mediation Server pool, etc. I’m not going to go into too much detail about achieving resiliency for voice either, because this is well documented on TechNet.

This design will allow you to make Lync completely available again in a 30-60 minute timeframe once the DR activation plan is initiated. This design also assumes that you have a primary data centre and a disaster recovery site only. The DR site should not support any production users during normal operations.

Lastly, I’m writing this post from the perspective of a greenfield environment where no existing Lync Server 2010 infrastructure exists.

Determining your Site Resiliency Requirements

Firstly, you need to work out what Lync functionality you provide to your business that must be back up and running in your DR site. Is it IM and Presence? Is it Group Chat as well? Archiving? This will have an impact on what infrastructure you deploy in your DR site. It’s worth asking yourself/the business these kinds of questions:

If you have an Archiving Server deployed in your production site, does it need to be online in your DR site before your users can start using Lync again (for compliance purposes)?
If so, you will need to have an Archiving Server in your DR site.
Do you need to provide remote user access/federation in a DR scenario?
If so, you will need to deploy an Edge Server in your DR site.
Do you need to provide access to Lync Web App and Lync Simple URLs in a DR scenario?
If so, you will need to have a Reverse Proxy deployed in your DR site.
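The questions above boil down to a simple mapping from business requirements to DR components. As a rough sketch only (the requirement keys and the function name are hypothetical, made up for illustration), you could write it down like this, with the Standard Edition server that this design always places in DR as the baseline:

```python
# Hypothetical requirement flags mapped to the DR components named above.
DR_COMPONENTS = {
    "archiving_for_compliance": "Archiving Server",
    "remote_access_or_federation": "Edge Server",
    "web_app_and_simple_urls": "Reverse Proxy",
}

# This design always needs at least one Standard Edition server in DR.
BASELINE = ["Standard Edition Server"]


def dr_components(requirements):
    """Return the DR components needed for the given set of requirement flags."""
    extras = [component for flag, component in DR_COMPONENTS.items()
              if flag in requirements]
    return BASELINE + extras


print(dr_components({"remote_access_or_federation"}))
```

It’s no substitute for talking to the business, but writing the answers down this way makes it obvious what has to be built in DR.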

Some businesses want everything back up and running, others are satisfied with just providing Lync to internal users when DR is activated. It depends on your requirements and how extensive your business continuity plans are.

Preparing the Disaster Recovery Site

The very first thing we need to do is prepare the DR site. Make sure it is defined in Topology Builder as another central site to keep the separation of infrastructure logical. You should define topology components in the DR site as per your DR requirements but at a minimum, one Lync Server 2010 Standard Edition Server needs to be deployed.

Now here’s where it gets interesting. We need to home the Lync CMS on this pool in DR. This is required so we can access the Lync Server Control Panel in the event the primary pool is down. During my testing, I found that if the CMS was hosted on the pool in the primary site, I obviously couldn’t make any topology changes or changes to CMS using LSCP, which we need to do to make Lync available to our users.

To deploy the CMS to the Standard Edition server in your DR site, log onto the Standard Edition server in the DR site and run the Prepare first Standard Edition server step from the Lync Server 2010 Deployment Wizard. Once this is complete, you can publish your topology. For more information on setting up the Standard Edition server, check out this TechNet Library article.

What if my CMS is already deployed elsewhere?

If you are implementing this DR design into an existing Lync Server 2010 environment and your CMS is already deployed on the first Lync Front End pool (Standard or Enterprise Edition) you deployed, that’s ok. To move it to your DR server, just follow Tom Pacyk’s great post on his blog on how to move the Lync CMS.

Preparing the Disaster Recovery Site for Group Chat

In order to provide site resiliency for Group Chat, we need to have a standby SQL server and a Windows server ready to go in DR. A lot of the principles from my post on database mirroring and Group Chat will apply here, in that you shouldn’t actually have a Group Chat server set up in DR. Instead, you will need a standby server ready to have Group Chat installed on it in the event of site failure.

Deploying and Configuring the Primary Site

Once we have our DR site sorted out with the CMS deployed to a Standard Edition server in it, it’s time to deploy our Lync Server 2010 infrastructure into our primary site.

At this stage, you can deploy your production pool, associated servers (Edge, Mediation, Archiving, Monitoring, etc) and provision your users. This step goes ahead as per any other Lync deployment.

Backing up your data

Now that we have a fully functioning Lync deployment and a DR site in place, it’s time to take the necessary steps to ensure we can restore services in the DR site in the event of site failure.

Given that we have the CMS in the DR site, the only thing we really have to worry about here is backing up the contact lists of your users daily using dbimpexp.exe from the production Lync pool. You should move these XML files across to your DR site for safe keeping and so they are easily accessible.

If you’re using Response Groups, you should back these up using the instructions at the bottom of this TechNet article.

If you need to provide site level resiliency for Group Chat, you will need to keep a regular (at least daily) backup of the GroupChat SQL database.

Activating your Lync Site Resiliency Plan

So we’ve set up all the infrastructure, we’re backing up the data and we’re prepared for the worst. When the time comes (say your WAN link to the primary data centre fails or it all goes up in flames), you’ll need to follow these steps to restore service for Lync in your DR site:

Firstly, open the Lync Server Control Panel (if your CMS is on the pool that failed, you won’t be able to open it, hence homing it on the DR pool).
Navigate to the Users tab and click Find to search for all users. Alternatively, you can filter the search to only find users registered against the pool that is in the site that failed.
Click the Action button, then select Move all users to pool... from the drop down menu.
When the Move Users dialog appears, select the name of the pool that has failed under Source registrar pool.
Under the Destination registrar pool drop down menu, select the name of the Standard Edition server in your DR site.
Now click the check box next to Force (we have to click Force because the source pool is down) and then click Move.
Selecting Force means the task will not attempt to reach the source registrar pool and migrate the users’ contact lists. Instead, it only updates the msRTCSIP-HomeServer attribute for the users. As a result, all users will lose their contact lists (as the source pool is offline).
Using dbimpexp.exe, import the contact lists you have been backing up each night onto the DR pool where the users are now homed. See my previous post on how to do this.
If you’re using Response Groups, restore them using the procedure outlined in this TechNet article.
Lastly, update DNS records so that the sip.domain.com A record points to the IP address of your DR Standard Edition server, or update your SRV records to point to the DR Standard Edition server.
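To make the effect of Force a bit more concrete, here’s a toy model of the failover in Python. It is only a sketch of the behaviour described in the steps above, not how Lync implements it: the Pool class, the function names and the pool and user names are all hypothetical, and restore_contacts simply stands in for the dbimpexp.exe import.

```python
from dataclasses import dataclass, field


@dataclass
class Pool:
    name: str
    online: bool = True
    contacts: dict = field(default_factory=dict)  # user -> contact list


def move_user(user, home, source, target, force=False):
    """Re-home a user; contacts only travel if the source pool is reachable."""
    if source.online:
        target.contacts[user] = source.contacts.pop(user, [])
    elif not force:
        raise RuntimeError("source pool is down; a forced move is required")
    else:
        target.contacts[user] = []  # nothing could be migrated
    home[user] = target.name  # only the home server pointer changes


def restore_contacts(target, backup):
    """Stand-in for importing the nightly contact list export."""
    for user, contacts in backup.items():
        if user in target.contacts:
            target.contacts[user] = list(contacts)


primary = Pool("pool01", contacts={"alice": ["bob"], "bob": ["alice"]})
dr = Pool("dr-se01")
home = {"alice": "pool01", "bob": "pool01"}
backup = {u: list(c) for u, c in primary.contacts.items()}  # last night's export

primary.online = False  # the primary site fails
for user in list(home):
    move_user(user, home, primary, dr, force=True)
print(dr.contacts)  # contact lists are empty after the forced move

restore_contacts(dr, backup)
print(dr.contacts)  # contact lists are back once the backup is imported
```

The point to take away is that the forced move only changes where users are homed. Anything that lived on the failed pool has to come from your backups.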

What about Group Chat?

I haven’t forgotten about our old mate GC. 🙂 I’ve covered providing DR for Group Chat before in a different context, so a lot of the steps from that post can actually be applied to this scenario.

I was reluctant to double up on content in this post so have chosen to omit specific steps, but if you’d like me to detail providing site resiliency for Group Chat in a separate post, let me know in the comments.

Expected Behaviour

When your users sign in, they will have full instant messaging, presence and conferencing abilities (audio, video, IM, desktop sharing, etc). Scheduled conferences will take a hit (as these will be tied to the failed pool), but new ad-hoc conferences will be able to be created.

Provided you have deployed the DR voice infrastructure (Mediation Servers, gateways, backup routes etc), your users will also be able to make and receive calls to the PSTN.

Conclusion

I’ve tested this scenario in a lab environment and can verify that it does provide a full restoration of Lync features. It’s not an automatic failover, but it does provide a level of site resiliency that allows you to get Lync back up and running for your users in a second site without shelling out for the infrastructure required by the Metropolitan Site Resiliency design.

Give this a go in your lab and let me know how it goes for you. If you’ve got some questions or want me to go into more detail in a particular area, let me know in the comments. If you decide to go ahead with it, I hope it helps you provide a better level of service to your organisation.

Interview with a UC Pro Series on NextHop – Elan Shudnow

In my monthly feature on Microsoft’s TechNet NextHop blog, I interview one of the community’s long standing legends, Elan Shudnow.

Elan’s blog has helped me understand OCS over the years and make sense of a lot of different components including Group Chat. I sat down with him and found out how he got into Lync, what kind of work he does and what he gets up to in his spare time.

Check out the interview here on NextHop. If you’ve got a suggestion of who I should interview next, drop me a line in the comments.

Lync for Mac 2011 14.0.2 Update Available

A bit late to the party I know. 🙂 I was in Turkey all last week so didn’t get a chance to write about this earlier.

Last Monday, Microsoft released a new update (14.0.2) to Lync for Mac 2011 that fixed a few issues, one of which I’ve been keeping an eye on for a while and that many on the TechNet forums reported: problems with the Cisco AnyConnect VPN client.

Secondly, this update fixes scenarios where the client might unexpectedly sign out or crash. Sensational. I’ve definitely experienced this first hand so this is a welcome update to improve stability.

To update your client just open Office Updates and it’ll do it for you. Easiest way is from within Lync, click Help then Check for Updates. Alternatively, you can download the patch manually.

For more information and a download link, check out the Microsoft Support KB article here.

MUCUG London April 2012 Review

Last Thursday night Adam, Tom and I got together once again to put on MUCUGL here in London. This time we teamed up with AcmePacket to stage an event in the intimate Eight Club, deep under the City of London in a cosy basement venue. The treacherous rainy afternoon that would have made Noah nervous stopped no one, and we all settled in for an evening talking about call flows in Lync.

First up, Adam presented a great deck on Fixed-Mobile Convergence (FMC) that featured a real life case study and how it can be the holy grail. Following that, I presented on SIP call flows in Lync and talked about what SIP, SDP, RTP and ICE are, along with many other protocols.

You can view my slides below but you won’t see anything on slides 12 through 16 because this is where I jumped over to Snooper to show some SIP messages in each call scenario which I can’t share publicly unfortunately.

Microsoft Lync 2010 Call Flows Explained

After the break, Geraint from AcmePacket talked about their SBC products and Tom finished up the evening talking about the latest developments in the worldwide Lync community. Afterwards over a beer, we all shared some war stories about deployments and recent challenges with Lync. All in all a great event. You can see Adam’s roundup of the event on the MUCUGL blog here.

Our next MUCUGL event is on July 19th, so save the date in your calendars now.

Improving Presence Privacy in Lync 2010

An update was recently released that addresses a few niche privacy scenarios in Lync 2010. Even though presence is really what makes Lync amazing and enables so many innovative ways to communicate, the extra insight it gives into how long someone has been away or offline can cross the line and impact work/life balance. To address this, Microsoft has released a patch to remove the LastActive attribute from the presence aggregation category.

This update will be handy for organisations that are approaching presence functionality with caution or have strict rules around how much information a user makes available to the rest of the organisation. In particular, it will be useful for companies that operate in countries where it’s illegal to monitor employees’ activity.

A bit more from the Microsoft description of this update:

The Lync Server 2010 presence scheme includes a method for calculating and displaying how long a user has been away or offline. This is known as “Last Active.” The LastActive attribute returns presence inquiries only from users who have the “Colleagues,” “Workgroup,” and “Friends and Family” privacy relationships.

Lync Server 2010 always calculates the Last Active time stamp during presence aggregation for each user and stores it in a database. Last Active presence information is retrieved to satisfy individual presence inquiries and subscriptions by other users. This occurs according to the access level that the users were assigned. The design of the Last Active presence information has changed for the following reasons:

Last Active presence information may be incorrectly interpreted as reflecting the actual user’s status at work. Therefore, users may rely on it to remotely monitor an employee’s activity. However, this behavior is not allowed in some countries.

Last Active presence information that is provided to users should not disclose how long a user is away or offline for business reasons.

Note The aggregate presence data that is calculated by Lync Server 2010 contains the status history of each user’s endpoints if a user is signed to multiple endpoints. The status of each endpoint changes automatically in response to system events (such as logon, logoff, workstation lock and unlock, and network connectivity events), configured timeouts, scheduled meetings, and user activity. The status of each endpoint can also be changed manually by the user. In this situation, Last Active presence information cannot be considered as an accurate measure of the user’s presence. Instead, Last Active presence information is intended to provide additional information about the user’s availability and willingness to communicate.

So as you can see, this patch has basically been released for companies that may have gone a bit too far with presence and how they are tracking their staff. If you don’t keep a handle on this, you could find a rift created between managers and their employees. Imagine this scenario:

Manager: “I saw that your presence in Lync was set to away for 7.5 minutes at 11:35am, what were you doing then?!”
Employee: “Uhh, I was in the bathroom?!”

Really what you want to do is strike a balance between personal accountability and trust when using presence. Don’t let your managers go all “big brother” on your staff to the point where they are nervous to walk away from their computer.

You can check out more about the update here on the Microsoft Support site.

Mitel LBG does not support DNS load balancing with Lync Server 2010

I discovered this last week when trying to get RCC to work between a Lync Server 2010 Enterprise Edition Front End pool and a Mitel Live Business Gateway (LBG). The environment consisted of a greenfield Lync deployment with a Lync Front End Enterprise Edition pool consisting of two Front End servers that utilises DNS load balancing.

Problem

A few problems were initially observed here in the Lync client when we attempted to make a call using RCC:

Lync places the call and the Mitel handset goes off-hook and dials. However, once the call is established, the Lync session window does not reflect call duration and does not set presence to “In a Call”.
If you hang up the call by setting the earpiece down on the Mitel handset, the Lync session window doesn’t close like it should. After a few seconds it generates an error and closes.
If you try to end the call using the Lync session window, nothing happens and the call does not end.

Cause

So basically what is happening here is one-way SIP traffic. Let me illustrate and explain what is causing this problem:

Lync client sends a SIP INFO message that contains the CSTA command MakeCall to the Front End pool (which has a FQDN of pool01.contoso.com).
One Front End server (in this case, FE1) in the pool passes this to the Mitel LBG to take the handset off-hook and dial the number.
The LBG tells the 3300 ICP to dial the number.
The LBG dynamically looks up the pool FQDN of pool01.contoso.com that sent it the SIP traffic and, because it can’t cache DNS records like the Lync client can, it may receive the IP address of either FE1 or FE2 to send return traffic to. If it sends traffic to FE2, this is where the problem begins.

When we run a trace on the Lync Front End Pool, we see the following SIP error message in response to the SIP INFO message the LBG sent FE2:
ms-diagnostics: 1037;reason="Previous hop client did not report diagnostic information";Domain="contoso.com";PeerServer="10.0.10.10";source="FQDN of FE2"

Basically this is FE2 saying “hey LBG, you sent me traffic I didn’t generate, I don’t know what you’re talking about” and FE2 drops the return SIP messages. As a result, the Lync client never sees any return SIP messages to change presence to In a Call, show call duration, etc.
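If it helps to picture it, here’s a small Python simulation of the behaviour described above. It’s a deliberately simplified sketch: the IP addresses are hypothetical, real DNS round robin won’t necessarily alternate this neatly, and front_end_accepts is just my shorthand for “this Front End knows about the dialog”.

```python
from itertools import cycle

# Hypothetical addresses for the two Front End servers behind pool01.contoso.com.
POOL_RECORDS = {"FE1": "10.0.10.11", "FE2": "10.0.10.12"}


def make_resolver(records):
    """Return a lookup that hands out each A record in turn, with no caching."""
    rotation = cycle(records.items())
    return lambda: next(rotation)


def front_end_accepts(receiving_fe, dialog_owner):
    """A Front End only recognises return traffic for dialogs it started."""
    return receiving_fe == dialog_owner


resolve = make_resolver(POOL_RECORDS)
dialog_owner = "FE1"  # FE1 relayed the MakeCall request to the LBG

results = []
for _ in range(4):  # four return messages from the LBG, each with a fresh lookup
    fe, ip = resolve()
    results.append((fe, front_end_accepts(fe, dialog_owner)))

print(results)
```

Every lookup that lands on FE2 ends up as a dropped message, which is why the Lync client only ever sees part of the conversation.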

Solution

I found that there are few ways around this problem that are scalable. Initially I created a manual entry in the local hosts file of the LBG for the FQDN of the pool to resolve to the IP address of one Front End server only, but once users registered on the second FE and started sending traffic to the LBG, this workaround no longer worked.

Use a load balancer

The only real solution to this is to load balance traffic on port 5060 (or whatever port you are using to communicate with the LBG) to the Front End servers in the pool. This will mean the LBG only has one IP address to send its traffic to and the load balancer will take care of session stickiness.

Conclusion

Up until now, all my RCC deployments were either with hardware load balanced Front End pools or with Standard Edition servers so I’d never encountered this before. From this discovery I think we can safely conclude that other vendor CSTA gateways like Cisco Unified Presence Server (CUPS), Avaya’s AES, Nortel CS1000 NRS, Genesys GETS, etc that provide RCC functionality for Lync 2010 also do not support DNS load balancing.

So if you’re looking at deploying a Lync Server 2010 Front End Pool and you want to hook it up to your PBX for Remote Call Control, you will need a hardware load balancer of some kind. The Lync supported vendors/models are listed here.

Ok, I think I know too much about RCC now.. 🙂

Interview with a UC Pro Series on NextHop – Ståle Hansen

It’s that time of the month again, when I interview one of the worldwide Microsoft Lync community’s shining stars on the Microsoft TechNet NextHop blog. This month I’ve interviewed Norwegian Lync MVP (re-awarded yesterday on the 1st April, congrats!) Ståle Hansen and found out what he loves about UC, what he does in his job and what makes his home town cool. This is the third in the series and I’m really stoked with the momentum and feedback it’s receiving.

Check out the interview here on NextHop. Also make sure you check out previous interviews of folks like Tom Arbuthnot, Jeff Schertz and Tom Laciano. If you’ve got a suggestion of who I should interview next, drop me a line in the comments.

SQL Database Mirroring with Lync Server 2010 Series – Group Chat

In previous posts in this series on Lync Server 2010 and SQL Mirroring I’ve covered the prerequisites, how to failover your SQL databases and how the backend Lync databases behave once you’ve failed over. In this final post, I’m going to cover how Group Chat behaves when its database is mirrored and subsequently failed over.

As with my other posts regarding SQL mirroring and Lync Server 2010, I must stress that this is completely unsupported by Microsoft. I attempted this deployment scenario purely as a “could it be done” exercise only. Do not take this as gospel and do not deploy it in a production environment.

Preparing the DR scenario

So the first thing we need to do is get everything set up to support Group Chat in the DR site. This involves mirroring the Group Chat database between the Principal and Mirror SQL server nodes in each site and ensuring the SQL mirror node is set up to allow a Group Chat server to connect to it.

Mirroring the Group Chat database and preparing the SQL Mirror node

I covered this in the prerequisites post, so we can assume that we’ve followed the steps to mirror the Group Chat database already.

But before we can failover the Group Chat database, we need to make sure the mirror node is set up so the Group Chat Server can connect to it. To do this, we need to make sure security is set up on the SQL server mirror node to support Group Chat. TechNet covers the steps required to do this thoroughly in this article, so all you need to do is follow the steps listed on your SQL mirror node.

Setting up a standby Group Chat server

The first caveat to throw in the mix here is that to bring up Group Chat in the DR site, you mustn’t actually have the Group Chat server set up already. This is because Group Chat topologies come in either single or multiple server flavours, so if it were already part of the topology it would be used by users. In this scenario, we want to keep this server as a standby that is not used while the production site is online.

The server you’ll be using for Group Chat will need to be a cold standby machine in DR that already has a certificate issued to it for its FQDN and the Group Chat prerequisites installed. Once this is set up, it’s ready to have Group Chat installed on it when you need to activate your DR plan.

Activating Group Chat in the DR site

So once you’ve failed over your database and made the SQL mirror node the Principal, it’s time to get Group Chat up and running again in your DR site. To do this, you essentially need to remove the Group Chat server you had in the primary site because a 1:1 relationship exists between a Group Chat server/pool and a Lync Server 2010 registrar (Standard Edition server or Enterprise Edition pool). Additionally, the server we’ll have prepared in the DR site will have a different server name.

Cleaning the Lync Topology

Firstly we need to remove the old Group Chat server from the Lync topology. This involves running the following cmdlets:

Remove-CsTrustedApplicationPool -identity <FQDN of failed Group Chat server>
Remove-CsTrustedApplication -identity <FQDN of failed Group Chat server>

These cmdlets will remove references to the failed Group Chat server from your Lync topology, allowing you to spin up a new Group Chat server in the DR site. If we leave the old Group Chat server in the topology, when we try to add a new server in DR it picks up this configuration automatically and we can’t define a new next hop pool.

Making a hosts file change

Next we need to make a local hosts file change so that the FQDN of the principal SQL server resolves to the IP address of the mirror SQL server in the DR site.

We need to do this because although the Group Chat services are somewhat SQL mirroring aware (I witnessed a connection from my Group Chat server to my SQL mirror node on TCP port 1433 after I failed over), the good news doesn’t last and eventually the Lookup and Channel services stop functioning. Modifying the hosts file overcomes this and the services stay started.
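For illustration, the hosts file entry (on Windows the file lives at C:\Windows\System32\drivers\etc\hosts) is just the IP address followed by the host name. Here’s a tiny Python helper that formats one. The IP address, FQDN and function name are hypothetical placeholders, so substitute your own principal SQL server FQDN and mirror SQL server IP address.

```python
def hosts_entry(ip_address, fqdn, comment=""):
    """Format one hosts-file line: IP address, whitespace, host name."""
    line = f"{ip_address}\t{fqdn}"
    return f"{line}\t# {comment}" if comment else line


# Hypothetical values: the principal SQL server's FQDN is pointed at the
# IP address of the mirror SQL server in the DR site.
print(hosts_entry("10.1.20.15", "sql01.contoso.com", "DR: resolves to SQL mirror"))
```

Remember to take the entry back out if you ever fail back to the primary site, otherwise the server will keep resolving the principal’s name to the mirror.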

Install Group Chat in the DR site

Now that we’ve cleared references from the topology and our hosts file is modified, we can install Group Chat in the DR site using the Installation Wizard as normal:

1. When prompted to provide a SQL server and instance name, provide the principal SQL server name and the Group Chat database name. Once we commit this configuration, it will write new rows to the tbl.Config table in the Group Chat database to allow this Group Chat Lookup/Channel server to work.
2. When prompted at the next screen, assign the previously imported certificate to the new Group Chat Server.
3. Following this screen you will see the next hop address and site. These will all be greyed out because the tbl.Config table has already been populated with them. We’ll fix this in step 5.
4. Once deployment has completed, ensure all Group Chat services start. If some services fail to start, consult the local Lync Server event log and troubleshoot SQL database access.
5. Next, using the Server Config Tool, set the next hop Lync pool that Group Chat is configured with to the FQDN of the DR Lync Standard Edition server or Enterprise Edition pool. This will modify the tbl.Config table for us in the SQL database.

Once you’ve got Group Chat installed, you will want to run Get-CsTrustedApplicationPool to ensure the Lync topology has been updated with the new Group Chat server details so your FE pool in DR can talk to it.

Validation and Summary

Once you’ve followed the steps above, you’ll want to fire up your Group Chat client and attempt to sign in. If you can’t sign in straight away, check that your Group Chat services are started ok and that a Trusted Application pool exists in Lync for your DR Group Chat server.

This post (and the series) has really been a “proof of concept” exercise only, but it does show that Group Chat can utilise a mirrored SQL database, albeit with a fair amount of configuration change. As I said, don’t use this blog post as verification when making a design decision around your DR requirements for Lync.

Thanks for following along, I hope this series has helped you understand the challenges around SQL mirroring in Lync in more detail and why Microsoft doesn’t support it.

Lync Server 2010 will not be supported on SQL Server 2012

News just in from the frontline regarding SQL Server 2012 and Lync Server 2010.

A blog post has been published by Damien Caro (a Microsoft IT Pro evangelist based in Paris) on his TechNet blog dispelling rumours and uncertainty around whether Lync Server 2010 will work with the newly RTM’d SQL Server 2012 (previously code named Denali).

Damien writes:

“There are some excellent reasons for willing to use SQL 2012 with Microsoft Lync like the support of the new availability model (Always On). However, Lync 2010 is using a feature called DMO (Distributed Management Objects) that was introduced in SQL 7.0 (a long time ago !).

SQL 2012 does not support this feature anymore as it is indicated in this article : http://msdn.microsoft.com/en-us/library/ms131540.aspx so SQL 2012 will not be a supported platform for Lync 2010 as it is now.”

You can read the full post over here on his blog and hear it from the horse’s mouth.

Lync databases can be deployed on instances from SQL Server 2005 SP3 up to SQL Server 2008 R2 today. This new information means you’ll need to think carefully about your new SQL environments and new/existing deployments of Lync Server 2010.

Credit to fellow Modality consultant Tom Arbuthnot for finding this one.