Jeremy Davis
Jeremy Davis
Sitecore, C# and web development
Article printed from: https://blog.jermdavis.dev/posts/2019/bitten-by-alwayson-replication

Bitten by AlwaysOn replication

Published 23 December 2019
Sitecore SQL Server ~2 min. read

I am no DBA. In fact I'm happy to admit that I know just enough SQL to be dangerous. So when database problems come up, they can be tricky. I recently helped a client work through an issue with analytics databases, which wasn't easy to google – so it's time to help future developers find it...

The issue

A while back I'd been invoved in rolling out some infrastructure for a Sitecore v9.1 instance. At the point the deployment was done the AlwaysOn configuration for SQL Server wasn't in place – but it was going to be fitted in afterwards. Initially, everything was working ok – but after the client did work to get the AlwaysOn configuration enabled, issues started to occur. Lots of stuff to do with Analytics stopped working. Working through the issues, the key one was that analytics data stopped being recorded, because the xConnect collection role started recording errors:

2019-11-25 03:07:05.630 +01:00 [Error] ["XdbContextLoggingPlugin"] XdbContext Batch Execution Exception
Sitecore.XConnect.Operations.DependencyFailedException: One or more dependencies failed --->
Sitecore.Xdb.Collection.Failures.DataProviderException: Store Error:
Cannot open database "ClientSite_Xdb.Collection.ShardMapManager" requested by the login.
The login failed.

					

The connectionstrings for this user appeared correct. You could log on to the servers via SQL Management Studio with the account, despite the error. But, oddly, on one of the two servers you could not browse the three databases for analytics that collectionuser is supposed to access. However when you looked at those databases, they did have collectionuser listed in their security details:

collectionuser rights

Cue head scratching and googling...

An explanation

There were two parts to this issue. The first was understanding how AlwaysOn's configuration was synchronising things. It happily syncronises databases (and any contained users they have) but it was not synchronising server-level things, like the collectionuser account itself. The xConnect shard databases aren't accessed by contained user accounts, because Sitecore needs the same user for all three – hence collectionuser is a server-level account instead.

And the second thing was that internally, SQL Server seems to wire up server-level logins to the rights on databases using "Security IDs" (SIDs) rather than usernames.

So what had happened was this:

There had been an issue with the process of setting up AlwaysOn. To try and fix it, collectionuser account had been deleted and recreated on both of the AlwaysOn machines. The rights for it had been put back on the primary machine, and a synchronisation performed to push them over to the second machine.

But under the surface there were now actually two different collectionuser accounts. On server #1 we had collectionuser with SID #1″ and on sever #2 we had collectionuser with SID #2″.

When you look at that via SQL Management Studio, initially everything looks ok. For your collection shard databases on both machines you see they have collectionuser attached, with apparently sensible rights. When you look at the server-level login, on the primary machine you see a user who has rights to the shard databases. But on the secondary you see a user who has no rights to anything - because of that difference in SIDs hiding under the surface.

The resolution

The trick to making this work is that you need to have both of your machines have a user with the same SID. A pile of googling lead to a blog post which describes how you should approach to that: You can create a user with a known SID, using:

CREATE LOGIN collectionuser WITH
password = 'password@123', SID = 0x59B662112A43D24585BFE2BF80D9BE19

					

So you can either make up your own SID and create the user on both machines using it, or you can create it normally on one machine, and then read the SID it used back via:

SELECT name, sid FROM sys.sysusers WHERE name = 'collectionuser'

					

and use that value.

And once both machines have the same SID for the user, the "can't log in" error will stop happening.

↑ Back to top