Jeremy Davis
Sitecore, C# and web development

Is that really a Unicorn issue?

Sometimes the smoking gun isn't the reason...

Published 09 May 2022

I hit a rather confusing issue with a release a while back, which initially appeared to be a Unicorn problem. But after investigating the details, I think this was actually an infrastructure problem causing some odd behaviour. I doubt this is a common problem, but still worth writing down in case it's a challenge for anyone else...

Some background

The release in question was a containerised build. It had been running fine for some time, and I'd already released this build to the "dev" and "qa" instances of the site. The release process was set up to deploy the new images to Kubernetes, and then once the CM image was deployed and started, trigger the Unicorn sync to complete the deployment.

The issue

When the release was triggered on the pre-prod instance of the site, the images deployed OK, but then the Unicorn sync failed. I've had instances in the past where synchronisation had failed for data-related reasons - a missing file in source control, or something an editor had done to the content tree. But in this instance it failed with a much stranger error:

Creating master:/sitecore/System/Modules/some-item failed. API returned null.

					

Google wasn't a great deal of help here. Searching for references to that message gave one hit - and that was in the GitHub repository that the code in question came from. So no easy answers for this one.

I tried repeating the deployment - which gave the same error. And I tried pulling down the source and running a sync on my local instance - which gave no errors.

So, for a while I was a bit stumped...

A possible explanation

Going back to first principles, I thought about two things:

Firstly, what is the code doing when the exception is thrown? Well, looking at the source, it's trying to create an item from a template:

AssertTemplate(database, new ID(serializedItemData.TemplateId), serializedItemData.Path);

Item targetItem = ItemManager.AddFromTemplate(serializedItemData.Name, new ID(serializedItemData.TemplateId), destinationParentItem, new ID(serializedItemData.Id));

if (targetItem == null)
    throw new DeserializationException("Creating " + serializedItemData.DatabaseName + ":" + serializedItemData.Path + " failed. API returned null.");

targetItem.Versions.RemoveAll(true);

					

The call to AddFromTemplate() is returning null - which leads to the exception being thrown. Why might that call fail by returning null? It's not obvious, but likely it's to do with some unexpected data somewhere?

Secondly, what is different about this environment? The infrastructure differs between pre-prod and dev/qa: the client's infrastructure choices mean that there are two CM servers in the pre-prod deployment.

Thinking about that made me wonder if the cause here might be related to those multiple CM instances. What if the deployment was starting with a connection to one CM instance, and then getting swapped to the other one as the containers updated? And that led me to thinking about clearing caches - because that could be a reason the in-memory data might be different.

And in fact clearing the cache on the active CM instance did get rid of the error, and allow the sync to complete correctly. And none of the other things I'd tried had done so. So I'm fairly sure something data and cache related is what was happening when I saw the error.

Preventing this problem

So, it turns out this is probably an interesting variation on something I've discussed before. The problem comes down to how the release process waits for container updates. If you have multiple CM roles and a release process which performs post-release tasks on your Sitecore CM role, you must wait for all of these roles to update before continuing. (And in fact, this issue came up before I posted about that previous problem - it has not recurred since I made the changes I wrote about there.)
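To make the "wait for all of them" idea concrete, here is a minimal sketch of that kind of gate in C#. It is not the code from my release pipeline, and it does not call any real Kubernetes or Sitecore API. The names CmInstance, RolloutGate, AllUpdated and WaitForAllCm are made up for illustration, and how you query your CM instances (the queryInstances delegate) is left as an assumption for you to fill in.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;

// Hypothetical snapshot of one CM container: its name, the image tag it is
// currently running, and whether it reports itself as ready.
public sealed record CmInstance(string Name, string ImageTag, bool Ready);

public static class RolloutGate
{
    // True only when there is at least one CM instance and every one of them
    // is ready and running the expected image tag.
    public static bool AllUpdated(IReadOnlyCollection<CmInstance> instances, string expectedTag) =>
        instances.Count > 0 &&
        instances.All(i => i.Ready && i.ImageTag == expectedTag);

    // Polls the supplied query until every CM instance has updated, or the
    // attempts run out. Returns false on timeout so the caller can fail the
    // release rather than start the sync against a half-updated set of CMs.
    public static bool WaitForAllCm(
        Func<IReadOnlyCollection<CmInstance>> queryInstances, // assumption: you supply this
        string expectedTag,
        int maxAttempts,
        TimeSpan delay)
    {
        for (var attempt = 1; attempt <= maxAttempts; attempt++)
        {
            if (AllUpdated(queryInstances(), expectedTag))
                return true;

            // No point sleeping after the final attempt.
            if (attempt < maxAttempts)
                Thread.Sleep(delay);
        }

        return false;
    }
}
```

The important detail is that the check is "all", not "any". A gate that only waits for the first CM container to come up is the one that lets the sync start while the other instance is still on the old image. The release step would call WaitForAllCm() with the new image tag and only trigger the Unicorn sync if it returns true.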

And once again I'm thinking life is simpler with only one CM instance running at a time...