After my recent delve into memory optimisation for non-work code, I spent some time recently investigating a memory issue in a production Sitecore site. The outcome of that was an issue which can be a common problem for .Net code. So in the hope of seeing it less in the future, here are some notes on what I saw and how you can avoid the same trap...
An easy one. The site was hosted in Azure PaaS, and every so often the CD server instances would recycle due to excess memory usage. The monitoring graphs for Working Set looked like:
Broadly (with some variations for part-time instances, and deployments occurring) the memory usage of the instances is trending up over time. It starts around 2GB when an instance starts up, and goes up to around 12-14GB before being recycled.
Clearly, something isn't right here...
Usually the best way to diagnose issues like this is to get memory dumps. Azure makes this pleasingly easy. You can open up your App Service in the portal and select "Diagnostic Tools" from the menu of icons. And on this page there is an option for "Collect Memory Dump":
You need to select three things here. First is a storage account. The tool will create a blob store record for each dump you take, organised into folders by App Service, date/time and instance. So it needs you to provide a storage account you have write access to for this. Secondly you need to specify if you want collection and analysis, or just collection. I tend to pick "collect data only" as I want to analyse myself. Though the analysis will sometimes come up with things you would not have spotted. (so don't discount it entirely) And then finally you need to pick what instances to dump. In this case I picked one instance - the one which had most memory in use at that time.
When you click "Collect MemoryDump" it will work for a while and save the dump to your Storage.
For this test I took two dumps. One with the App Service instance consuming nearly its maximum memory usage. And then one after recycling that instance and waiting for it to handle a few requests and get back up to speed. That gave data for a good "before and after" comparison of what's going on in memory.
As an aside, it wasn't immediately obvious to me how you can recycle a single App Service instance. The main App Service page in the portal lets you recycle the whole service - all its instances. But for Production that's not a good thing. But the diagnostic tools page has "Advanced Application Restart" as an option towards the end of its left hand menu. And that lets you pick a single instance to cycle.
Once you've got your data you need to download it locally to investigate. I'd recommend renaming the files to be clearly "before" and "after" at this point so you don't get confused by the cryptic names Azure gives them.
Then you can load these dumps in Visual Studio. To do a comparison between the two dumps take, the process is:
But now you have the relevant data for a diagnosis.
My usual go-to starting point with these memory analysis jobs is to sort the list of "stuff in memory" that Visual Studio shows you by the counts. (Which is the delta between the two dumps here - so just what changed between them) And in this case, that showed:
Yup. We've got lots of stuff in memory here, and most of it has the namespace of the website project. That's not a surprise, given were looking for a memory leak, though. Big numbers and your namespace are usually a good sign you've got a problem. So selecting the
EventFacade
object from the list there, what do we see in the reference graph?
From that panel of the memory diagnostics it's clear that lots of these
EventFacade
object are connected back to a single Sitecore object -
Sitecore.Events.Event
and its
EventSubscribers
type. (The screengrab clips it, but the actual property exposing that type is
DynamicSubscribers
) And that gives us a really good reason for this whole issue:
The diagnostics above are showing this is static. And the decompiled source backs this up:
public static class Event { ... snip ... // // Summary: // The dynamic subscribers. private static readonly EventSubscribers DynamicSubscribers; ... snip ... }
So why is this a problem?
If you have any object in memory which is referenced by a static property / object / collection then it's fixed in memory until either you manually break that reference, or the whole App Domain is unloaded. Static objects exist for the lifetime of the App Domain, and don't get garbage collected. And that means that none of the non-static objects they hold references to can get collected either, even if they're no longer needed.
So this is why you have to be very careful with static events in C# generally, and Sitecore specifically - they make it really easy to cause a memory leak.
If you write code which calls
Sitecore.Events.Event.Subscribe()
to register your object to receive events being broadcast by Sitecore at runtime, then you must make sure that you always call
Sitecore.Events.Event.Unsubscribe()
as well when your object is finished with. Or you'll end up with exactly this sort of memory leak.
This is also true of more standard (non-Sitecore) C#, where you have code which looks something like:
public class EventTest { public delegate void YourEventHandler(object sender, EventArgs e); public static event YourEventHandler DoStuff; }
and then write something like:
EventTest.DoStuff += (s, e) => { Console.WriteLine("Event fired!"); };
If your code doesn't explicitly remove that handler again, you'll get a leak. This maybe doesn't matter in a desktop app, where you probably only set up your event handling code once. But in a web app this can be a significant issue because things are often set up for every page request. And that tends to lead to the sort of memory graph at the top of this post.
In the code I was looking at, some further examination of the code showed that sometimes the events were unsubscribed, and sometimes they were not. So the fix is to make sure everything always unsubscribes.
In cases where you're looking at old WebForms code (if you're that unlucky...) Then the
Page_Unload()
handler is probably your place to do this. If you're working with MVC then all controllers are
IDisposable
so you can override
Dispose(bool disposing)
to make sure these things are called once the MVC framework is done with your controller.
In the code I was looking at, it seemed like some developers in the past knew this, but that bit of institutional memory got forgotten along the way, and when some newer code was added it didn't follow these patterns.
But it will now...
↑ Back to top