Jeremy Davis
Jeremy Davis
Sitecore, C# and web development
Article printed from: https://blog.jermdavis.dev/posts/2024/event-memory-issues

Watch out for Events in your Sitecore code

They are an excellent way to break stuff if you don't understand them...

Published 02 December 2024

After my recent delve into memory optimisation for non-work code, I spent some time recently investigating a memory issue in a production Sitecore site. The outcome of that was an issue which can be a common problem for .Net code. So in the hope of seeing it less in the future, here are some notes on what I saw and how you can avoid the same trap...

The symptom url copied!

An easy one. The site was hosted in Azure PaaS, and every so often the CD server instances would recycle due to excess memory usage. The monitoring graphs for Working Set looked like:

Azure's monitoring graph for memory use in an App Service. The lines rise slowly to a peak of about 14GB before they are recycled. This pattern is repeated.

Broadly (with some variations for part-time instances, and deployments occurring) the memory usage of the instances is trending up over time. It starts around 2GB when an instance starts up, and goes up to around 12-14GB before being recycled.

Clearly, something isn't right here...

Gathering data url copied!

Usually the best way to diagnose issues like this is to get memory dumps. Azure makes this pleasingly easy. You can open up your App Service in the portal and select "Diagnostic Tools" from the menu of icons. And on this page there is an option for "Collect Memory Dump":

The Azure portal page for taking a memory dump. It asks for a storage account to record data into, whether to dump or dump and analyse and what instances to look at.

You need to select three things here. First is a storage account. The tool will create a blob store record for each dump you take, organised into folders by App Service, date/time and instance. So it needs you to provide a storage account you have write access to for this. Secondly you need to specify if you want collection and analysis, or just collection. I tend to pick "collect data only" as I want to analyse myself. Though the analysis will sometimes come up with things you would not have spotted. (so don't discount it entirely) And then finally you need to pick what instances to dump. In this case I picked one instance - the one which had most memory in use at that time.

When you click "Collect MemoryDump" it will work for a while and save the dump to your Storage.

For this test I took two dumps. One with the App Service instance consuming nearly its maximum memory usage. And then one after recycling that instance and waiting for it to handle a few requests and get back up to speed. That gave data for a good "before and after" comparison of what's going on in memory.

As an aside, it wasn't immediately obvious to me how you can recycle a single App Service instance. The main App Service page in the portal lets you recycle the whole service - all its instances. But for Production that's not a good thing. But the diagnostic tools page has "Advanced Application Restart" as an option towards the end of its left hand menu. And that lets you pick a single instance to cycle.

Preparing to analyse url copied!

Once you've got your data you need to download it locally to investigate. I'd recommend renaming the files to be clearly "before" and "after" at this point so you don't get confused by the cryptic names Azure gives them.

Then you can load these dumps in Visual Studio. To do a comparison between the two dumps take, the process is:

  • Open the solution for the source code of your site
  • Use "File / Open / File" from the main menu, and select your "before" dump file
  • That will open the "Minidump Summary Page". From the right hand menu there, pick "Debug Managed Memory": The dump summary page, showing the 'debug managed memory' option
  • Click that, and wait for VS to process the dump. For large files (14GB!) this can take a while, depending on the spec of your computer.
  • And once that's loaded, the "Managed Memory" window which appears will have a "With Baseline" dropdown in the top right. Click that, pick "Browse and select your "after" dump file: The memory dump window showing the option to select a baseline dump file This may cause a warning to be displayed: The 'Baseline represents a different instance' warning This is due to the process having recycled, and you can click "yes". That will now take further time to load the second dump, and convert the numbers in the various displays to be the delta between the dumps.

But now you have the relevant data for a diagnosis.

Looking for the issue url copied!

My usual go-to starting point with these memory analysis jobs is to sort the list of "stuff in memory" that Visual Studio shows you by the counts. (Which is the delta between the two dumps here - so just what changed between them) And in this case, that showed:

The analysis of what's in memory from the dumps, showing millions of objects in memory with the project's namespace

Yup. We've got lots of stuff in memory here, and most of it has the namespace of the website project. That's not a surprise, given were looking for a memory leak, though. Big numbers and your namespace are usually a good sign you've got a problem. So selecting the EventFacade object from the list there, what do we see in the reference graph?

The reference graph for the EventFacade object, showing its connection to the Sitecore EventSubscribers static property

From that panel of the memory diagnostics it's clear that lots of these EventFacade object are connected back to a single Sitecore object - Sitecore.Events.Event and its EventSubscribers type. (The screengrab clips it, but the actual property exposing that type is DynamicSubscribers) And that gives us a really good reason for this whole issue:

The diagnostics above are showing this is static. And the decompiled source backs this up:

public static class Event
{
    ... snip ...

    //
    // Summary:
    //     The dynamic subscribers.
    private static readonly EventSubscribers DynamicSubscribers;

   ... snip ...
}

					

So why is this a problem?

If you have any object in memory which is referenced by a static property / object / collection then it's fixed in memory until either you manually break that reference, or the whole App Domain is unloaded. Static objects exist for the lifetime of the App Domain, and don't get garbage collected. And that means that none of the non-static objects they hold references to can get collected either, even if they're no longer needed.

So this is why you have to be very careful with static events in C# generally, and Sitecore specifically - they make it really easy to cause a memory leak.

If you write code which calls Sitecore.Events.Event.Subscribe() to register your object to receive events being broadcast by Sitecore at runtime, then you must make sure that you always call Sitecore.Events.Event.Unsubscribe() as well when your object is finished with. Or you'll end up with exactly this sort of memory leak.

This is also true of more standard (non-Sitecore) C#, where you have code which looks something like:

public class EventTest
{
    public delegate void YourEventHandler(object sender, EventArgs e);
    public static event YourEventHandler DoStuff;
}

					

and then write something like:

EventTest.DoStuff += (s, e) => { Console.WriteLine("Event fired!"); };

					

If your code doesn't explicitly remove that handler again, you'll get a leak. This maybe doesn't matter in a desktop app, where you probably only set up your event handling code once. But in a web app this can be a significant issue because things are often set up for every page request. And that tends to lead to the sort of memory graph at the top of this post.

Fixing it url copied!

In the code I was looking at, some further examination of the code showed that sometimes the events were unsubscribed, and sometimes they were not. So the fix is to make sure everything always unsubscribes.

In cases where you're looking at old WebForms code (if you're that unlucky...) Then the Page_Unload() handler is probably your place to do this. If you're working with MVC then all controllers are IDisposable so you can override Dispose(bool disposing) to make sure these things are called once the MVC framework is done with your controller.

In the code I was looking at, it seemed like some developers in the past knew this, but that bit of institutional memory got forgotten along the way, and when some newer code was added it didn't follow these patterns.

But it will now...

↑ Back to top