This is post 2 of 2 in a series titled Diagnosing memory leaks around Sitecore events
- Watch out for Events in your Sitecore code
- Reference counting for debugging resource leaks
I wrote about diagnosing some issues with static event bindings in a previous post, and talked about finding these issues of "subscribing more than you unsubscribe" with memory dumps. But finding this sort of a problem before it becomes an issue is better if you can, so what techniques might we use for that?
Well one answer is to steal a trick that Windows uses in a variety of places: reference counting.
This is a pattern used a lot in operating systems and lower level software. The theory is very simple: Every time you allocate "a thing" you record that, and when you de-allocate it you remove that record. So at any point in the execution you can see how many outstanding references there are.
You can find examples of this all over the internals of Windows - things like "is it safe to deallocate this COM object" rely on counting how many times someone's taken a reference to it, and how many of those have been handed back. Once that number is zero it's safe to deallocate if necessary.
So here's how a simple interpretation of this can help you find issues like the Sitecore events issue problem from before:
We need to find the right place in the code to inject some monitoring. While it's theoretically possible to put this code "wherever the resource in question is used" this isn't a great plan if you can avoid it. Any code you're having to copy into multiple places in your project is code that you may forget to copy into a critical place... (or forget to remove again afterwards)
So it's much better to add your debug code in a single place that's called by all the things that need the resource in question.
In the code I'd been looking at before, the references to Sitecore's event manager object were wrapped up in a class already - so that was the obvious place to put some logic. That object had this interface:
public interface IEventWrapper { public void SubscribeToEvent(string eventName, EventHandler handler); public void UnSubscribeFromEvent(string eventName, EventHandler handler); public void RaiseEvent(string eventName, params object[] parameters) }
And there was a concrete implementation of this which was referenced by all the other code in the system using DI. So the
SubscribeToEvent
and
UnSubscribeFromEvent
methods here seem like the right places to inject some debug logic.
There are a couple of ways you might go about this. If you're feeling fancy and have DI in place that can support it, then using the Decorator Patten here could be a good approach. You write a new object that inherits the same interface as your original code, and wraps up the basic object. The original object is passed in as a constructor parameter, and the new object runs its own code and calls the same method in the original object.
So for the code I was working with, that might look like:
public class ReferenceCountedEventWrapper : IEventWrapper { private IEventWrapper _base; public ReferenceCountedEventWrapper(IEventWrapper base) { _base = base; } public void SubscribeToEvent(string eventName, EventHandler handler) { // do the reference counting _base.SubscribeToEvent(eventName, handler); } public void UnSubscribeFromEvent(string eventName, EventHandler handler) { _base.UnSubscribeFromEvent(eventName, handler); // do the reference counting } public void RaiseEvent(string eventName, params object[] parameters) { _base.RaiseEvent(eventName, parameters); } }
And then we could configure the system's DI so that when it's asked for an
IEventWrapper
it returns a
ReferenceCountedEventWrapper
which has been passed the original implementation of
IEventWrapper
when it was created.
But if you can't go down that road (or you want to do some of the fancier logging I'll mention later) then you can just modify the original implementation of your equivalent of
IEventWrapper
directly.
So what tracking do we want here?
I wanted the code to keep a count of allocated and deallocated Sitecore events so it's easy to check if things balance out. And secondly it wants some debug logs so that we can see what code is creating (and by implication not destroying) these resources without having to trawl through the codebase manually.
Keeping the count is fairly simple:
private static int _count = 0;
We can keep a counter in a static field, so that we have a single count across any instances of our resource class that might get instantiated.
And the code can increment and decrement that before or after it does anything with the object we want to track:
public void SubscribeToEvent(string eventName, EventHandler handler) { _count += 1; // do the subscribing } public void UnSubscribeFromEvent(string eventName, EventHandler handler) { // do the unsubscribing _count -= 1; }
So with that in place you could request a page from your site and know if all the objects involved in that whole process leaked any of the resources we're counting here.
But it's not easy to see how or why right now...
For that we need to add some debug output. For Sitecore code, probably the easiest thing here is just to drop that into the main log files. But you could use other patterns here if you preferred.
At the most basic level it's helpful to output the state of our counter each time a resource gets allocate or deallocated:
public void SubscribeToEvent(string eventName, EventHandler handler) { _count += 1; Log.Info($"SUBSCRIBING EVENT {eventName} [count now {_count}]", this); // do the subscribing } public void UnSubscribeFromEvent(string eventName, EventHandler handler) { // do the unsubscribing _count -= 1; Log.Info($"UNSUBSCRIBING EVENT {eventName} [count now {_count}]", this); }
So now you can request a page, and look at the log file to see which events are being allocated and deallocated. And if at the end of the request that count value has not returned to zero then you know something is wrong...
But you might find a few more bits of information valuable here.
Sometimes knowing the imediate caller of your code can be helpful. C# allows you to add some attributes to method parameters which can be populated with class and file information. This might be more of a challenge if you go down the DI-injection route for the debug code as it involves changing the interface being injected, but you can modify a method to track callers:
public void SubscribeToEvent(string eventName, EventHandler handler, [CallerFilePath] string callerFilePath = null, [CallerMemberName] string callerMemberName = null) { // write log data with the callerMemberName and callerFilePath included }
This is the lowest impact way of finding out how execution got to you, but it doesn't always give the full picture. What you see is the direct caller of your method. In complex code you may need to know more about the call tree, however, to understand what happened. In that case you probably want to look at a stack trace. This gets you all of the current execution state data, but it's somewhat more of a performance impact - you don't want to do this in production code.
But you can write:
var stack = new StackTrace(); var trace = stack.ToString();
and the
trace
variable will give you an exception-like stack trace showing exacly how the code ended up where it is now. Though this can be pretty long (it'll go all the way from this code back to the bits of IIS / ASP.Net that handled the incoming request) so you might want to consider trimming it a bit before recording it in a log for readability.
So now you should have some log data which can show how often the resource in question got allocated, and how that occurred.
You can browse to a page and look at the log data to see if the count is at zero once the page has rendered. If it is, great - no issues here. But if not you need to look at why. You might see something like:
SUBSCRIBING EVENT basket [count now 1] SUBSCRIBING EVENT login [count now 2] SUBSCRIBING EVENT logout [count now 3] UNSUBSCRIBING EVENT logout [count now 2] UNSUBSCRIBING EVENT basket [count now 1]
If you compare your "allocate" and "deallocate" log messages, you should notice that there's a "deallocate" message missing in this case. You can pair up the messages in the log where you get both of (de)allocation and remove them. Then the allocates left are the ones where the code fails to release the resource again - and the source of your leak. So in the example above, it's whatever allocates the "login" event and fails to deallocate it that's the source of the issue.
And that gives you some crititcal info about where you need to look for your problem. Why doesn't the class that allocates the "login" event deallocate it again? Is that code missing? Is it in a destructor that's not called? Is there a bug in the logic? Now you know where to look to try and work this out.
Now that is the simplest possible approach here - and this example is really aimed at an individual developer testing locally. Things get somewhat trickier if the code is going to be under meaninful load. You have to record your data in a way that allows unpicking which messages came from which requests. And that often means "be able to work out which thread any log message came from". With the mention of threading, you also have to be careful that your counting and logging code can't get into trouble due to being called repeatedly from different request threads.
So these ideas are easiest to use as a technique for developer test. The memory dumps I wrote about previously are likely a better way of looking at systems under load.
↑ Back to top