SharePoint in a nutshell - Debugging an out-of-memory exception

Recently, the SharePoint web front-end servers (WFEs) in one of my customers' production farms were running into out-of-memory (OOM) exceptions, and I got a chance to troubleshoot the problem. The w3wp worker process was consuming more than 5 GB of memory on each WFE. This degraded the performance of the web applications hosted in that process, and it also degraded the performance of the entire WFE.

Only a few options are available for troubleshooting out-of-memory problems on production servers. Server administrators are reluctant to install or run any tool on a production server, because such tools could degrade performance even further. I always prefer WinDbg with the SOS extension for tracking down OOM issues. Capturing a memory dump file is an easy and safe task for server administrators.

When I inspected the managed heap in WinDbg, a huge amount of free space was available on it. Even so, the process was running into out-of-memory exceptions. Moreover, very few objects were allocated on the large object heap (LOH), so large objects were not causing the problem either. After digging a little deeper with SOS commands, I found thousands of “pinned objects” lying on the heap.

Causes of Pinned Objects

In a nutshell, the GC periodically compacts the objects on the heap to avoid managed heap fragmentation. Under this scheme, new objects are always allocated at the top of the heap. Objects allocated on the LOH are excluded from compaction, because moving them is a very expensive job for the GC.

Coming back to “pinned objects”: an object is pinned while it is being handed to unmanaged code/COM. The GC does not move a pinned object during compaction. Unmanaged code holds the object's raw address, and the GC has no way to notify it that the object has moved on the managed heap. If a pinned object survives a GC, the surrounding live objects can only be compacted up to it, which can leave a block of free space in front of it.

Continuous pinning of objects, combined with their survival across GCs, can therefore create a large number of free space blocks on the managed heap that cannot be defragmented. This defeats the GC's efforts to use the heap efficiently, and eventually the process runs into out-of-memory problems.
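To make pinning concrete, here is a minimal C# sketch that pins a managed buffer explicitly with GCHandle. This is similar to what the interop marshaler does implicitly for the duration of a call that hands an object's address to unmanaged code. PinningDemo and PinForNativeCall are hypothetical names. No real native call is made: Marshal.WriteByte merely stands in for unmanaged code writing through the raw address.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical demo class; not part of SharePoint or the original post.
public static class PinningDemo
{
    // Pins the buffer for the duration of a (simulated) native call, then unpins it.
    // Returns whether the handle is still allocated afterwards.
    public static bool PinForNativeCall(byte[] buffer)
    {
        if (buffer == null || buffer.Length == 0)
            throw new ArgumentException("buffer must be non-empty", "buffer");

        // While this handle exists, the GC must compact around the array instead of moving it.
        GCHandle handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
        try
        {
            IntPtr address = handle.AddrOfPinnedObject();
            // Stand-in for unmanaged code writing through the raw address.
            Marshal.WriteByte(address, 0, 42);
        }
        finally
        {
            // Unpin as soon as the unmanaged side is done with the address.
            handle.Free();
        }
        return handle.IsAllocated;
    }

    public static void Main()
    {
        byte[] buffer = new byte[16];
        bool stillPinned = PinForNativeCall(buffer);
        Console.WriteLine("buffer[0] = {0}, still pinned: {1}", buffer[0], stillPinned);
    }
}
```

The try/finally is the important design choice here. A pin should last only as long as unmanaged code needs the address. A handle that is never freed keeps the object fixed indefinitely, and many such long-lived pins produce the kind of fragmentation described above.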

So when does SharePoint make a COM call? SharePoint managed code communicates with the content databases through a COM component, namely OWSSVR.dll. Every time an SPSite object is initialized or list items are fetched, a COM call is made, which internally pins objects on the managed heap. Microsoft has smartly abstracted all these complications away from SharePoint developers by encapsulating the whole logic in the SharePoint managed code.

Coming back to the problem I was debugging: with the help of SOS commands, I found that numerous exceptions had been raised in the process with the message “Detected use of SPRequest for previously closed SPWeb object. Please close SPWeb objects when you are done with all objects obtained from them, but not before.” Digging deeper, I found creepy code in a web part that was disposing the SPWeb object obtained from SPContext.

Here it is:

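SPWeb web = SPContext.Current.Web;

// Code to access list data

// BUG: this SPWeb belongs to SPContext and is shared with the rest of the page.
// Disposing it here forces SharePoint to create a new one (another COM call).
web.Dispose();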

Two instances of this web part were added to the master page. SharePoint needs this SPWeb object to execute other components/elements on the page. For this reason, the SPWeb object from SPContext should not be disposed in custom code; SharePoint itself releases it at an appropriate stage of the page life cycle.

In this case, rather than failing the request, SharePoint re-created the SPWeb object and requested the resources from the database through a COM call, which added another pinned object to the heap. When the second instance of the web part disposed the SPWeb object, SharePoint again created a new instance, which added yet another pinned object to the heap. This happened on every page request.

Pages using this master page were accessed by more than 30,000 users in the organization. Even a single request from each of these users produced an enormous number of pinned objects, which left the managed heap badly fragmented. Eventually, this fragmentation crashed the process with an OOM exception.
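For contrast, here is a minimal sketch of the corrected pattern. It assumes a web part built against the SharePoint server object model (Microsoft.SharePoint.dll), and the class name ListSummaryWebPart and the otherSiteUrl value are hypothetical. The SPWeb from SPContext is used but never disposed. An SPSite/SPWeb that the code opens itself is wrapped in using blocks so that it is released deterministically.

```csharp
using System.Web.UI;
using System.Web.UI.WebControls.WebParts;
using Microsoft.SharePoint;

// Hypothetical web part illustrating SPWeb ownership rules.
public class ListSummaryWebPart : WebPart
{
    // Hypothetical URL of another site collection; not from the original post.
    private const string otherSiteUrl = "http://intranet/sites/other";

    protected override void CreateChildControls()
    {
        // Owned by SharePoint: use it, but do NOT call Dispose() on it.
        // SharePoint releases it itself later in the page life cycle.
        SPWeb currentWeb = SPContext.Current.Web;
        Controls.Add(new LiteralControl("Current site lists: " + currentWeb.Lists.Count + "<br/>"));

        // Owned by this code: objects created with new SPSite(...) / OpenWeb()
        // must be disposed, so wrap them in using blocks.
        using (SPSite otherSite = new SPSite(otherSiteUrl))
        using (SPWeb otherWeb = otherSite.OpenWeb())
        {
            Controls.Add(new LiteralControl("Other site lists: " + otherWeb.Lists.Count));
        }
    }
}
```

This only runs inside a SharePoint request. Outside one, SPContext.Current is null.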

Conclusion

From the “science of pinned objects”, it can be concluded that inefficient use of objects backed by unmanaged code (in the SharePoint context, SPSite and SPWeb) can create an enormous number of free space blocks on the heap that the .NET GC cannot defragment. When available memory is exhausted, this crashes the process with an OOM exception.