.NET Garbage Collection in a nutshell - debugging code down to the metal - Part 2

Problem and a cause of it at high level is discussed in Part-1 of this post. Here in Part-2 in-depth root cause and fix is explained.

Here is the System.Collections.Generic.List.cs code that come into picture when List.Add() method is called:

public void Add(T item) {
    if (_size == _items.Length) EnsureCapacity(_size + 1);
    _items[_size++] = item;
    _version++;
}
 private void EnsureCapacity(int min) {
    if (_items.Length min) {
        int newCapacity = _items.Length == 0? _defaultCapacity : _items.Length * 2;
        // Allow the list to grow to maximum possible capacity (~2G elements) before encountering overflow.
        // Note that this check works even when _items.Length overflowed thanks to the (uint) cast
        if ((uint)newCapacity > Array.MaxArrayLength) newCapacity = Array.MaxArrayLength;
        if (newCapacity min) newCapacity = min;
        Capacity = newCapacity;
    }
}
 public int Capacity {
    get {
        Contract.Ensures(Contract.Resultint>() >= 0);
        return _items.Length;
    }
    set {
        if (value _size) {
            ThrowHelper.ThrowArgumentOutOfRangeException(ExceptionArgument.value, ExceptionResource.ArgumentOutOfRange_SmallCapacity);
        }
        Contract.EndContractBlock();

        if (value != _items.Length) {
            if (value > 0) {
                T[] newItems = new T[value];
                if (_size > 0) {
                    Array.Copy(_items, 0, newItems, 0, _size);
                }
                _items = newItems;
            }
            else {
                _items = _emptyArray;
            }
        }
    }
}

Evidently, if the upper limit of the List isn’t defined, it goes on creating new array object (maintained internally by List) continuously (in Capacity property). Default this array starts with 4 element and increases to 8, 16, 32... whenever items reach the capacity.

Firstly, this List object is allocated on the Small Object Heap (heap which is GCed through the Generation 0, 1, 2 algorithm) as long as it remains less than 85 KB in size. Until it crosses this limit, many objects are created on Gen 0, Gen1, Gen 2 heaps and runtime pushes GC to clean up these objects more frequently. This requires Processor to spend more time in GC causing performance.

Secondly, when size reaches 85 KB limit, objects are continuously allocated on Large Object Heap. This heap isn’t controlled by GC Generation algorithm. On the Small Object Heap objects are always moving (with few exceptions such as when pinned) as they moved to new memory location when pushed to the next generation. Also, objects’ memory locations change when compaction happens to defragment the heap. Such memory compaction doesn’t happen for the objects allocated on the Large Object Heap. Objects allocated here stick to a same memory address as defragmenting LOH is a very expensive operation

But with advent of .NET Framework 4.5.1, LOH defragmentation is possible. This come at a huge cost. The .Net Team, in their blog Announcing the .NET Framework 4.5.1 Preview, has clearly warned on the usage of LOH compaction feature:

LOH compaction can be an expensive operation and should only be used after significant performance analysis, both to determine that LOH fragmentation is a problem, but also to decide when to request compaction.

During LOH compaction, all operations on the managed heap become extremely slow. This affects allocating new objects on the heap until compaction completes. Better idea is to take all preventive measures so that allocation on the LOH can be minimized.

Back to the problem we were troubleshooting. The fix is very very simple. Set the upper limit when initializing the List object.

Listcustom_object> list = new Listcustom_object>(200000);

This not only prevents LOH fragmentation but also averts multiple List objects creation on the Gen 0, Gen 1 Gen 2 heaps too and minimizes time spend by GC there which again improves the performance.

Another solution:

.Net abstraction comes with advantages and disadvantages. Same is the case with Listobject too; it abstracts many functionalities which are required in most of the cases. Yet, there are plenty of scenarios wherein it is required to stay away from abstraction, such as List, and a simple array is only the solution.

CustomObject[] array = new CustomObject[200000];