The Garbage Collector (GC) in .NET can be thought of as the subsystem that handles
allocating and de-allocating managed memory. Ever used variables in C?
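#include <stdlib.h>

void doSomething(char *name); /* defined elsewhere */

int main(void) {
    char *name = malloc(50); /* allocate 50 bytes on the heap */
    doSomething(name);
    free(name); /* the caller must release the memory */
    return 0;
}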
Or in C++?
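void doSomething(int* values); // defined elsewhere

int main() {
    int* values = new int[10]; // allocate an array on the heap
    doSomething(values);
    delete [] values; // the caller must release the array
}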
With GC, now we don’t need to worry about freeing the memory we use.
static void Main() {
    var items = new List<string>(); // needs using System.Collections.Generic;
    doSomething(items);
    // no need to "free" or "delete"
}
So, that is GC. Use whatever variables you want, then forget about them. The GC
will take care of it, right? Something doesn’t sound right about that… let’s
keep going.
When Memory Is Allocated
Unless noted otherwise, when I talk about “memory”, I am referring to virtual
memory. If I am referring to any other kind of memory, I’ll say so.
When a .NET application starts, the CLR initializes a managed heap for the
application. This heap is shared throughout the entire process. The CLR adds
segments of memory to the heap as needed, whenever an allocation requests more
virtual memory than the heap currently has available.
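You can’t see the segments themselves from C#, but you can watch the managed heap grow as objects are allocated. Here is a minimal sketch, assuming GC.GetTotalMemory is a good enough approximation for our purposes. The class and method names (HeapGrowthDemo, MeasureGrowth) are made up for this illustration.

```csharp
using System;
using System.Collections.Generic;

public static class HeapGrowthDemo
{
    // Allocates `count` buffers of `size` bytes each, keeps them reachable,
    // and returns roughly how many bytes the managed heap grew by.
    public static long MeasureGrowth(int count, int size)
    {
        long before = GC.GetTotalMemory(forceFullCollection: false);

        var buffers = new List<byte[]>(count);
        for (int i = 0; i < count; i++)
        {
            buffers.Add(new byte[size]);
        }

        long after = GC.GetTotalMemory(forceFullCollection: false);

        // Keep the buffers reachable until after the second measurement.
        GC.KeepAlive(buffers);
        return after - before;
    }

    public static void Main()
    {
        long growth = MeasureGrowth(count: 1000, size: 10_000);
        Console.WriteLine($"Managed heap grew by roughly {growth:N0} bytes");
    }
}
```

The number is an estimate, not an exact byte count, and it will vary from run to run. The point is only that the heap gets bigger while the objects stay reachable.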
Here is a visualization of how virtual memory is allocated in the managed heap:
Let’s start with a clean slate…
The application starts, and the CLR initializes a managed heap that consists of
1 segment.
As the application runs, memory is allocated for more and more objects and
variables.
When a request is made for memory, and there is not enough space in the current
heap segments, then another one is reserved by the CLR.
This continues as long as memory is being allocated for the application. If
that is where it stopped, it would be hugely wasteful. Most of those variables
are no longer reachable by the code and will never be used again, and the
system is on a one-way track to running out of memory. Keep reading…
The Triggers for Garbage Collection
So, in a managed application, you don’t release any memory manually. As you are
reading this you should be asking yourself, “so when does the GC run?” Good
question! I’m glad you asked!
Many people and articles will tell you that the garbage collector is always
running. This is not strictly true. As your application is running, and the
threads are allocating memory, the CLR is always watching for certain
situations that will trigger a garbage collection, but memory is not being
collected constantly. Only certain things will trigger the collector to
actually reclaim some memory. When any of these conditions is true, garbage
collection will occur:
- The host system is low on physical memory
- Allocated memory on the managed heap exceeds an “acceptable” threshold
- GC.Collect() is called
So, how often does garbage collection actually happen? I dunno. Laugh if you
want, but that’s the real answer. When you hear about garbage collection being
non-deterministic, this is what people are referring to. The application writer
cannot predict when garbage collection will run. Sure, you can call
GC.Collect() to force it to run, but it could also run on its own without your
knowledge.
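If you want to see this for yourself, GC.CollectionCount tells you how many collections have happened so far for a given generation. The sketch below is only an illustration: CollectionTriggerDemo, its method names, and the allocation sizes are my own choices, and the number of automatic collections you see will depend on your machine and runtime settings.

```csharp
using System;

public static class CollectionTriggerDemo
{
    // Static sink so each allocation really lands on the managed heap.
    private static byte[] _sink;

    // Allocates short-lived arrays and returns how many generation 0
    // collections the runtime started on its own while we did so.
    public static int CountAutomaticCollections(int allocations)
    {
        int before = GC.CollectionCount(0);
        for (int i = 0; i < allocations; i++)
        {
            // The previous array becomes unreachable on every iteration.
            _sink = new byte[1024];
        }
        return GC.CollectionCount(0) - before;
    }

    // Forces a collection and returns how far the gen 0 counter moved.
    public static int CountForcedCollections()
    {
        int before = GC.CollectionCount(0);
        GC.Collect();
        return GC.CollectionCount(0) - before;
    }

    public static void Main()
    {
        int automatic = CountAutomaticCollections(1_000_000);
        int forced = CountForcedCollections();
        Console.WriteLine($"Collections triggered by allocating: {automatic}");
        Console.WriteLine($"Collections triggered by GC.Collect(): {forced}");
    }
}
```

Notice that the first method never asks for a collection. Any collections it reports were started by the runtime because of allocation pressure, which is exactly the part you cannot schedule yourself.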
The Collection Process
When GC is triggered, all user threads are suspended (read up on server,
workstation, background and non-concurrent garbage collection to learn when
this may not be true), and then the garbage collector is given the reins. Once
the garbage collector has control, the process consists of 3 phases: Marking,
Relocating and Compacting.
Marking Phase
The marking phase begins by collecting the list of all “garbage collection
roots”. This includes all objects and references that are directly referenced
by the process and app domains: things like static items, globals, the
finalizer queue, and the local variables and parameters on the call stack.
From there, the GC walks all references from all of the GC roots. Every item it
finds on the managed heap gets added to a list of “live” objects. Once an
object has been processed for references, it is skipped every time it is
encountered again. This means circular references between objects are fine:
they will not trap the GC in a loop or slow it down.
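To make the walk concrete, here is a toy version of the marking idea. This is not how the CLR implements it. It is a sketch over a made-up Node class, where a HashSet plays the role of the “live” list and the "already seen" check is what keeps a cycle from looping forever.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a heap object: a name plus outgoing references.
public sealed class Node
{
    public string Name { get; }
    public List<Node> References { get; } = new List<Node>();
    public Node(string name) { Name = name; }
}

public static class MarkPhaseSketch
{
    // Returns every node reachable from the roots. A node that has already
    // been marked is skipped, which is what makes cycles harmless.
    public static HashSet<Node> Mark(IEnumerable<Node> roots)
    {
        var live = new HashSet<Node>();
        var pending = new Stack<Node>(roots);
        while (pending.Count > 0)
        {
            Node current = pending.Pop();
            if (!live.Add(current))
            {
                continue; // already marked, do not walk it again
            }
            foreach (Node child in current.References)
            {
                pending.Push(child);
            }
        }
        return live;
    }

    public static void Main()
    {
        var a = new Node("a");
        var b = new Node("b");
        var c = new Node("c");
        a.References.Add(b);
        b.References.Add(a); // circular reference between a and b
        c.References.Add(a); // c points at a, but nothing points at c

        HashSet<Node> live = Mark(new[] { a });
        Console.WriteLine($"live objects: {live.Count}"); // a and b; c is garbage
    }
}
```

Only reachability from a root matters. The node c holds a reference to a live object, but since no root leads to c, it is never marked.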
Relocating and Compacting Phases
In order to simplify this post and hopefully reduce some confusion, I am going
to describe the relocating and compacting phases together. For my purposes
here, there is no need to discuss them separately.
The purpose of these phases is to reclaim memory and maintain (or improve) the
performance of the garbage collector. They do this by scanning the heap for
unused memory (based on the new graph of “live” objects), freeing that unused
memory, moving live objects closer together, and moving survivors towards or
into older memory segments.
Since I am a visual thinker, here is the basic process with pictures…
After the marking phase, we have a graph of all of the “live” objects
(indicated in blue below). None of the other objects are accessible from the GC
roots.
All of the “dead” objects are freed.
All remaining objects are moved together to maximize memory availability.
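The same steps can be sketched in code. This is a toy model, not the real thing: the "heap" is just an array of names, and CompactionSketch and SweepAndCompact are names I made up for the illustration. It assumes the marking phase has already produced the set of live names.

```csharp
using System;
using System.Collections.Generic;

public static class CompactionSketch
{
    // Slides live entries to the front of the heap array, preserving their
    // order, and clears the slots behind them. Returns the number of live
    // entries, which is also the index where the next allocation could go.
    public static int SweepAndCompact(string[] heap, ISet<string> live)
    {
        int next = 0;
        for (int i = 0; i < heap.Length; i++)
        {
            string slot = heap[i];
            if (slot != null && live.Contains(slot))
            {
                heap[next++] = slot; // relocate the survivor
            }
        }
        // Everything past the survivors is now free space.
        Array.Clear(heap, next, heap.Length - next);
        return next;
    }

    public static void Main()
    {
        string[] heap = { "a", "x", "b", "y", "c", null };
        var live = new HashSet<string> { "a", "b", "c" };

        int used = SweepAndCompact(heap, live);
        Console.WriteLine($"survivors: {string.Join(" ", heap, 0, used)}");
        Console.WriteLine($"free slots: {heap.Length - used}");
    }
}
```

After compaction all of the free space sits in one contiguous block at the end, so the next allocation is just a matter of bumping an index instead of hunting for a hole that is big enough.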
Now is the time to note that the Large Object Heap does not get compacted by
default. Large objects tend to be longer-lived, which means there is less value
in compacting them. Another reason is that large objects are more expensive to
move, so compacting them would adversely affect the performance of the GC.
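Here is a small sketch for poking at the Large Object Heap. It relies on two things that are not covered above, so treat them as my assumptions: by default, objects of about 85,000 bytes or more are placed on the Large Object Heap, and GC.GetGeneration reports such objects as generation 2. The class name LargeObjectHeapDemo is made up for the example.

```csharp
using System;
using System.Runtime;

public static class LargeObjectHeapDemo
{
    // Allocates a byte array of the given size and returns the generation
    // the runtime reports for it right after allocation.
    public static int GenerationOfNewArray(int bytes)
    {
        var buffer = new byte[bytes];
        return GC.GetGeneration(buffer);
    }

    public static void Main()
    {
        // A small array normally starts out in generation 0.
        Console.WriteLine($"1,000 bytes:   gen {GenerationOfNewArray(1_000)}");

        // An array above the default threshold goes to the Large Object
        // Heap, which is reported as the oldest generation.
        Console.WriteLine($"100,000 bytes: gen {GenerationOfNewArray(100_000)}");

        // Since .NET Framework 4.5.1 you can opt in to a one-off compaction
        // of the Large Object Heap during the next blocking full collection.
        GCSettings.LargeObjectHeapCompactionMode =
            GCLargeObjectHeapCompactionMode.CompactOnce;
        GC.Collect();
    }
}
```

The opt-in at the end is there for the rare case where Large Object Heap fragmentation is actually hurting you. The setting resets itself after that one collection, so the default behavior described above comes right back.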