More about the garbage collector

Hi! In the last lesson, we first became acquainted with Java's built-in garbage collector and got a rough idea of how it works.

It works in the background while your program is running, collecting unnecessary objects that will be deleted later. Thus, it frees up memory that can be used to create new objects in the future.

In this lesson, we'll discuss in greater detail how it works. For example, how and when does an object become unnecessary? And how does the garbage collector find out?

These are questions we'll answer during today's lesson :)

The lesson will be more like an overview: you don't need to learn this material by heart. The intention is mainly to expand your vision regarding how memory and the garbage collector work, so just read through and find something new for yourself :) Let's go!

The first thing you need to remember is that the garbage collector works in parallel with your program. It is not part of your program. It runs separately (in the last lesson, we compared this to a robot vacuum cleaner)

But it wasn't always so. Garbage collection used to be performed on the same thread as your program. On some schedule (once every few minutes), the garbage collector would check for the presence of unwanted objects in the program. The problem was that the program would hang (not execute) during this check and garbage collection.

Imagine you're sitting in your office at work. But then the cleaning lady comes in to wash the floors. She drives you away from your computer for 5 minutes and you wait until she's finished cleaning. During this time, you're unable to work. That's about how garbage collection used to work :)

This mechanism was later changed, and now the garbage collector runs in the background, not impeding the work of the program itself.

You already know that an object dies when it no longer has references. In reality, the garbage collector does not count object references. First, this could take a long time. Second, it's not very effective. After all, objects can refer to each other!

pic2.png

The figure shows an example where 3 objects refer to each other, but no one else refers to them. In other words, the rest of the program doesn't need them. If the garbage collector simply counted references, these 3 objects would not be collected and the memory would not be freed (there are references to them!).

We can compare this to a spacecraft. During the flight, the astronauts decide to check the list of spare parts available for repairs. Among other things, they find a steering wheel and pedals from an ordinary car. Obviously, they aren't needed here and are taking up space unnecessarily (though these two parts are related to each other and have some functions). But inside the spacecraft, they are useless trash that should be discarded.

Accordingly, in Java, the decision was made to collect garbage based not on reference counting, but on the separation of objects into two types: reachable and unreachable.

How do we determine if an object is reachable?

It's all simply ingenious.

An object is reachable if it is referenced by another reachable object. Thus, we get a "chain of reachability". It starts when the program starts and continues for the duration of the program. It looks something like this:

pic3.png

The arrow in the figure indicates our program's executable code. The code (for example, the main() method) creates references to objects. These objects can refer to other objects, those objects to still others, and so on. This form a reference chain.

If you can trace along to chain from an object to the "root reference" (the one created directly in executable code), then it is considered reachable. Such objects are marked black in the picture.

But an object is unreachable if the object drops out of this chain, i.e. none of the variables in the code currently being executed references it, and it cannot be reached through the "reference chain". In our program, two such objects are marked red.

Note that these "red" objects have references to each other.

But as we said earlier, Java's modern garbage collector doesn't count references. It determines whether an object is reachable or unreachable. As a result, it will seize on the two red objects in the figure.

Now let's look at the whole process from beginning to end. In doing so, we'll also see how memory is arranged in Java :)

All Java objects are stored in a special area of memory called the heap. In everyday language, a heap is usually a mountain of items, where everything is intermixed. But that's not what the heap is in Java. Its structure is very logical and reasonable.

At some point, Java programmers found that all their objects could be divided into two types: simple objects and "long-lived objects".

"Long-lived objects" are objects that have survived many rounds of garbage collection. They usually live until the program ends.

In the end, the full heap, where all objects are stored, was divided into several parts.

The first part has a beautiful name: eden (from the biblical "Garden of Eden"). This name is fitting because this is where objects end up after they are created. This is the part of memory where new objects are created when we use the keyword new.

A lot of objects may be created. When this area runs out of space, an initial "fast" garbage collection begins. We must say that the garbage collector is very clever. It chooses an algorithm based on whether the heap has more garbage or more live objects. If almost all the objects are garbage, the collector marks the live objects and moves them to another area of memory. Then the current area is completely cleared. If there is not a lot of garbage, and the heap is mostly live objects, the collector marks the garbage, clears it, and packs the other objects together.

We said "the collector marks the live objects and moves them to another area of memory", but where? An area of memory where all objects that have survived at least one round of garbage collection are moved is called a survival space.

A survival space, in turn, is divided into generations. Each object belongs to a particular generation, depending on how many rounds of garbage collection it has survived. If an object has survived one round of garbage collection, then it is in "Generation 1"; if 5, then "Generation 5".

Together, eden and a survival space form an area called the young generation.

In addition to the young generation, the heap has another area of memory called the old generation. This is precisely the area where long-lived objects that have survived many rounds of garbage collection end up. There are advantages to keeping them separate from all the others.

Full garbage collection is performed only when the old generation is full, i.e. there are so many long-lived objects in the program that there is not enough memory. This process involves more than one area of memory. In general, it involves all the objects created by the Java machine. Naturally, this takes much more time and resources.

This is precisely the decision was made to store long-lived objects separately. "Quick garbage collection" is carried out when other areas run out of space. This involves only one area, which makes it faster and more efficient.

Finally, when even the area for long-lived objects is completely filled, full garbage collection is triggered. Thus, the collector uses the "heaviest" tool only when it is impossible to avoid it.

Here's a visual representation of the heap structure and garbage collection:

pic4.png

Was published on CodeGym blog .

Comments (1)

Add a comment
Agnius Vasiliauskas's photo

Nice article. I will admit, that I was a bit skeptical that Java uses Tracing garbage collection instead of traditional reference counting. So I made a quick test to see if java will not get Out-Of-Memory exception when bunch of cyclic objects will be constantly generated in a loop :

public class gctest {

 private gctest self;

 public gctest() {
   this.self = this;
 }

 public static void main(String []args) {
   while (true) {
     new gctest();
   }
 }

}

Indeed Java program has not crashed and GC perfectly cleaned circular references.

I only need to mention that traditional Reference counting GC model has also advanced and now is able to clean circular object references too, because there exists special algorithms in RC model which searches for a cycles in references.

The same GC test in PHP (which uses reference counting GC model) :

class GCtest {
  private $me;

  public function __construct() {
    $this->me = $this;
  } 
}

while (true) {
  new GCtest();
}

Script also has not crashed and traditional RC GC model cleaned objects with circular references. The only difference between Java and PHP in this case which as was able to notice - is that Java program has consumed ~ 100% of memory, but PHP version has not altered memory usage. Probably this means that Java GC collecting cycles starts at far lower rate than in PHP. Thus creation of objects fills heap at a rate at which Java GC collector is not able to catch-up.