Some time ago we wrote a post about Java memory optimization in Docker containers. On that post we received a very interesting comment about the impact that Garbage Collector (GC) pauses can have on the behavior of an application.
That comment encouraged us to write this post, in which we explain how the GC can affect the performance of our application and how we can improve it by choosing the right type of GC.
Understanding how the Garbage Collector works
The GC plays a fundamental role within the JVM, since it is in charge of freeing up Heap memory occupied by objects that are no longer in use in our application.
A GC cycle normally starts when the Heap memory consumption of our application gets close to the maximum available.
The GC follows a Mark and Sweep algorithm. First, in the “Mark” phase, the algorithm traverses the object references in Heap memory and marks the objects that are reachable, that is, accessible by our application. Then, in the “Sweep” phase, the GC reclaims the Heap memory occupied by garbage (unreachable) objects, freeing up that space for new objects of the application.
This mechanism lets developers largely forget about memory management. In return, they lose direct control over memory, which can negatively impact application performance.
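The two phases described above can be sketched on a toy object graph. This is only an illustration under simplifying assumptions, not how the JVM implements its collectors: the `Node` class, the `heap` list and the choice of roots are hypothetical names introduced for the example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

public class MarkSweepSketch {

    /** A toy heap object: a name plus outgoing references to other objects. */
    static final class Node {
        final String name;
        final List<Node> references = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    /** Mark phase: walk the reference graph from the roots and collect every reachable object. */
    static Set<Node> mark(List<Node> roots) {
        Set<Node> marked = new HashSet<>();
        Deque<Node> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Node current = pending.pop();
            if (marked.add(current)) {               // true only on the first visit
                pending.addAll(current.references);
            }
        }
        return marked;
    }

    /** Sweep phase: remove every unmarked object from the heap and return the reclaimed names. */
    static List<String> sweep(List<Node> heap, Set<Node> marked) {
        List<String> reclaimed = new ArrayList<>();
        Iterator<Node> it = heap.iterator();
        while (it.hasNext()) {
            Node node = it.next();
            if (!marked.contains(node)) {
                reclaimed.add(node.name);
                it.remove();
            }
        }
        return reclaimed;
    }

    /** Builds a small graph where "a" is the only root and runs one mark and sweep cycle. */
    static List<String> demo() {
        Node a = new Node("a"), b = new Node("b"), c = new Node("c"), d = new Node("d");
        a.references.add(b);   // a -> b: b is reachable through the root
        c.references.add(d);   // c <-> d reference each other,
        d.references.add(c);   // but nothing reachable points to them
        List<Node> heap = new ArrayList<>(Arrays.asList(a, b, c, d));
        Set<Node> marked = mark(Arrays.asList(a));
        return sweep(heap, marked);
    }

    public static void main(String[] args) {
        System.out.println("Reclaimed: " + demo());   // prints "Reclaimed: [c, d]"
    }
}
```

Note that `c` and `d` are reclaimed even though they reference each other: what matters is reachability from the roots, not whether an object is referenced at all.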
While GC is an implicit mechanism, Java offers different types of GC. Choosing the optimal GC for each scenario allows us to improve the performance of our applications.
The Serial Collector is the simplest GC. It suspends all the threads of the application and then executes the Mark and Sweep algorithm in a single thread. These GC-generated pauses in the application are known as “stop the world” pauses.
The Parallel Collector is similar to the Serial Collector; the main difference is that it executes the Mark and Sweep algorithm in multiple threads. Although the GC still pauses the application, pause time is considerably reduced compared to the Serial Collector.
This type of GC is also known as the “Throughput Collector”, since it is designed so that the application can offer a high throughput level.
The Concurrent Collector runs most of the time concurrently with the execution of the application. This GC does not eliminate pauses entirely: it requires two pauses, but they are short compared to those generated by the previous GCs.
In the first pause, the “Mark” phase runs only on the GC roots, which are the objects that are always reachable by the JVM and from which references to other objects fan out in tree form.
The “Mark” phase continues concurrently with the execution of the application, this time starting from the GC roots and marking connected objects. As objects are being marked at the same time as application threads are executed, new changes that may have been made to already marked objects are also recorded.
The second and last pause performs the final marking, covering new objects that may have been created in the meantime.
Our starting Java application
In our previous post we saw how by limiting the Heap memory of our Java application we could significantly reduce the total memory required. When monitoring our application after memory optimization, we verified that throughput remained the same as before optimization, even with a greater presence of the GC due to the need to keep the Heap memory consumption below the established limit.
In this scenario we didn’t find the need to perform any GC-related optimization. Now, what happens if we scale the same application and increase its load?
For this post, we have scaled the application’s memory so that it can support 300 simultaneously active users, which is three times as many users as in the previous scenario.
Keep in mind that the type of GC from which we start in our application is the Parallel Collector, which is the default GC in Java 8.
How are we going to monitor our application?
First, we are going to define each of the tools that we are going to use to monitor our application:
- Apache JMeter: to perform a load test on our application and graphically visualize the performance of the application.
- JConsole: to graphically view the Heap and CPU consumption of our application during the test.
- GCViewer: to graphically view the logs generated by the Garbage Collector during the test.
Setting up JMeter
To carry out our load test we are going to start from the same JMeter project that we used in our previous post. The only difference is that we are going to modify the Thread Properties by setting 300 simultaneous users instead of 100.
Once the project is saved with the updated property, we are going to configure JMeter to generate visual dashboards of the performance of our application during the load test. Fortunately, there is official documentation that explains how to configure the generation of dashboards in JMeter.
Setting up JConsole
In order to monitor our application with JConsole, we first have to activate the Java Management Extensions (JMX). I encourage you to take a look at our previous post, where I explain step by step how to do it and how to connect JConsole to our application.
Setting up GC logs
GCViewer reads a GC log file, so before using it we have to configure the JVM to write logs of the GC activity.
For this purpose we will use the following JVM flags:
- -XX:+PrintGCDetails: Activates the generation of detailed GC logs.
- -Xloggc:/tmp/gc/gc.log: Specifies the path of the file where the GC logs are written.
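Before launching a long load test, it can be worth confirming that the JVM really received both flags. The following is a minimal sketch, assuming it runs inside the same JVM as the application; the class name `GcLogFlagCheck` and the helper `hasFlag` are hypothetical, while `RuntimeMXBean.getInputArguments()` is part of the standard `java.lang.management` API.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

public class GcLogFlagCheck {

    /** Returns true if any JVM argument starts with the given prefix. */
    static boolean hasFlag(List<String> jvmArgs, String prefix) {
        for (String arg : jvmArgs) {
            if (arg.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Arguments passed to the JVM itself (not the program arguments in args).
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("Detailed GC logging: " + hasFlag(jvmArgs, "-XX:+PrintGCDetails"));
        System.out.println("GC log file set:     " + hasFlag(jvmArgs, "-Xloggc:"));
    }
}
```

Matching on the `-Xloggc:` prefix rather than the full value keeps the check valid if the log path changes.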
We already have everything we need to be able to run our load test and visualize the performance of our application.
Let’s go with it!
Monitoring our starting application
Once the load test has finished, we have all the information we need to understand how our application behaved during the test.
First, let’s look at the Heap memory consumption in JConsole:
As you can see, Heap memory consumption rises and then stabilizes near 512 MB. As the load test progresses, the Heap memory fills up with new objects until consumption settles near the established limit (512 MB). The peaks in the graph are due to the action of the GC.
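The Heap figures that JConsole plots are exposed through JMX, so they can also be sampled from code. The sketch below is an assumption-laden illustration rather than part of our test setup: the class `HeapUsageSample`, its helpers, and the five one-second samples are hypothetical choices, while `MemoryMXBean.getHeapMemoryUsage()` is the standard API.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

public class HeapUsageSample {

    /** Converts a byte count to whole mebibytes. */
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    /** Formats one heap reading; maxBytes is -1 when the JVM reports no defined limit. */
    static String describe(long usedBytes, long maxBytes) {
        String max = maxBytes < 0 ? "undefined" : toMiB(maxBytes) + " MiB";
        return "heap used: " + toMiB(usedBytes) + " MiB of " + max;
    }

    public static void main(String[] args) throws InterruptedException {
        // Take a few samples one second apart (sample count and interval are arbitrary).
        for (int i = 0; i < 5; i++) {
            MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
            System.out.println(describe(heap.getUsed(), heap.getMax()));
            Thread.sleep(1000);
        }
    }
}
```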
The next step is to analyze how the throughput of the application has behaved. This information can be obtained through the dashboards generated by JMeter:
As we can see, the throughput rises to about 175 transactions/second and then, roughly in the seventh minute of the test, drops to about 70 transactions/second. A possible reason for this drop is a GC pause: remember that the Parallel Collector stops the application while it collects. We will validate this hypothesis when we analyze the GC logs.
For those applications where we need constant high throughput, drops like this can be unacceptable.
To obtain information about the behavior of the GC, we open the generated log file in GCViewer:
Each vertical black line marks the start and end of a complete GC cycle. We can see how the frequency of GC cycles increases as the test progresses.
The blue areas correspond to the total Heap memory used over time. The larger the Heap memory in use, the more GC cycles are required to free up space.
The dark gray lines at the bottom of the graph correspond to the GC collection times. We can see that in the middle of the graph the collection times increase notably. Is this related to the throughput drop?
GCViewer allows us to zoom in on the graph to inspect a narrower time interval. We are going to zoom in to approximately the seventh minute of our test, in order to validate our hypothesis that the throughput drop is due to a GC pause.
We must bear in mind that the dates and times shown by GCViewer in the upper timeline do not correspond to those of the load test, but are shown as if the test had started when the log file was opened in GCViewer. Fortunately, knowing the time interval, it is easy to place ourselves where we want.
We have enabled a GCViewer option that allows us to see the GC times with green lines, since by having enlarged the graph the information can be more confusing.
Looking at the graph, we can see that we are positioned at the moment when the GC collection time rises significantly, which also coincides with the throughput drop of our application.
Taking the above into account, we confirm our hypothesis: at that point in the test, the throughput of our application was notably affected by the type of GC.
GCViewer also offers us a table with aggregated data related to the performance of the GC. The throughput indicated here corresponds to the percentage of time that the application has not been busy with the GC, in our case 84.35%.
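The throughput definition above boils down to simple arithmetic, sketched below. The class `GcThroughput` and the method `throughputPercent` are hypothetical names. The `main` method estimates the value for the running JVM from the accumulated collection times reported through JMX, which is an approximation and will not necessarily match what GCViewer computes from the log file.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcThroughput {

    /** Percentage of the elapsed time that was NOT spent in garbage collection. */
    static double throughputPercent(long gcMillis, long elapsedMillis) {
        if (elapsedMillis <= 0) {
            throw new IllegalArgumentException("elapsedMillis must be positive");
        }
        return 100.0 * (elapsedMillis - gcMillis) / elapsedMillis;
    }

    public static void main(String[] args) {
        long gcMillis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long time = gc.getCollectionTime();   // -1 when undefined for this collector
            if (time > 0) {
                gcMillis += time;
            }
        }
        long uptimeMillis = ManagementFactory.getRuntimeMXBean().getUptime();
        System.out.printf("GC throughput: %.2f%%%n", throughputPercent(gcMillis, uptimeMillis));
    }
}
```

As a purely hypothetical example, a run of 10,000 ms with 1,565 ms spent in GC gives `throughputPercent(1565, 10000)`, that is, 84.35%.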
Saving all the information that we have obtained is very useful for further optimizations on the GC. In this way, we can contrast the performance of the optimized GC with the initial scenario.
Surely you are wondering, what exactly are we going to do with the GC of our application?
As I mentioned earlier, in some scenarios we cannot afford throughput drops like the one we have experienced. However, the average throughput that we have obtained, added to the percentage of time that the application has not been busy with the GC, may be more than acceptable results in other scenarios.
Let’s suppose that we need our application to be able to maintain a good throughput level on a constant basis, perhaps not as high as what the Parallel Collector can offer, but without experiencing substantial drops. I’m sure many of you are already thinking that maybe what we need is the Concurrent Collector.
Monitoring our application with Concurrent Collector
We must bear in mind that to benefit from this type of GC, we need to have multiple CPUs in our machine, otherwise this would not be the appropriate GC for our scenario.
On the other hand, due to the creation of new objects and changes in the references between objects that can occur concurrently with garbage collection, this GC requires additional work that translates into an increase in resource consumption.
With the above in mind, let’s see how this GC behaves in our application!
First of all, we have to tell the JVM explicitly to use the Concurrent Collector; otherwise it assigns the default GC, which in our case is the Parallel Collector. The JVM offers a specific flag for this purpose: -XX:+UseConcMarkSweepGC.
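To check that the flag took effect and that the machine has more than one CPU available, a small sketch like the following can be run inside the application’s JVM. The class `ActiveCollectorCheck` and its helpers are hypothetical, and the collector name `ConcurrentMarkSweep` is an assumption about what the HotSpot JVM in Java 8 typically reports through JMX; other JVMs or versions may use different names.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public class ActiveCollectorCheck {

    /** Names of the collectors that the running JVM reports through JMX. */
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    /** True if any reported collector name contains the given fragment. */
    static boolean usesCollector(List<String> names, String fragment) {
        for (String name : names) {
            if (name.contains(fragment)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();
        List<String> names = collectorNames();
        System.out.println("Available processors: " + cpus);
        System.out.println("Collectors in use:    " + names);
        // "ConcurrentMarkSweep" is the assumed JMX name of the Concurrent Collector.
        System.out.println("Concurrent Collector: " + usesCollector(names, "ConcurrentMarkSweep"));
    }
}
```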
Once the load test has been performed on our application with Concurrent Collector, we are going to analyze the graphs in this new scenario. We start with the Heap memory consumption:
As we can see, memory consumption rises following a similar trend to the previous scenario until it approaches the maximum (512 MB), with the difference that the increases and decreases in memory consumption are smaller, because garbage is collected concurrently with the execution of the application.
Let’s see the application throughput behavior and if, thanks to the Concurrent Collector, we have managed to avoid significant throughput drops:
Looking at the graph we can affirm that during the load test the throughput has remained at maximum values a little lower than those obtained with the Parallel Collector, but with the main advantage that we have not experienced any significant drop.
Let’s open GCViewer to see in detail how the Concurrent Collector has behaved:
The first thing that surely catches your attention in this graph is the set of vertical turquoise lines. These lines mark the collections made concurrently by the GC. Note that they do not appear with the Parallel Collector.
Here we can see again the increases and decreases in Heap memory consumption more attenuated than in the Parallel Collector due to concurrent collections.
Another aspect of the graph that draws our attention is the black lines that mark the beginning and end of the GC cycles. As we can see, with this GC the cycles last longer, especially at the beginning of the test.
Unlike what happened with the Parallel Collector, we did not record any substantial increase in GC times.
Finally, let’s review the GC performance aggregated data table:
Throughput, which was 84.35% with the Parallel Collector, is 75.73% here. In other words, the GC occupied about 24.27% of the time instead of 15.65%, which shows that the GC has been busier in this scenario.
Which Garbage Collector do we choose?
As I mentioned before, it depends on our particular scenario and the requirements that we want our application to meet.
With the Parallel Collector, our application can offer a very high throughput level at certain moments, although not sustainable over time, since we can experience significant drops due to GC pauses like the one we have seen in the load test.
On the other hand, with the Concurrent Collector we managed to maintain a slightly lower throughput on a sustained basis, although with an increase in CPU consumption. Below is a graph of the CPU consumption of the Concurrent Collector during the load test:
And here, the same graph using Parallel Collector:
As we can see by comparing the two graphs, with the Concurrent Collector we find several CPU consumption peaks above 30%, some even reaching over 40%. On the other hand, with the Parallel Collector, CPU consumption is rarely above 30%.
Taking into account the analysis that we have carried out using two different types of GC, it is a matter of deciding which option is best for us given our scenario.
If we don’t need a particularly high throughput level and what we are looking for is to maintain it over time, the Concurrent Collector is a good option, as long as we can bear the additional increase in CPU usage.
If, on the other hand, we seek to maximize the throughput level of our application, even at the risk of significant drops, or we cannot afford the additional CPU consumption of the Concurrent Collector, opting for the Parallel Collector is a great idea.
In this post we have carried out an analysis on the performance of a Java application using different types of GC.
First of all, we have started from an example application that used Parallel Collector as GC and we have monitored the impact of this GC on the application performance through a load test, finding a high level of throughput but with a significant drop due to a GC pause.
Next, we have changed the GC of the application to the Concurrent Collector and we have run the same load test on the application to monitor it again, finding a slightly lower but sustainable throughput level over time.
Finally, we have compared the results obtained and we have identified the strengths and weaknesses of each GC to take into account in order to choose the optimal one for our scenario.
Keep in mind that the results obtained in this analysis are specific to our example application. In other applications we can find completely different results using the same GCs, so I suggest you carry out similar analyses in your applications to find the GC type that best suits your scenario.
We decided to write this post because of a comment we received on another post, so I encourage you to share your opinions and experiences in the comments section. Maybe you will give us another great idea for a post! 😉