Welcome to follow our WeChat official account: Shishan100


In this article, let's talk about a practical question from production environments: when deploying a system online, how large should the JVM heap be?

We will take Kafka and Elasticsearch as examples, two distributed systems commonly deployed in production that are not ordinary Java applications.

1. Should a system rely on Java process memory to handle data?

First of all, whether we are developing our own Java application or a middleware system, at implementation time we have to choose whether to process data inside our own Java process's memory.

As you all know, Java, Scala, and other JVM languages run on the JVM, so any JVM-based system can choose to hold large amounts of data in the JVM process's heap memory.

To give you an example, recall how message-oriented middleware systems work.

For example, for system A to send a message to system B, it relies on message-oriented middleware: system A first sends the message to the middleware, and system B then consumes the message from the middleware.

Look at the diagram below.

As you probably know, one way for messaging middleware to handle an incoming message is to buffer the data in its own JVM memory.

Then, after some time, it flushes the messages from its own memory to disk to persist them, as shown below.
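The buffer-then-flush approach above can be sketched roughly as follows. This is a minimal illustration under my own naming, not the actual implementation of any real middleware:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: messages are first accumulated in JVM heap
// memory, then periodically flushed to a disk file for persistence.
class InMemoryMessageBuffer {
    private final List<String> buffer = new ArrayList<>();
    private final Path logFile;

    InMemoryMessageBuffer(Path logFile) {
        this.logFile = logFile;
    }

    // Producer path: the message lives only in JVM heap memory for now.
    synchronized void append(String message) {
        buffer.add(message);
    }

    // Flush path: write buffered messages to disk, then drop the
    // in-heap copies so the GC can reclaim them later.
    synchronized void flushToDisk() throws IOException {
        Files.write(logFile, buffer,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        buffer.clear();
    }

    synchronized int pendingCount() {
        return buffer.size();
    }
}
```

Note that every message held in `buffer` is an object on the JVM heap, which is exactly what causes the problem discussed next.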

2. What are the drawbacks of relying on Java process memory?

Relying on the Java process's own memory to handle data in this way, such as designing a memory buffer to hold a large number of concurrently written messages, has its drawbacks.

The biggest drawback is the JVM's GC, that is, garbage collection.

Think about it: the Java process's memory is constantly filled with large amounts of data that are cached in memory and later written to disk.

Does the data need to remain in memory after being written to disk?

Obviously not, and this is where the JVM's garbage collection mechanism comes in: it reclaims objects that are no longer needed, freeing up memory space.

However, JVM garbage collection can trigger a situation called "stop the world": the JVM stops your worker threads so that it can devote itself exclusively to garbage collection.

During such a pause, your middleware system may be unable to do any work at all.

For example, if you send it a request, it may fail to respond, because the worker thread that receives requests has been paused while the background garbage collector threads are reclaiming garbage objects.

Look at the picture below.

The JVM's garbage collectors have kept evolving, from CMS to G1, precisely to minimize the impact of garbage collection and shorten worker-thread pauses.

But no matter how good the collector is, if you rely entirely on JVM memory to manage large amounts of data, garbage collection will always have some impact.

So the JVM's garbage collection (GC) problem is the biggest headache, especially for big data systems and middleware systems that hold large amounts of data in heap memory.
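The cumulative GC activity of a running JVM can be observed with the standard `java.lang.management` API. Below is a minimal sketch; the allocation loop exists only to generate garbage, and the printed numbers will vary by JVM and collector:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public class GcPauseDemo {
    // Sum collection counts across all registered collectors.
    static long totalCollections() {
        long n = 0;
        for (GarbageCollectorMXBean gc :
                ManagementFactory.getGarbageCollectorMXBeans()) {
            n += Math.max(0, gc.getCollectionCount());
        }
        return n;
    }

    // Approximate cumulative time spent in GC, in milliseconds.
    static long totalPauseMillis() {
        long ms = 0;
        for (GarbageCollectorMXBean gc :
                ManagementFactory.getGarbageCollectorMXBeans()) {
            ms += Math.max(0, gc.getCollectionTime());
        }
        return ms;
    }

    public static void main(String[] args) {
        // Allocate lots of short-lived objects to create garbage.
        List<byte[]> junk = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            junk.add(new byte[16 * 1024]);
            if (junk.size() > 512) {
                junk.clear();  // drop references so the data becomes garbage
            }
        }
        System.gc();  // hint the JVM to collect now
        System.out.println("collections so far: " + totalCollections());
        System.out.println("total GC time, ms:  " + totalPauseMillis());
    }
}
```

The more data a system churns through its heap, the larger these numbers grow, and with them the worker-thread pauses described above.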

3. Optimization: rely on the OS cache instead of the JVM

This is why distributed middleware systems like Kafka and Elasticsearch, although they also run on the JVM, rely on the OS cache to manage large amounts of data.

That is, they use memory buffers managed by the operating system, rather than relying on JVM heap memory, to manage large amounts of data.

Specifically, when you write a piece of data to Kafka, it is written directly to a disk file.

But writes to a disk file actually go into the OS cache first, a memory area managed by the operating system; the operating system itself then decides, at some later point, to flush the data from the OS cache to disk.

When the data is consumed, it is likewise read from the OS cache first.

In other words, both writes and reads go through the OS cache, which is purely an operating-system-level memory area, so read and write performance is very high.

In addition, there is the added benefit that you are not relying on the JVM to buffer large amounts of data, so you avoid complex and time-consuming JVM garbage collection pauses.

The following diagram shows a typical Kafka process.

Elasticsearch, one of the most popular distributed search systems, uses a similar mechanism.

It relies heavily on the OS cache to buffer large amounts of data, and searches and queries read data from the OS cache (i.e., memory) first, which ensures high read and write performance.
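The read side can likewise be sketched on the JVM with a memory-mapped file, where reads are served straight from the page cache when the data is resident. This is an illustration only; the exact mechanisms inside Kafka and Elasticsearch differ:

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class OsCacheReadDemo {
    // Map a file into memory; reads then go through the OS page cache
    // rather than through JVM-heap buffers.
    static String readAll(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.READ)) {
            MappedByteBuffer buf =
                    ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            byte[] bytes = new byte[buf.remaining()];
            buf.get(bytes);
            return new String(bytes, StandardCharsets.UTF_8);
        }
    }
}
```

If the file's pages are already in the OS cache, such a read never touches the disk at all, which is where the millisecond-level performance comes from.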

4. A veteran's rule of thumb: for systems that rely on the OS cache, is a bigger JVM heap better?

This brings us to deploying Kafka, Elasticsearch, and similar systems in a production environment. As we have seen, they rely heavily on the OS cache to buffer large amounts of data.

Is it better to allocate larger JVM heap memory to them?

Obviously not. Suppose you have a machine with 32GB of RAM and, assuming that more JVM memory is always better, you give the JVM a 16GB heap. With other programs taking up several more gigabytes, the OS cache is left with less than 10GB of memory.

In that case, the OS cache can hold only a limited portion of the data being written to disk.

For example, if 20GB of data is written to disk, only 10GB of it can now be kept in the OS cache; the other 10GB exists only on disk.

At that point, half of the read requests cannot be served from the OS cache and must go to disk, which inevitably leads to poor performance.
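The arithmetic behind this scenario can be written out as a tiny calculation, using the illustrative numbers from the example above and the simplifying assumption that reads are spread evenly over the data set:

```java
public class CacheSizingMath {
    // Fraction of reads that can be served from the OS cache, assuming
    // reads are spread evenly over the data set (a simplification).
    static double cacheHitRatio(double totalRamGb, double jvmHeapGb,
                                double otherProcsGb, double dataGb) {
        double osCacheGb = totalRamGb - jvmHeapGb - otherProcsGb;
        return Math.min(osCacheGb, dataGb) / dataGb;
    }

    public static void main(String[] args) {
        // 32GB machine, 16GB heap, ~6GB for other processes, 20GB of data:
        // only 10GB of cache for 20GB of data, so half the reads hit disk.
        System.out.println(cacheHitRatio(32, 16, 6, 20));  // 0.5
        // Shrink the heap to 6GB: 20GB of cache covers all 20GB of data.
        System.out.println(cacheHitRatio(32, 6, 6, 20));   // 1.0
    }
}
```

The point is that every gigabyte added to the heap is a gigabyte taken away from the OS cache.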

This is why people often complain that Elasticsearch reads are slow: hundreds of millions of records have been written to ES, yet a single query takes several seconds.

Why would a query take several seconds? Usually it is because the ES cluster was deployed with an oversized JVM heap, leaving only a few gigabytes of memory for the OS cache. As a result, most of those hundreds of millions of records sit on disk rather than in the OS cache, and reads have to hit the disk.

5. The right approach: reserve plenty of memory for the OS cache

So for production systems like Kafka and Elasticsearch, you should give the JVM a modest heap, say, somewhere around 6GB.

Because they do not depend on JVM memory to manage their data, they usually do not need a large heap. Of course, the exact settings should be determined through careful load testing and tuning.
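As a concrete illustration (the 6GB value is an example from this article, not a universal recommendation), Elasticsearch's heap is set in its jvm.options file, while Kafka's broker heap is typically set through the KAFKA_HEAP_OPTS environment variable before startup:

```shell
# Elasticsearch: in config/jvm.options, set min and max heap to the same value:
#   -Xms6g
#   -Xmx6g

# Kafka: set the broker's heap via an environment variable before starting it
export KAFKA_HEAP_OPTS="-Xms6g -Xmx6g"
```

Keeping -Xms and -Xmx equal avoids heap resizing at runtime; the rest of the machine's RAM is then available to the OS cache.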

For this type of system, however, you should reserve as much memory as possible for the OS cache. On a machine with 32GB of RAM, for example, you can leave more than 20GB to the OS cache; then if the machine holds around 20GB of data, all of it can reside in the OS cache.

All reads can then be served from the OS cache, that is, from memory, so performance is measured in milliseconds rather than seconds.

The whole process is shown in the figure below:

Therefore, whenever we introduce a technology into an online production system, we should first gain a deep understanding of how that technology works, down to the source code if necessary, know its exact workflow, and then design the production deployment accordingly to ensure the best performance.

END