In 2018, another perspective on microservice monitoring and performance optimization

On May 21, 2017, Rongshu architect Liu Disheng gave the talk “Micro-service Monitoring and Performance Optimization” at the “Ele.me Technology Salon [Sixth Session] Beijing R&D Center · Java Special”. IT Tycoon Said (ID: ITdakashuo), the exclusive video partner, publishes this write-up with the organizer's authorization and after review by the speaker.


Guest speech video address: suo.im/4qVXQg

Abstract

This article introduces the basic concepts and methods of distributed monitoring, the monitoring mechanisms of the Java technology stack, and the practice and application of performance monitoring, service monitoring, exception monitoring, and performance data analysis on the Rongshu microservice platform.

Microservice monitoring

What do microservices look like?

Microservice architecture is essentially a service-oriented distributed architecture with its own characteristics.

Microservice architectures are characterized by finer-grained service boundaries that encourage independent development, testing, deployment, and scaling. The finer granularity brings gains in agility, but also the inherent complexity of distributed systems.

Why monitoring?

Microservices are a distributed architectural pattern, and distributed architectures have always come with their own problems.

The first is locating problems. As the system scales from a single node to many nodes, pinpointing a fault at one point in the system becomes a challenge for operations staff and developers.

Second, when new business comes in, can the system support it, and how is the system running? Likewise, when an e-commerce company plans a promotion, how should it do capacity planning? Monitoring lets us measure the system and provides the data to support these decisions.

We also need to understand the topology of the distributed system: how its parts are deployed, how they communicate with each other, how they currently perform, and how to detect problems when they occur.

These are all problems a distributed system may face. For them, monitoring is a common and efficient approach.

In general, monitoring is mainly concerned with sensing the state of the system.

How? Data driven

Application performance and topology; third-party components; resource usage; exception stacks; data aggregation, analysis, and alerting; custom services.

Common monitoring tools

Open source: Zabbix, ELK, Zipkin, Prometheus, Pinpoint;

Closed source: New Relic;

Or “roll up your sleeves and do it yourself.”

Java stack monitoring mechanism

Command-line tools: the utilities shipped with the JDK (for example jps, jstat, jstack, and jmap)

Code-level tools: Log, SDK, AOP

Collect data and get the metrics you really care about: Instrument+JMX

Obtaining JVM- and OS-related data through JMX:

ManagementFactory.getXXXMXBean();

OperatingSystemMXBean

RuntimeMXBean

MemoryMXBean

ThreadMXBean

Collection<GarbageCollectorMXBean>
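As a minimal sketch of the JMX access listed above, the class below reads a few values from each platform MXBean through `ManagementFactory`. The class name `JvmSnapshot` and the metric key names are illustrative assumptions, not part of the Rongshu platform.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;
import java.lang.management.OperatingSystemMXBean;
import java.lang.management.RuntimeMXBean;
import java.lang.management.ThreadMXBean;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: the class name and metric keys are made up for this example.
public class JvmSnapshot {

    /** Collects a handful of JVM and OS metrics through the platform MXBeans. */
    public static Map<String, Object> collect() {
        Map<String, Object> metrics = new LinkedHashMap<>();

        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        metrics.put("os.name", os.getName());
        metrics.put("os.availableProcessors", os.getAvailableProcessors());
        // Negative when the platform cannot report a load average.
        metrics.put("os.systemLoadAverage", os.getSystemLoadAverage());

        RuntimeMXBean runtime = ManagementFactory.getRuntimeMXBean();
        metrics.put("jvm.uptimeMillis", runtime.getUptime());

        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = memory.getHeapMemoryUsage();
        metrics.put("heap.usedBytes", heap.getUsed());
        metrics.put("heap.committedBytes", heap.getCommitted());

        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        metrics.put("threads.live", threads.getThreadCount());
        metrics.put("threads.peak", threads.getPeakThreadCount());

        // The JVM exposes one bean per garbage collector it is running.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            metrics.put("gc." + gc.getName() + ".count", gc.getCollectionCount());
            metrics.put("gc." + gc.getName() + ".timeMillis", gc.getCollectionTime());
        }
        return metrics;
    }

    public static void main(String[] args) {
        collect().forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

Because every value comes from the JVM itself, a collector built this way needs no changes to application code.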

Practice and Optimization

Microservice practices

Idea: build around the development, deployment, invocation, communication, and business processing of microservices.

For development and deployment, we built a complete toolchain and use a plug-in mechanism to set up the initial environment and code structure of the whole system.

On the invocation side, the average user does not want to be tied closely to a particular framework. Ideally the environment is initialized transparently and works almost out of the box.

A distributed framework has to solve problems such as cross-process invocation and data transfer. Re-establishing the connection for every call is unfriendly from a performance perspective. Long-lived connections solve part of the problem, but when the payload is very large, data compression and asynchronous event handling also come into play.
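To illustrate the compression point, here is a minimal sketch that compresses a payload before it is sent and restores it on the receiving side. The talk does not say which codec or threshold the platform uses; GZIP, the class name `PayloadCodec`, and the 1 KB threshold are assumptions made for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical codec: GZIP and the threshold are assumptions, not the platform's actual choices.
public class PayloadCodec {

    // Small payloads are sent as-is because the GZIP header and trailer add overhead.
    static final int COMPRESS_THRESHOLD_BYTES = 1024;

    public static boolean shouldCompress(byte[] raw) {
        return raw.length >= COMPRESS_THRESHOLD_BYTES;
    }

    public static byte[] compress(byte[] raw) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Closing the GZIP stream writes the trailer, so the bytes are read only afterwards.
        try (GZIPOutputStream gzip = new GZIPOutputStream(out)) {
            gzip.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static byte[] decompress(byte[] compressed) {
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[8192];
            int n;
            while ((n = gzip.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] raw = new byte[10_000]; // highly repetitive, so it compresses well
        byte[] packed = shouldCompress(raw) ? compress(raw) : raw;
        System.out.println(raw.length + " bytes -> " + packed.length + " bytes");
    }
}
```

Compression trades CPU time for bandwidth, so a size threshold like this one keeps small, frequent calls cheap.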

Try to make the framework transparent and reduce or eliminate dependencies during use.

Operations monitoring is handled by connecting to an integrated monitoring platform.

Monitoring practice on the collection side

Idea: Achieve high efficiency within limited resources.

We did not want monitoring to require large changes to the source code, so we used the Instrument and JMX mechanisms. Collection needs to be controllable at runtime, so we made it configurable.

There are also performance considerations, such as capping the number of threads the collector uses. Collecting too frequently has a serious impact on the system, so we chose sampling.
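The three constraints described here (configurable at runtime, a bounded number of threads, sampling instead of recording every call) can be sketched as follows. This is not the platform's collector; the class `SamplingCollector`, its method names, and the random-sampling strategy are assumptions for illustration.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical collector: names and the sampling strategy are assumptions.
public class SamplingCollector {

    // Fraction of calls to record, in [0.0, 1.0]; volatile so it can be changed while running.
    private volatile double sampleRate;
    private final AtomicLong sampledCalls = new AtomicLong();
    private final AtomicLong sampledNanos = new AtomicLong();

    // A single daemon thread caps what the collector itself can consume.
    private final ScheduledExecutorService reporter =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "metrics-reporter");
                t.setDaemon(true);
                return t;
            });

    public SamplingCollector(double sampleRate) {
        setSampleRate(sampleRate);
    }

    public void setSampleRate(double rate) {
        if (rate < 0.0 || rate > 1.0) {
            throw new IllegalArgumentException("rate must be in [0, 1]");
        }
        this.sampleRate = rate;
    }

    /** Records the call only if it falls into the sample; returns true when recorded. */
    public boolean record(long elapsedNanos) {
        if (ThreadLocalRandom.current().nextDouble() >= sampleRate) {
            return false;
        }
        sampledCalls.incrementAndGet();
        sampledNanos.addAndGet(elapsedNanos);
        return true;
    }

    public long sampledCalls() {
        return sampledCalls.get();
    }

    /** Prints the running totals at a fixed period on the single reporter thread. */
    public void startReporting(long periodSeconds) {
        reporter.scheduleAtFixedRate(
                () -> System.out.println("calls=" + sampledCalls.get() + " nanos=" + sampledNanos.get()),
                periodSeconds, periodSeconds, TimeUnit.SECONDS);
    }

    public void stop() {
        reporter.shutdownNow();
    }

    public static void main(String[] args) {
        SamplingCollector collector = new SamplingCollector(0.1); // record roughly 10% of calls
        for (int i = 0; i < 1_000; i++) {
            collector.record(1_000L);
        }
        System.out.println("sampled " + collector.sampledCalls() + " of 1000 calls");
        collector.stop();
    }
}
```

Lowering the rate through `setSampleRate` reduces the overhead on a busy system without a restart.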

Look at performance optimization from a different perspective

More difficult than optimization is finding problems;

Optimizing without constraints or a target is pointless;

To pursue the efficient use of resources.

Java performance optimization: Common code optimization for a specific problem

Buffer is used for file IO operations. Motivation: Fewer kernel-level calls, fewer IO operations, possibly fewer CPU instructions.

In multithreaded environments, use concurrency and reduce lock contention. Motivation: lift the single-thread limit and use more CPU cores to expand computing power.

When using JDK collections, specify the initial size of the data structure. Motivation: reduce the memory overhead of capacity expansion and the CPU instructions spent on copying and reclaiming old data.

Tweak the algorithm. Motivation: Reduce or optimize CPU instructions for each task.
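The first three items above can be sketched in a few lines each. The class `CodeLevelTuning`, the 64 KB buffer size, and the use of `LongAdder` as the contention-reducing technique are assumptions chosen for this example; the talk names the goals, not these specific APIs.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.LongAdder;
import java.util.stream.IntStream;

// Hypothetical examples: class name, buffer size, and LongAdder are illustrative choices.
public class CodeLevelTuning {

    /** Counts bytes; most single-byte read() calls are served from the buffer, not the file. */
    public static long countBytes(Path file) throws IOException {
        long count = 0;
        try (InputStream in = new BufferedInputStream(Files.newInputStream(file), 64 * 1024)) {
            while (in.read() != -1) {
                count++;
            }
        }
        return count;
    }

    /** Writes a temporary file of the given size and reads it back through countBytes. */
    public static long bufferedReadDemo(int sizeBytes) {
        try {
            Path tmp = Files.createTempFile("buffered-demo", ".bin");
            try {
                Files.write(tmp, new byte[sizeBytes]);
                return countBytes(tmp);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Pre-sizes the list so its backing array is allocated once and never grown. */
    public static List<Integer> squares(int n) {
        List<Integer> result = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            result.add(i * i);
        }
        return result;
    }

    /** Counts from several threads; LongAdder spreads updates over cells instead of one lock. */
    public static long parallelCount(int n) {
        LongAdder counter = new LongAdder();
        IntStream.range(0, n).parallel().forEach(i -> counter.increment());
        return counter.sum();
    }

    public static void main(String[] args) {
        System.out.println("bytes read: " + bufferedReadDemo(100_000));
        System.out.println("squares: " + squares(5));
        System.out.println("count: " + parallelCount(1_000_000));
    }
}
```

Whether any of these helps in a given application still has to be confirmed by measurement.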

Java performance optimization: JVM tuning, time-space operational trade-offs

Adjust the heap size. Motivation: size the young generation, old generation, and method area for the application's steady state. Resizing, in turn, affects GC behavior.

Switch the garbage collector. Motivation: response time, throughput, and ultimately CPU.

Enable or disable other specific parameters. Motivation: scenario-specific needs, such as post-mortem analysis, escape analysis, or JIT support.

Java performance tuning: JVM tuning steps

Calculate the size of the application's long-lived objects (old generation, permanent generation) from the GC logs.

Establish the base heap size from the size of the long-lived objects. As a reference: set the whole heap to 3-4x that size, the young generation to 1-1.5x, the old generation to 2-3x, and the permanent generation to 1.5x the size of its long-lived objects.

Set GC goals for throughput and response time. Enable a suitable garbage collector and adjust the young, old, and permanent generation sizes accordingly.

Enable or disable other parameters according to additional requirements.
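The multipliers in step 2 are simple arithmetic, worked through in the sketch below. It assumes the heap, young, and old generation ranges are multiples of the old generation's long-lived data, and that the permanent generation figure is a multiple of its own long-lived data; the class `HeapSizing` and its field names are made up for this example.

```java
// Hypothetical calculator: it only applies the reference multipliers and measures nothing.
public class HeapSizing {

    /** Suggested size ranges in MB, derived from live data sizes read from GC logs. */
    public static final class Plan {
        public final long heapMinMb, heapMaxMb;
        public final long youngMinMb, youngMaxMb;
        public final long oldMinMb, oldMaxMb;
        public final long permMb;

        Plan(long oldLiveMb, long permLiveMb) {
            heapMinMb = 3 * oldLiveMb;                  // whole heap: 3-4x
            heapMaxMb = 4 * oldLiveMb;
            youngMinMb = oldLiveMb;                     // young generation: 1-1.5x
            youngMaxMb = Math.round(1.5 * oldLiveMb);
            oldMinMb = 2 * oldLiveMb;                   // old generation: 2-3x
            oldMaxMb = 3 * oldLiveMb;
            permMb = Math.round(1.5 * permLiveMb);      // permanent generation: 1.5x
        }

        @Override
        public String toString() {
            return "heap " + heapMinMb + "-" + heapMaxMb + " MB, "
                    + "young " + youngMinMb + "-" + youngMaxMb + " MB, "
                    + "old " + oldMinMb + "-" + oldMaxMb + " MB, "
                    + "perm " + permMb + " MB";
        }
    }

    /** Both arguments are live data sizes in MB, as observed after full GCs. */
    public static Plan plan(long oldLiveMb, long permLiveMb) {
        if (oldLiveMb <= 0 || permLiveMb <= 0) {
            throw new IllegalArgumentException("live data sizes must be positive");
        }
        return new Plan(oldLiveMb, permLiveMb);
    }

    public static void main(String[] args) {
        // Example input: 512 MB of long-lived objects, 64 MB in the permanent generation.
        System.out.println(plan(512, 64));
    }
}
```

For the example input this gives a heap of 1536 to 2048 MB, a young generation of 512 to 768 MB, an old generation of 1024 to 1536 MB, and a permanent generation of 96 MB. These are starting points to verify against GC logs; HotSpot removed the permanent generation in Java 8, so that figure applies only to older JVMs.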

That’s all for today’s sharing, thank you!