
Concurrency awareness & Stress Testing & Server optimization

Concurrency & High concurrency

1. What is concurrency?

In an operating system, concurrency means that several programs are all somewhere between starting and finishing within the same period of time, and all of them run on the same processor.

This is just like the operating-system time sharing mentioned earlier: playing a game and listening to music both run from start to finish on the same computer within the same period of time, so listening to music and playing the game are concurrent.

2. Concurrency and parallelism

The two of us are having lunch. Throughout the meal you eat rice, vegetables, and beef; for you, eating the rice, the vegetables, and the beef is done concurrently.

To you, the whole process seems to happen simultaneously, but you are actually switching back and forth between eating different things.

Meanwhile, it's still just the two of us at lunch. During the meal you had rice, vegetables, and beef, and I also ate rice, vegetables, and beef.

The two of us eat in parallel. Two people can eat beef at the same moment, or one can eat beef while the other eats vegetables; the two meals proceed without interfering with each other.

So, concurrency means multiple programs appear, at the macro level, to run at the same time over a period of time; parallelism means multiple tasks really are running at the same instant.

The difference between concurrency and parallelism:

Concurrency is multiple things happening within the same time interval, taking turns.

Parallelism means multiple things happening at the same time.

Concurrent tasks preempt each other’s resources.

Parallel tasks do not grab resources from each other.

Parallelism occurs only with multiple CPUs (or cores). Otherwise, everything that seems to happen at the same time is actually executed concurrently.
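To see the distinction in code, here is a minimal, hypothetical Java sketch (class and task names are illustrative): the same three tasks run concurrently on a single-threaded executor, where they take turns on one worker, and then on a pool with one thread per core, where they can run in parallel (assuming a multi-core machine).

import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ConcurrencyVsParallelism {
    public static void main(String[] args) throws InterruptedException {
        List<Runnable> tasks = List.of(
                () -> System.out.println("eat rice on " + Thread.currentThread().getName()),
                () -> System.out.println("eat vegetables on " + Thread.currentThread().getName()),
                () -> System.out.println("eat beef on " + Thread.currentThread().getName()));

        // Concurrent: one worker thread, the tasks share it by taking turns
        ExecutorService single = Executors.newSingleThreadExecutor();
        tasks.forEach(single::submit);
        single.shutdown();
        single.awaitTermination(1, TimeUnit.SECONDS);

        // Parallel: one worker per core, the tasks can truly run at the same instant
        ExecutorService pool = Executors.newFixedThreadPool(
                Runtime.getRuntime().availableProcessors());
        tasks.forEach(pool::submit);
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.SECONDS);
    }
}

On a single-core machine the second pool still only achieves concurrency, which is exactly the point made above.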

3. What is high concurrency?

“High concurrency and multi-threading” are often mentioned together, giving the impression that they are the same, but in fact high concurrency does not equal multi-threading (they are not necessarily directly related).

Multithreading is a way of completing tasks, while high concurrency is a state of system operation; multithreading merely helps the system withstand that state. High concurrency means a large number of operation requests arrive in a short time while the system is running. It mainly occurs when a web system receives massive traffic or a socket port receives a large number of requests (for example, ticket grabbing on 12306, or Tmall's Double 11 event). During such a period the system performs huge numbers of operations such as resource requests and database accesses. If high concurrency is handled poorly, not only does the user experience degrade (response times grow too long), but the system may break down, or even stop working with an OOM exception. For a system to withstand a high-concurrency state it must be optimized from every angle: hardware, network, system architecture, choice of development language, data structures, algorithms, database tuning, and so on. Multithreading is only one of the solutions. To achieve high concurrency, consider:

(1) The architecture of the system: how to cut unnecessary work (network requests, database operations, etc.) at the architectural level, for example using a cache to reduce the number of I/O operations, asynchrony to raise single-service throughput, and lock-free data structures to reduce response time (see the sketch after this list);
(2) Network topology optimization to reduce network request time: how to design the topology, and how to distribute the system;
(3) Code optimization at the system level: which design patterns to use, which classes should be singletons, and where new operations should be minimized;
(4) Efficiency at the code level: how to choose appropriate data structures for data access, and how to design suitable algorithms;
(5) Synchronization at the task-execution level: where to use synchronous operations and where asynchronous ones;
(6) JVM tuning: how to size the Heap, Stack, and Eden space, how to choose a GC policy, and how to control the frequency of Full GC;
(7) Server tuning: thread pools, wait queues;
(8) Database optimization to reduce query and update time: which database and storage engine to choose, how to design table structures, indexes, triggers, and so on, whether to use read/write separation, or whether a data warehouse is needed;
(9) Caching: Redis or Memcache, and how to design the caching mechanism;
(10) Data communication: TCP or UDP, long or short connections, NIO or BIO, Netty, MINA, or raw sockets;
(11) Operating system: Windows Server, Linux, or Unix;
(12) Hardware configuration: 8GB or 32GB of memory, a 1-gigabit or 10-gigabit NIC; for example, raising the CPU to 32 cores, upgrading to a 10-gigabit NIC, moving to SSDs, expanding disks to 2TB, or expanding memory to 128GB.

All of the above must be considered carefully under high concurrency. Like the barrel principle, any single neglected item can become the bottleneck that throttles the whole system. High concurrency is both broad and deep, and multithreading is merely one method, from the synchronous/asynchronous angle, of using the computer's idle resources at the same time.
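As a tiny illustration of the "lock-free data structures" mentioned in point (1) (the sketch below is illustrative, not a prescription from the article): a synchronized counter makes every thread contend for one monitor, while java.util.concurrent.atomic.LongAdder spreads the contention across internal cells and never blocks.

import java.util.concurrent.atomic.LongAdder;

public class Counters {
    // Locked version: every increment contends for the same monitor
    static class SynchronizedCounter {
        private long count;
        synchronized void increment() { count++; }
        synchronized long get() { return count; }
    }

    // Lock-free version: LongAdder spreads contention across internal cells
    static class LockFreeCounter {
        private final LongAdder count = new LongAdder();
        void increment() { count.increment(); }
        long get() { return count.sum(); }
    }
}

Under many threads hammering a shared counter, the LongAdder variant avoids blocking entirely, which is exactly the kind of response-time win the checklist is pointing at.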
The function of multithreading in solving high-concurrency problems is to push the computer's resources to maximum utilization at every moment, so that they never sit idle and go to waste.

(An aside on operating systems: Unix was the first mature computer operating system. It began as a server operating system that only enterprises or universities could afford. Several other Unix-based systems followed, among them Minix, developed for education and very limited in functionality. Linus decided to build a system of his own based on Minix; he posted the idea on the Internet and developed the first version of Linux, and then more and more developers, and later companies and teams, joined in, leading to today's distributions such as Ubuntu, SUSE, Debian, and Red Hat. Good or bad is not absolute: Unix as a server OS is excellent, otherwise IBM would have dropped it long ago, but its price is very high. The advantages of server Linux are obvious: it is open source and cheap. Linux itself is free, though many of its server features are paid, so it is not completely free. For individual users, Unix is simply not an option, while Linux now offers very usable desktop editions across many distributions. And because Linux inherits from Unix, learning Linux commands essentially teaches you Unix commands, which is another point in Linux's favor.)

4. Do you really understand high concurrency?

High concurrency is not the bragging point we imagine it to be. In most business scenarios there is no real contention over data, so adding services and machines basically solves the problem. If tables are under pressure, split them; for query pressure, use master/slave replication plus a cache. There is always a way to solve the problem.

# The so-called hundred-million-level traffic and 100W-level QPS: do they really leave you unfazed?

Example: a service handles 360,000 requests per day (coarse-grained statistics) with an average RT of 100ms. The metric is computed as:

    QPS = 360,000 per day / 10 peak hours / 3600 seconds ≈ 10, × 5 safety factor = 50 QPS

Another example: recent monitoring shows a service peaking at 360,000 requests per hour (coarse-grained statistics), average RT 100ms:

    QPS = 360,000 per hour / 3600 seconds = 100, × 5 safety factor = 500 QPS

# Metrics for measuring high concurrency: QPS, TPS, RT (100ms here), throughput.

# A product detail page may see the following queries. Suppose there are about 100W (1,000,000) orders a day; how much detail-page traffic does that imply?

1. Product information query: 2~3 times
2. Specification selection (one query per specification clicked): about 20 times
3. Product review queries: about 3 times
4. Repeated comparison of several products: about 10 times
5. Checking other platforms and coming back: 2~3 times

That is roughly 46 queries per eventual purchase; call it 50.

Peak: assume traffic triples during holidays: 50 × 3 = 150. Then 100W orders may mean 100W × 150 = 1.5亿 (150 million) requests. When doing system design, always over-estimate the measured traffic so the system can absorb sudden surges; the calculated value is generally multiplied by three.

Counting only peak-period requests, the 1.5亿 requests fall within 12 peak hours:

    per hour:   1.5亿 / 12 = 1250W (12.5 million)
    per minute: 1250W / 60 ≈ 20.8W (208,000)
    per second: 20.8W / 60 ≈ 3472 QPS

# Memory: a message object, together with the objects its internal attributes reference, may reach about 8MB; heap memory holds many objects, large and small, one behind every reference variable. Shenzhen-Shanghai is a popular route, so its response message is large; an unpopular route such as Hailar-Yinchuan may have only a few flights and a message of just a few hundred KB, but popular routes are requested far more often. In summary, count each message as 5MB on average; at the roughly 110 requests per second that land on one machine, 110 × 5MB = 550MB, i.e., about 550MB of young-generation space is consumed every second.

Note: this system differs from others in that its core service has only two core interfaces, list query and specified query. Other systems, such as an order system, may have many interfaces: besides the core order query there are create order, place order, check inventory, and so on. Core-interface logic is generally a bit more complex and its objects a bit larger, so we analyze memory with the interface that consumes the most, because the worst case is all traffic hitting the core interface. The specified query is requested less often than the list query, so we take the list query as representative of all the interfaces. An average of 5MB per interface is admittedly large, and serves only as a reference point. Divide the cluster's QPS by the number of machines to estimate the per-machine QPS of each interface, then estimate the actual memory occupied by each interface's objects. If QPS is not very high, the memory consumed per minute is a usable reference; if concurrency is very high, use the memory consumed per second. Some of these numbers are for reference only.
In reality, the peak traffic of such a system is beyond the hundred-million level, and the machines are more than 8C/16G; the figures above are simply for case analysis.
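A hedged sketch of the back-of-the-envelope arithmetic above, with the article's figures hard-coded purely for illustration (the per-machine share of 110 requests/second is inferred from the 550MB = 110 × 5MB figure):

public class CapacityEstimate {
    public static void main(String[] args) {
        long ordersPerDay = 1_000_000;      // 100W orders per day
        int queriesPerOrder = 50;           // ~50 detail-page queries per purchase
        int holidayFactor = 3;              // traffic triples during holidays

        long dailyRequests = ordersPerDay * queriesPerOrder * holidayFactor; // 1.5亿
        double perHour = dailyRequests / 12.0;     // concentrated in 12 peak hours
        double perSecond = perHour / 3600.0;       // ≈ 3472 QPS

        double mbPerRequest = 5.0;                 // average message size, MB
        double perMachineQps = 110;                // assumed share of one machine
        double youngGenMbPerSec = perMachineQps * mbPerRequest; // ≈ 550MB/s

        System.out.printf("total=%d, perHour=%.0f, QPS=%.0f, youngGen=%.0fMB/s%n",
                dailyRequests, perHour, perSecond, youngGenMbPerSec);
    }
}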

2. Concurrency metric analysis

1. Throughput

Before learning about QPS, TPS, RT, and concurrency, we should first clarify what the throughput of a system actually means. Generally speaking, system throughput reflects the system's stress resistance and load capacity: the maximum number of user requests the system can withstand per second.

The throughput of a system is usually determined by QPS (TPS) and the number of concurrent requests. Each system has a practical limit for both; once either reaches its maximum, throughput stops increasing.

System throughput is simply the number of requests per second

2. QPS

QPS (Queries Per Second) is the number of queries that can be responded to per second. Note that "queries" here means requests from users that the server answers successfully.

QPS = Number of requests per second

3. TPS

TPS (Transactions Per Second) is the number of transactions processed per second. A transaction is the process of a client sending a request to the server and the server responding; the client starts a timer when it sends the request and stops it when the response arrives, counting the time used and the number of transactions completed.

For a single interface, TPS can be regarded as equivalent to QPS. For example, visiting the page /index.html once is one T, but rendering that page may issue three requests to the server (say, for the CSS, the JS, and the index interface), producing three Qs.

TPS = number of transactions per second
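As a hedged sketch of the client-side timing described above (the URL and counts are illustrative): the client starts a timer when it sends a request and stops it when the response arrives; transactions completed divided by elapsed seconds gives TPS.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class TpsProbe {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(
                URI.create("http://localhost:9000/index.html")).build();

        int transactions = 100;
        long start = System.nanoTime();
        for (int i = 0; i < transactions; i++) {
            // One full request/response cycle is one transaction
            client.send(request, HttpResponse.BodyHandlers.ofString());
        }
        double elapsedSec = (System.nanoTime() - start) / 1e9;

        System.out.printf("avg RT = %.1f ms, TPS = %.1f%n",
                elapsedSec * 1000 / transactions, transactions / elapsedSec);
    }
}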

The differences between QPS and TPS:

(1) TPS is the number of transactions processed per second. One transaction covers three steps: the user requests the server, the server processes internally, and the server returns the result to the user. If N such three-step cycles complete per second, TPS is N.

(2) QPS is broadly similar to TPS, with one difference: one visit to a page forms one T, but that page request may trigger several requests to the server, each of which counts toward QPS.

Example: visiting a page issues 3 requests to the server; that single visit produces one "T" and three "Q"s.

Another analogy: a competitive eater can eat 10 baozi in one second, and a girl can eat 1 baozi in 0.1 seconds. Are they the same? No, because the girl cannot actually eat 10 baozi in a second; she would likely take much longer. The eater corresponds to TPS, the girl to QPS: similar-looking, but not the same.

Most of the time TPS = QPS (requests per second), so do not get entangled in the two concepts. For example, suppose one of my queries calls service A and service B, and service B is called twice. If my successful query counts as one transaction, then one request per second is 1 TPS; for system A, 1 TPS = 1 QPS. But for system B it is 2 QPS, because B is called twice (looking at B alone, 2 QPS = 2 TPS if each request is treated as its own transaction). The difference is merely one of perspective, and most of the time we need not distinguish deliberately: I can say my flow is 1 TPS while the pressure on system B is doubled at 2 QPS. Do watch out for such traffic-amplifying services during stress testing, because the upstream service may well hold up while the downstream one cannot.

For a single interface request, QPS = TPS, but note that the two measure different dimensions. If the request makes no remote service calls from start to finish, the distinction between QPS and TPS is moot; if there are multiple remote calls, QPS = TPS still holds for a single server, but across the whole request chain QPS > TPS.

4. RT

RT is the abbreviation of Response Time, simply understood as the interval between the system receiving input and producing output. Broadly, it is the time from the client initiating a request to the server receiving it and returning all response data. The average response time is generally used.

For RT, the client and the server differ greatly: a request from the client must cross the WAN, so client RT is often much larger than server RT. Client RT largely determines the user's real experience, while server RT is a key factor in evaluating the quality of our system.

During development we have all faced thread-count configuration problems. They are confusing: pool sizes are often picked off the top of the head, which is unreliable. Too few threads makes request RT rise sharply; too many threads makes RT rise as well. Evaluating the optimal thread count is therefore difficult.

5. Concurrency

In short, the number of requests/transactions the system can process simultaneously.

Calculation method:

QPS = concurrency / RT, or equivalently: concurrency = QPS × RT

Here's an example:

Let's assume that every employee needs to use the toilet at some point between 9 and 10 am. The company has 3,600 employees, and the average toilet visit takes 10 minutes. Let's calculate.

QPS = 3600 / (60 × 60) = 1

RT = 10 × 60 = 600 seconds

Concurrency = 1 × 600 = 600

This means that, for the best experience, the company needs 600 toilet stalls to accommodate its staff; otherwise there will be queues for the toilet.

As the number of requests grows, context switches, GC, and lock contention all increase. The higher the QPS, the more objects are created, the more frequent GC becomes, and CPU time and utilization are affected, especially with a serial collector; lock spinning, adaptive spinning, biased locking, and so on also become performance factors. In short, to reach the best performance we must keep running performance tests, adjusting the pool sizes, and searching for the most appropriate parameters.
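A hedged sketch of applying the concurrency = QPS × RT relationship to pool sizing (the inputs are illustrative; real values should come from stress tests, as the text says):

public class PoolSizeEstimate {
    public static void main(String[] args) {
        double qps = 500;          // target throughput, requests per second
        double rtSeconds = 0.1;    // average response time, 100ms

        // Little's law: requests in flight = arrival rate * time in system
        double concurrency = qps * rtSeconds;   // 50 requests in flight

        // Starting point for a worker pool; tune up or down under a stress test
        System.out.printf("estimated concurrent requests = %.0f%n", concurrency);
    }
}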

3. Bottleneck analysis of the stress-test report

Once a service is online and under business pressure, you may find it running very slowly, or going down with all sorts of inexplicable problems. Is mindless scaling-out really the answer?

Scaling may solve the problem, but it may also introduce new ones. Therefore, before a project goes live there must be a performance stress-testing step, so that the service's problems are found, fixed, and optimized in advance.

1. Application deployment

1.1. Service packaging

Project packaging: you can package with IDEA and upload the jar directly, package with Maven on the GitLab server, or package with Jenkins. Here we start by packaging with IDEA's Maven integration.

<build>
    <plugins>
        <!-- This plugin must be present, otherwise the project's dependencies are not packaged into the executable jar -->
        <plugin>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-maven-plugin</artifactId>
        </plugin>
        <!-- Compile the project with JDK 1.8 before packaging -->
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-compiler-plugin</artifactId>
            <configuration>
                <source>1.8</source>
                <target>1.8</target>
                <encoding>UTF-8</encoding>
            </configuration>
        </plugin>
    </plugins>
</build>

Idea maven package:

Note: when packaging the service, pay attention to the supporting IP addresses. Because the service, MySQL, and Redis are on the same server here, the connection addresses are set to localhost.

server:
  port: 9000
spring:
  application:
    name: sugo-seckill-web
  datasource:
    url: jdbc:mysql://127.0.0.1:3306/shop?useUnicode=true&characterEncoding=utf8&autoReconnect=true&allowMultiQueries=true
    # url: jdbc:mysql://47.113.81.149:3306/shop?useUnicode=true&characterEncoding=utf8&autoReconnect=true&allowMultiQueries=true
    username: root
    password: root
    driver-class-name: com.mysql.jdbc.Driver
    druid:
      # Configure the initial, minimum, and maximum pool size
      initial-size: 1
      min-idle: 5
      max-active: 5
      max-wait: 20000
      time-between-eviction-runs-millis: 600000
      # Minimum time a connection may sit idle in the pool before eviction, in milliseconds
      min-evictable-idle-time-millis: 300000
      # Whether to validate a connection when it is fetched from the pool; false skips the check
      test-on-borrow: true
      # Whether to validate a connection when it is returned to the pool; false skips the check
      test-on-return: true
      # Validate idle connections: if true, a connection idle longer than minEvictableIdleTimeMillis is checked; false skips the check
      test-while-idle: true
      # Query used to check connection validity; ping() is preferred if the JDBC driver supports it, otherwise validationQuery is used (the Oracle JDBC driver currently does not support ping())
      validation-query: select 1 from dual
      keep-alive: true
      remove-abandoned: true
      remove-abandoned-timeout: 80
      log-abandoned: true
      # Enable PSCache and set its size per connection
      pool-prepared-statements: true
      max-pool-prepared-statement-per-connection-size: 20
      # How often DestroyThread checks the connections in the pool, in milliseconds.
      # During a check:
      # 1. For idle connections beyond minIdle, if the idle time exceeds minEvictableIdleTimeMillis, the connection is physically closed.
      # 2. Connections within minIdle are left untouched.
  redis:
    host: 127.0.0.1
    port: 6379
mybatis:
  type-aliases-package: com.supergo.pojo
mapper:
  not-empty: false
  identity: mysql

1.2. Package and upload

Start command: java -jar jshop-web-1.0-snapshot.jar

Note: because server environments differ, deployment often requires changing the service's configuration (such as the server IP address) and recompiling; local-development IPs differ from online IPs, and editing these settings for every deployment is tedious. Therefore the service should be deployed with support for mounting an external ("plug-in") configuration file.

# The external configuration file must be named application.yaml or application.properties
java -jar xxx.jar --spring.config.additional-location=/usr/local/src/application.yaml

The mounted plug-in configuration file uses the local connection addresses; it repeats the datasource, Druid, and Redis settings from the packaged application.yaml:

spring:
  application:
    name: sugo-seckill-web
  datasource:
    url: jdbc:mysql://127.0.0.1:3306/shop?useUnicode=true&characterEncoding=utf8&autoReconnect=true&allowMultiQueries=true
    username: root
    password: root
    driver-class-name: com.mysql.jdbc.Driver
    druid:
      # ... identical Druid pool settings as in the packaged application.yaml above ...
  redis:
    host: 127.0.0.1
    port: 6379

1.3. Startup Script

Create a shell script, such as deploy.sh, that starts the Java program in the background:

# Use nohup to start the Java process in the background
nohup java -Xms500m -Xmx500m -XX:NewSize=300m -XX:MaxNewSize=300m -jar jshop-web-1.0-snapshot.jar --spring.config.additional-location=application.yaml > log.log 2>&1 &

# Authorize the script
chmod 777 deploy.sh

# Query the Java processes
jps
jps -l

Accessing the interface from a browser shows that the service has started successfully:

2. Start the pressure test

2.1. Preparation for pressure measurement

Pressure measurement objectives:

Thread gradient: 500, 1000, 1500, 2000, 2500, 3000 threads, i.e., simulate these numbers of concurrent users;

Time setting: set the Ramp-Up Period (in seconds) to the window within which each gradient of 500/1000/1500/2000/2500/3000 concurrent users is started (e.g., 1s, 5s, or 10s);

Number of cycles: 50

1) Set pressure test request

2.2. Add listeners

Aggregate report: Add an aggregate report

View result Tree: Adds a view result tree

TPS statistical analysis: transactions per second

Server performance monitoring: CPU, memory, and I/O

Note: JMeter's CPU monitoring watches only a single core, so it has little reference value. In a real test, use the top command to observe CPU usage.

3. Parameter principle

We set the number of threads n = 5, the number of loops a = 1000, request www.google.com, and get the aggregated report as shown in the figure:


The average time to fetch the Google homepage is about t = 0.2 seconds.

Here, we set the Ramp-Up Period to T = 10 seconds for the sake of analysis (how to choose a reasonable value is explained later).

Still with n = 5, S = T - T/n = 8; that is, the last thread starts 8 seconds after the first. If the first thread must still be running when the last one starts, we need a·t > S. With S = 8 and t = 0.2, we get a > 40.

Number of threads: n = 5
Number of loops: a = 1000
Average response time: t = 0.2s
Ramp-Up Period: T = 10s
S = T - T/n = 8
# loop count × average response time must exceed S
a · t > S  ==>  a > S/t = 40

Ramp-Up Period:
[1] Determines how long it takes to start all threads. With 10 threads and a ramp-up period of 100 seconds, JMeter takes 100 seconds to get all 10 threads running, starting one every 10 seconds (100/10). The ramp-up must be long enough to avoid a burst of work right at test start, and short enough that the last thread starts before the first one finishes. A common starting point is ramp-up = number of threads; adjust up or down from there as needed.
[2] Tells JMeter how long to create all of the threads. The default is 0: if no ramp-up period is specified, all threads are created immediately. If the ramp-up period is T seconds and the total number of threads is N, JMeter creates one thread every T/N seconds.
[3] A Ramp-Up Period (in seconds) of 0 means the threads start concurrently.

OK. Since the loop count must exceed 40, we can set it to 100, so a single thread runs for R = a·t = 20 seconds; that is, the first thread stops at the 20th second. The theoretical duration of the whole test is S + R = (1 - 1/n)·T + a·t = 28 seconds.

Let’s use a graph to visually see how each thread works

It can be seen from the figure that from the 8th second to the 20th second, five threads are running at the same time, which is the real simulation of five concurrent users

Having said all that, what is the purpose? It is to set the thread count, ramp-up period, and loop count so that the intended number of threads really is running at the same time during the measurement window, rather than some threads finishing before others have even started.

4. Performance parameter analysis

4.1 TPS throughput

Aggregate report: TPS is 3039, but the TPS peak cannot be seen here; this needs improvement.

Average: average response time
Median: median response time (50% of requests return within this time)
90% Line: 90th percentile; 90% of requests return within this time (here <= 1765ms), the other 10% take 1765ms or more (see the sketch after this list)
95% Line: 95th percentile; 95% of requests return within this time (here 1920ms)
99% Line: 99th-percentile response time
Min: the time of the single fastest request
Max: the time of the single slowest request
Error %: error rate = failed requests / total requests
Throughput: TPS (throughput)
Received KB/sec: data received from the server per second
Sent KB/sec: data sent by the client per second
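A hedged sketch of how those percentile lines are computed from raw sample times (the nearest-rank method here is one common convention; JMeter's exact method may differ):

import java.util.Arrays;

public class Percentiles {
    // Nearest-rank percentile: smallest sample such that p percent of samples are <= it
    static long percentile(long[] sortedMillis, double p) {
        int rank = (int) Math.ceil(p / 100.0 * sortedMillis.length);
        return sortedMillis[Math.max(rank - 1, 0)];
    }

    public static void main(String[] args) {
        long[] rt = {120, 250, 90, 1765, 480, 1920, 310, 75, 640, 220}; // sample RTs in ms
        Arrays.sort(rt);
        System.out.println("90% Line = " + percentile(rt, 90) + " ms");
        System.out.println("95% Line = " + percentile(rt, 95) + " ms");
        System.out.println("99% Line = " + percentile(rt, 99) + " ms");
    }
}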

Monitoring graph using Tps:

The graph shows that TPS peaks shortly after the 1s mark, and transitions smoothly at other times.


4.2 Performance curve

Changes in system performance as concurrency pressures increase and time increases. Under normal circumstances, the average sampling response time curve should be smooth and roughly parallel to the graph’s lower boundary.

Possible performance problems:

① The average value jumped in the initial stage, and then gradually stabilized

First, the system has performance defects in the initial stage and needs further optimization, such as slow database query

Second, the system has a caching mechanism, and the performance test data does not change during the test, so the response time of the same data in the initial stage is certainly slow; This is a performance test data preparation problem, not a performance defect, need to be adjusted before the test

The third is the inherent phenomenon caused by the system architecture design. For example, after the system receives the first request, the link between the application server and the database is established, and the connection will not be released in the following period of time.

② As the average continues to increase, the picture becomes steeper and steeper

First, memory leaks may occur. In this case, you can use common methods such as monitoring system logs and application server status to locate the fault.

③ The average suddenly jumps during the performance test and then returns to normal

First, there may be system performance defects

Second, it may be caused by the instability of the test environment (check the CPU usage or memory usage of the application server or check whether the network in the test environment is congested).

4. Service optimization

1. Increase the number of threads

# During the stress test, the number of service threads peaks at about 245 and stops growing
# Query the Java process ID
jps -l
# Query all threads of the process
pstree -p pid
# Count the threads
pstree -p pid | wc -l
# Start the stress test, count the threads again, and check whether the thread count grows and whether there is room for more

2. Embedded configuration

The Springboot service uses an embedded Tomcat server to start the service, so the Tomcat configuration uses the default configuration. We need to optimize the Tomcat configuration appropriately to improve Tomcat performance.

The embedded Tomcat thread-pool defaults are also relatively small. We can override the relevant Tomcat settings through the plug-in configuration file and then restart the server to test. Modify the configuration as follows:

The maxConnections, maxThreads, and acceptCount configurations of Tomcat indicate the maximum number of connections, the maximum number of threads, and the maximum number of waits, respectively. You can change these values using the application.yml configuration file as follows:

server:
  tomcat:
    uri-encoding: UTF-8
    # Maximum number of worker threads; default 200. On 4-core/8G, the empirical value is 800.
    # The OS pays scheduling overhead for switching between threads, so more is not always better.
    max-threads: 1000
    # Maximum wait-queue length; default 100
    accept-count: 1000
    max-connections: 20000
    # Minimum number of idle worker threads (default 10)
    min-spare-threads: 100

1) accept-count: maximum wait-queue length

The official documentation describes it as the maximum queue length for incoming connection requests when all request-processing threads are in use; once the queue is full, further connection requests are refused. The default accept-count is 100. Concretely: when Tomcat has no idle threads left and the wait queue is full, new requests are rejected.

2) maxThreads: maximum number of threads

Every HTTP request arriving at the web service is handled by a Tomcat thread, so the maximum thread count determines how many requests the container can process simultaneously. maxThreads defaults to 200 and can certainly be raised. But threads have a cost: more threads mean more context-switching overhead and more memory, since by default the JVM allocates a 1MB thread stack for each new thread. Empirical values: 1 core / 2GB memory: 200 threads; 4 cores / 8GB memory: 800 threads.

3) maxConnections: the maximum number of connections

The official documentation reads:

This parameter is the maximum number of connections Tomcat accepts at any one time. For Java's blocking BIO, the default is the value of maxThreads; if a custom Executor is used in BIO mode, the default is that Executor's maxThreads. For Java's newer NIO mode, maxConnections defaults to 10000. For APR/native IO on Windows, it defaults to 8192 (for performance reasons; if the configured value is not a multiple of 1024, the effective value is rounded down to the nearest multiple of 1024). Setting it to -1 disables the limit, meaning the Tomcat container's connection count is uncapped. The relationship between maxConnections and accept-count: once maxConnections is reached, the system keeps accepting connections into the queue, up to acceptCount more.

3. Problem solving

During the stress test, JMeter may throw: java.net.BindException: Address already in use: connect.

Windows provides ports 1024-5000 for outgoing TCP/IP connections, and recycling a port takes 4 minutes. Running a large number of requests in a short time therefore exhausts the available ports.

Solution (performed on the JMeter machine):

1. Run the regedit command to open the registry.

2. Right-click HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters;

3. Add a new DWORD named MaxUserPort.

4. Double-click MaxUserPort, enter 65534 as the value data, and select the decimal base.

5. Restart the machine after completing the above operations. The problem is solved and the test is effective.

4. Performance comparison

Non-optimized test performance comparison table:

| Concurrency / 5s | Samples | Avg response time (ms) | Throughput (TPS) | Error rate | KB/sec |
|---|---|---|---|---|---|
| 200 | 10000 | 7 | 1849 | 0 | 791 |
| 300 | 15000 | 14 | 2654 | 0 | 1135 |
| 500 | 25000 | 71 | 2925 | 0 | 1251 |
| 700 | 35000 | 141 | 2865 | 0 | 1225 |
| 1000 | 50000 | 244 | 2891 | 0 | 1236 |
| 1500 | 75000 | 418 | 2910 | 0 | 1224 |
| 2000 | 100000 | 553 | 2920 | 0 | 1249 |
| 3000 | 150000 | 889 | 2533 | 0.05 | 1086 |
| 3200 | | | | | |
| 3500 | | | | | |
| 4000 | | | | | |
| 4500 | | | | | |
| 5000 | | | | | |

Comparison table of optimized performance:

# Optimized parameter configuration
server:
  tomcat:
    accept-count: 800
    max-connections: 20000
    max-threads: 800
    min-spare-threads: 100

# After optimization, performance dropped rather than improved:
# 1. Querying the database by primary key is not time-consuming, so this is a non-time-consuming operation; adding threads mainly adds CPU context-switch time.
# 2. There is also a noticeable GC drain: each request wraps the queried data in a new object, which must later be garbage collected (one object per request).
# Summary: to see a gain from tuning server parameters, the method under test must take longer; the more time a request consumes, the more visible the effect of the tuned parameters. Otherwise no effect can be seen.
# A simple tweak to the service already shows the effect: make the handler sleep for 1s, then run the test before and after tuning; TPS clearly improves after tuning.

Performance test comparison:

| Concurrency / 5s | Samples | Avg response time (ms) | Throughput (TPS) | Error rate | KB/sec |
|---|---|---|---|---|---|
| 200 | 10000 | 11 | 1836 | 0 | 785 |
| 300 | 15000 | 17 | 2486 | 0 | 1063 |
| 500 | 25000 | 85 | 2693 | 0 | 1152 |
| 700 | 35000 | 151 | 2769 | 0 | 1184 |
| 1000 | 50000 | 260 | 2736 | 0 | 1171 |
| 1500 | 75000 | 445 | 2766 | 0 | 1183 |
| 2000 | 100000 | 582 | 2813 | 0 | 1203 |
| 3000 | 150000 | 937 | 2794 | 0 | 1195 |
| 3200 | 160000 | 1147 | 2429 | 0 | 1039 |
| 3500 | 175000 | 1150 | 2796 | 0.05 | 1198 |
| 4000 | 200000 | 1292 | 2852 | 0 | 1220 |
| 4500 | 225000 | 1428 | 2825 | 0 | 1208 |
| 5000 | 250000 | 1581 | 2867 | 0.03 | 1228 |

5. Keepalive

Long (keep-alive) connections consume significant resources; if they are not released in time, the system's TPS cannot improve. We therefore tune the web service's keep-alive behavior to improve its long-connection performance.

package com.sugo.seckill.web.config;

import org.apache.catalina.connector.Connector;
import org.apache.coyote.http11.Http11NioProtocol;
import org.springframework.boot.web.embedded.tomcat.TomcatConnectorCustomizer;
import org.springframework.boot.web.embedded.tomcat.TomcatServletWebServerFactory;
import org.springframework.boot.web.server.ConfigurableWebServerFactory;
import org.springframework.boot.web.server.WebServerFactoryCustomizer;
import org.springframework.stereotype.Component;

// When the Spring container has no TomcatEmbeddedServletContainerFactory bean of its own, this bean is loaded into the container
@Component
public class WebServerConfig implements WebServerFactoryCustomizer<ConfigurableWebServerFactory> {
    @Override
    public void customize(ConfigurableWebServerFactory configurableWebServerFactory) {
        // Use the hook the factory class gives us to customize our Tomcat Connector
        ((TomcatServletWebServerFactory) configurableWebServerFactory).addConnectorCustomizers(new TomcatConnectorCustomizer() {
            @Override
            public void customize(Connector connector) {
                Http11NioProtocol protocol = (Http11NioProtocol) connector.getProtocolHandler();
                // Custom keepAliveTimeout: if no request arrives within 30 seconds, the server closes the keep-alive connection
                protocol.setKeepAliveTimeout(30000);
                // Once a client has sent 10000 requests on one connection, the keep-alive connection is closed
                protocol.setMaxKeepAliveRequests(10000);
            }
        });
    }
}

# Monitor the number of connections in each TCP state (ESTABLISHED, TIME_WAIT, ...)
netstat -n | awk '/^tcp/ {++y[$NF]} END {for(w in y) print w, y[w]}'

# Check network connections:
netstat -an | wc -l
netstat -an | grep xx | wc -l          # connections matching a particular pattern
netstat -an | grep TIME_WAIT | wc -l   # connections waiting in TIME_WAIT
netstat -an | grep ESTABLISHED | wc -l # established, stable connections

# TIME_WAIT means our side closed actively; CLOSE_WAIT means the peer closed first (passive close)

Key points of TCP state transitions: TCP requires a four-way handshake to tear down an established connection. If any step is missing, the connection hangs in a half-closed state and the resources it occupies are never released. A network server manages huge numbers of connections simultaneously, so it is important to ensure useless connections are fully closed; a pile of dead connections wastes a great deal of server resources. Of the many TCP states, CLOSE_WAIT and TIME_WAIT deserve the most attention.

1. LISTENING: after a service (such as FTP) starts, it listens for connections in the LISTENING state.

2. ESTABLISHED: a connection has been established; the two machines are communicating.

3. CLOSE_WAIT

The peer actively closed the connection, or the connection broke due to a network exception; our end then enters CLOSE_WAIT. The correct handling at this point is to call close() to close our side of the connection.

4. TIME_WAIT

We actively called close() and, after receiving the peer's confirmation, entered TIME_WAIT. TCP specifies that TIME_WAIT lasts 2MSL (twice the maximum segment lifetime) to ensure that an old connection's state cannot affect a new one. The kernel does not release the resources of a connection in TIME_WAIT, so as a server, avoid being the side that actively disconnects where possible, to reduce the waste caused by TIME_WAIT.

One way to avoid TIME_WAIT's resource cost is to set the socket's SO_LINGER option (with a zero linger time, close() resets the connection instead). TCP does not recommend this operation, and it can cause errors in some situations.
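A hedged sketch of setting that option in Java (host and port are illustrative): java.net.Socket exposes it as setSoLinger, and a zero linger time makes close() send an RST and skip TIME_WAIT; use it with the caution noted above.

import java.net.Socket;

public class LingerExample {
    public static void main(String[] args) throws Exception {
        try (Socket socket = new Socket("127.0.0.1", 9000)) {
            // linger enabled with 0 seconds: close() sends RST and skips TIME_WAIT
            socket.setSoLinger(true, 0);
            socket.getOutputStream().write("ping".getBytes());
        } // close() here terminates the connection immediately
    }
}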

The state of the socket

1.1 Status Description

CLOSED        The socket is not in use [netstat does not show the CLOSED state]
LISTEN        The socket is listening for connections [after calling listen()]
SYN_SENT      The socket is actively trying to establish a connection [SYN sent, no ACK received yet]
SYN_RECEIVED  Initial synchronization of a connection [the peer's SYN received, but no ACK for our own SYN yet]
ESTABLISHED   Connection established
CLOSE_WAIT    The remote socket has closed: waiting for this socket to close [the passive closer has received a FIN]
FIN_WAIT_1    Socket closed, closing the connection [FIN sent; neither ACK nor FIN received yet]
CLOSING       Socket closed, remote socket also closing, close confirmation pending [the peer's FIN received while in FIN_WAIT_1]
LAST_ACK      Remote socket closed, waiting for confirmation of the local close [the passive closer sent its FIN from CLOSE_WAIT]
FIN_WAIT_2    Socket closed, waiting for the remote socket to close [the ACK for our FIN received while in FIN_WAIT_1]
TIME_WAIT     This socket has been closed and waits 2MSL to be sure the remote end saw the final ACK

State transition diagram: (TCP connection state transition diagram)

A Java NIO server needs only a single dedicated thread to handle all IO events. How does this communication model work? Let's explore it together. Java NIO transmits data over bidirectional channels rather than one-way streams, and events of interest can be registered on a channel. There are four event types:

| Event name | Corresponding value |
|---|---|
| The server accepts a client connection | SelectionKey.OP_ACCEPT (16) |
| The client connects to the server | SelectionKey.OP_CONNECT (8) |
| Read event | SelectionKey.OP_READ (1) |
| Write event | SelectionKey.OP_WRITE (4) |

The server and client each maintain a channel management object, called a selector, that can detect events on one or more channels. For example, if a server registers a read event on its selector, and at some point the client sends some data to the server, blocking I/O will call the read() method to block the read, and the NIO server will add a read event to the selector. The processing thread on the server side will access the selector in a polling manner, and if an event of interest arrives while accessing the selector, it will process the event, and if none arrives, the processing thread will block until the event of interest arrives. Here is a diagram of the Java NIO communication model as I understand it:

The Selector is one of the most important parts of NIO: it polls every registered Channel, and whenever a registered event occurs on one of them, it retrieves the event and dispatches it for processing.

Consider the process of a client sending data that the server then receives. The client must first place the data in a Buffer and then write the Buffer's contents to a Channel. The server must read the data from a Channel into a Buffer, then read it out of the Buffer to process it.
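To make the model concrete, here is a hedged, minimal sketch of a selector-driven echo server: one thread registers OP_ACCEPT and OP_READ events and polls them, exactly the dispatch loop described above (the port and buffer size are arbitrary).

import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.*;
import java.util.Iterator;

public class NioEchoServer {
    public static void main(String[] args) throws Exception {
        Selector selector = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress(9001));
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);   // interest: new connections

        while (true) {
            selector.select();                               // block until an event arrives
            Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
            while (keys.hasNext()) {
                SelectionKey key = keys.next();
                keys.remove();
                if (key.isAcceptable()) {                    // OP_ACCEPT: register the client for reads
                    SocketChannel client = server.accept();
                    client.configureBlocking(false);
                    client.register(selector, SelectionKey.OP_READ);
                } else if (key.isReadable()) {               // OP_READ: echo the data back
                    SocketChannel client = (SocketChannel) key.channel();
                    ByteBuffer buffer = ByteBuffer.allocate(1024);
                    if (client.read(buffer) == -1) {
                        client.close();
                    } else {
                        buffer.flip();
                        client.write(buffer);
                    }
                }
            }
        }
    }
}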

Querying which IO type Spring Boot's embedded Tomcat uses (brief source analysis):

6. NIO2

Switch the HTTP connector to the NIO2 protocol and see whether server performance improves.

# Use NIO2 with the embedded Tomcat
spring.server.port=8095
# Match the acceptor thread count to the CPU
spring.server.acceptorThreadCount=4
spring.server.minSpareThreads=50
spring.server.maxSpareThreads=50
spring.server.maxThreads=1000
spring.server.maxConnections=10000
# 10-second timeout
spring.server.connectionTimeout=10000
spring.server.protocol=org.apache.coyote.http11.Http11Nio2Protocol
spring.server.redirectPort=443
spring.server.compression=on
# Upload file size limits
spring.server.MaxFileSize=300MB
spring.server.MaxRequestSize=500MB

@Slf4j
@Component
class AppTomcatConnectorCustomizer implements WebServerFactoryCustomizer<ConfigurableServletWebServerFactory> {

    @Override
    public void customize(ConfigurableServletWebServerFactory factory) {
        // Switch the embedded Tomcat connector to the NIO2 protocol
        ((TomcatServletWebServerFactory) factory).setProtocol("org.apache.coyote.http11.Http11Nio2Protocol");
        ((TomcatServletWebServerFactory) factory).addConnectorCustomizers(new TomcatConnectorCustomizer() {
            @Override
            public void customize(Connector connector) {
                ProtocolHandler protocol = connector.getProtocolHandler();
                log.info("Tomcat({}) -- MaxConnection:{}; MaxThreads:{}; MinSpareThreads:{}",
                        protocol.getClass().getName(),
                        ((AbstractHttp11Protocol<?>) protocol).getMaxConnections(),
                        ((AbstractHttp11Protocol<?>) protocol).getMaxThreads(),
                        ((AbstractHttp11Protocol<?>) protocol).getMinSpareThreads());
            }
        });
    }
}

7. Undertow

# Undertow is a flexible, high-performance web server that supports both blocking and non-blocking IO.
# Because Undertow is written in Java, it can be embedded directly into a Java project. It fully supports
# Servlets and WebSockets, and performs well under high concurrency.

# Number of IO threads. They execute non-blocking tasks and each is responsible for many connections;
# the default is one thread per CPU core. Do not set this too high, or the project fails at startup
# with "too many open files".
server.undertow.io-threads=16

# When Undertow performs blocking IO for servlet-style requests, it takes threads from this worker pool.
# The default is the number of IO threads * 8.
server.undertow.worker-threads=256

# The settings below affect the buffers used for IO on server connections, somewhat like Netty's pooled memory management.
# The smaller each buffer, the more fully the space is used; do not set it too large, to avoid starving other applications.
server.undertow.buffer-size=1024

# Number of buffers allocated per region (memory used ~ buffer-size * buffers-per-region)
server.undertow.buffers-per-region=1024

# Whether to allocate direct memory (NIO off-heap memory)
server.undertow.direct-buffers=true

# Configuration notes: Undertow's official default worker count is CPU cores * 8; on an 8-core CPU that is
# 8 * 8 = 64 worker threads. For high-concurrency scenarios an 8-core machine usually has 32G or more of
# memory, so even with all 64 threads busy the machine's resources are far from fully used. The right worker
# count depends on the machine's load and on your specific business. In our scenario, with 64 threads
# saturated, CPU utilization was only around 10-20%; CPU and memory were underused while requests piled up
# in Undertow's blocking queue and could not be processed, wasting resources and interrupting service. So
# never adopt Undertow's official default in production. In our online business most API calls take under
# 10ms (a few take 30-50ms), so a single request is cheap; raising worker-threads=256 lets a single node
# carry 256 concurrent requests, and beyond that we scale the number of nodes according to actual traffic.

5. Common performance analysis methods

Java is currently the most widely used language on the Internet. For a Java programmer, once the business is relatively stable, most of the time beyond coding (70%~80%) is spent investigating sudden or recurring online problems.

Failures in online Java services are almost inevitable, caused by application bugs (your own or those of third-party libraries), environment issues, hardware problems, and so on. Common symptoms include partial request timeouts and noticeable lag for users.

Online problems are usually obvious from the system's outward behavior, but tracing the root cause is still difficult, which brings plenty of trouble to development, testing, and operations engineers.

Troubleshooting and locating online problems has certain skills or rules of experience. The deeper an investigator knows about the business system, the easier it will be to locate the problems.

In any case, mastering the methodology for troubleshooting online Java services, and being able to use the common tools/commands/platforms skillfully, is a practical skill every Java programmer must have.

1. Common Online problems with Java services

The online problems of all Java services boil down to four aspects in terms of system representation: CPU, memory, disk, and network. Such as sudden spikes in CPU usage, memory overflow (leak), full disk, abnormal network traffic, FullGC, etc.

Based on these phenomena, we can classify online problems into two categories: system exceptions and business service exceptions.

1.1. System exceptions

Common system exceptions include high CPU usage, high CPU context switchover frequency, full disk, frequent disk I/O, abnormal network traffic (too many connections), and low available memory for a long time (oom killer).

You can use tools such as TOP (CPU), free(memory), DF (disk), dstat(network traffic), pSTACK, vmstat, and Strace (underlying system call) to obtain system exception data.

In addition, if both the system and the application check out and no deeper cause is found, the problem may lie with external infrastructure, such as the IaaS platform itself.

1.2. Service services are abnormal

Common business-service exceptions include excessive PV volume, abnormal service invocation times, thread deadlocks, multithreading concurrency bugs, frequent Full GC, and security attacks such as scanning.

2. Problem location

We generally adopt the elimination method, from external investigation to internal investigation to locate online service problems.

  • First, rule out problems caused by other processes (besides the main process);
  • Then rule out faults caused by the business application itself;
  • Finally, consider whether the fault lies with the carrier or the cloud service provider.

2.1 System exception troubleshooting process

(Flowchart: the system-exception troubleshooting process and common Linux troubleshooting methods.)

2.2 Business application troubleshooting process

3. Performance analysis tools commonly used in Linux

The performance analysis tools commonly used in Linux include TOP (CPU), free(memory), df(disk), dstat(network traffic), pSTACK, vmstat, and strace(low-level system call).

3.1. CPU

CPU is an important monitoring indicator of the system and can analyze the overall running status of the system. Monitoring indicators include running queues, CPU usage, and context switching.

The top command is a common CPU performance analysis tool in Linux. It displays the resource usage of each process in the system in real time and is often used to analyze server performance.

The top command displays processes sorted by CPU usage in descending order, refreshed in real time. Load Average shows the average system load over the last 1, 5, and 15 minutes; in the figure above the values are 2.46, 1.96, and 1.99. We usually focus on the process with the highest CPU usage, which is normally our main application process. From row 7 downward, top monitors the state of each process:

# Column meanings
PID: process id
USER: process owner
PR: process priority; negative values indicate high priority, positive values low priority
VIRT: total virtual memory used by the process, in KB; VIRT = SWAP + RES
RES: physical memory used and not swapped out, in KB; RES = CODE + DATA
SHR: shared memory size, in KB
S: process state; D = uninterruptible sleep, R = running, S = sleeping, T = traced/stopped, Z = zombie
%CPU: CPU usage since the last refresh
%MEM: percentage of physical memory used by the process
TIME+: total CPU time used by the process, in units of 1/100 second
COMMAND: process name

3.2. Memory

Memory is an important reference when investigating online problems, and memory problems are often the hidden cause of high CPU utilization.

System memory: free displays the current memory usage; the -m flag reports the values in MB.

# free Query current memory usage
free -m

# Parameter descriptionTotal Total memory: 7821M Used Number of used memory: 713M Free Number of free memory: 3107M shared The memory is not used and is always 0Copy the code

3.3. Disk

# Use df to check disk usage
df -h

# Check the disk usage of a specific directory
du -m /path

3.4. Network

The dstat command integrates the functionality of vmstat, iostat, netstat, lsof, and other tools.

1) vmstat instruction:

# Sample 5 times at 5-second intervals
vmstat 5 5

procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0      0 519512 199436 4079816   0    0     1     7   12    1  0  0 99  0  0
 0  0      0 519232 199436 4079816   0    0     0    10 1920 3413  0  0 99  0  0
 0  0      0 519380 199436 4079816   0    0     0    10 1854 3348  0  0 100 0  0

# Field description
# procs:  r = processes in the run queue; b = processes waiting for IO
# memory: swpd = virtual memory used; free = available memory; buff = memory used as buffers; cache = memory used as cache
# swap:   si = memory swapped in per second; so = memory swapped out per second
# io:     bi = blocks read per second; bo = blocks written per second (block size is 1024 bytes on current Linux versions)
# system: in = interrupts per second, including clock interrupts; cs = context switches per second
# cpu:    us = user time; sy = system time; id = idle time (includes IO-wait time); wa = IO wait time (all as percentages)

# Notes:
# 1. If r is often greater than 4 and id often less than 40, the CPU load is heavy.
# 2. If bi and bo are non-zero for long periods, memory is insufficient.
# 3. If disk IO is persistently non-zero and the b queue exceeds 3, IO performance is poor.

# Linux is not only stable and reliable but also scalable and extensible; it can be tuned for different
# applications and hardware to reach the best performance, so understanding the system's performance-analysis
# tools matters greatly when maintaining and tuning Linux systems.

# Display active and inactive memory
vmstat -a 2 5

# Display the number of forks since the system was started
vmstat -f

2) lsof: views the files a process has opened, the processes that have a file open, and the ports (TCP, UDP) a process has opened; it can also help retrieve/recover deleted files.

# The kinds of files lsof can report include: regular files, directories, files on network file systems,
# symbolic links, network files (e.g., NFS files, network sockets, Unix domain sockets), and other file types.

3) dstat common flags:
   -c CPU status
   -d disk read/write status
   -n network status
   -l system load status
   -m system memory status
   -p system process information
   -r system I/O request status

# Install dstat
yum -y install dstat
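For example, to watch CPU, disk, network, and memory together (a sketch; the 5-second interval is arbitrary):

# Sample CPU, disk, network, and memory every 5 seconds until interrupted
dstat -cdnm 5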

4) pstack and strace

Sometimes we need to optimize an application to reduce its response time. Is there an easier way than analyzing the time complexity of the code piece by piece? If you can directly find the function calls that dominate the program's runtime, then analyze and optimize those functions, it is far more efficient than reading the code aimlessly. Combining strace with the pstack tool achieves this: strace traces the system calls a program makes and reports when each call was executed and how long it took, while pstack prints the function call stack of the process with the given PID.
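A minimal sketch of that workflow (the PID is illustrative; strace -c aggregates time per system call, -p attaches to a running process):

# Attach for a while, then Ctrl-C to get a per-syscall time summary
strace -c -p 6418
# Print the current call stack of every thread in the process
pstack 6418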

4. JVM problem-locating tools

The bin directory of the JDK installation provides many valuable command-line tools by default. Each tool is relatively small because it is simply a wrapper around the JDK's lib/tools.jar.

The most common commands for locating faults are jps (processes), jmap (memory), jstack (threads), and jinfo (parameters).

  • jps: queries information about all Java processes on the current machine;
  • jmap: outputs the memory contents of a Java process (e.g., which objects were created, and how many);
  • jstack: prints the thread stacks of a Java process;
  • jinfo: views the configuration parameters of the JVM.

4.1 jps

jps displays the IDs of all Java processes started by the current user. When a fault or problem appears online, jps quickly locates the corresponding Java process ID.

# jps
jps -l -m
# -m: also print the arguments passed to the main method
# -l: print the fully qualified name of the main class (or the full path of the jar)

Of course, we can also use a Linux process-status command, for example ps -ef | grep tomcat, which also quickly yields the Tomcat service process ID.

4.2 jmap

The jmap command is a tool that can output all the objects in memory, and can even dump the entire VM heap as a binary file. It prints all the objects in the memory of a Java process (by PID), e.g., which objects were created, and how many.

# Output the current process's JVM heap configuration: new generation, old generation, permanent generation, GC algorithm, etc.
jmap -heap pid

# Output the instance counts and sizes of all objects in the current process's memory
jmap -histo:live {pid} | head -n 10  

# Dump the current heap to a binary file, then load it into MAT or similar analysis tools
jmap -dump:format=b,file=/usr/local/logs/gc/dump.hprof {pid} 




Sample output of jmap -heap {pid}:

using parallel threads in the new generation.  ## the new generation is collected with parallel threads
using thread-local object allocation.          ## thread-local allocation buffers are in use
Concurrent Mark-Sweep GC                       ## the old generation uses the CMS collector

Heap Configuration:  ## the result of the configured JVM parameters
   MinHeapFreeRatio = 40                        ## minimum free heap ratio
   MaxHeapFreeRatio = 70                        ## maximum free heap ratio
   MaxHeapSize      = 2147483648 (2048.0MB)     ## maximum heap size
   NewSize          = 268435456 (256.0MB)       ## initial new generation size
   MaxNewSize       = 268435456 (256.0MB)       ## maximum new generation size
   OldSize          = 5439488 (5.1875MB)        ## old generation size
   NewRatio         = 2                         ## ratio of old generation to new generation
   SurvivorRatio    = 8                         ## ratio of Eden to each Survivor space
   PermSize         = 134217728 (128.0MB)       ## permanent generation size
   MaxPermSize      = 134217728 (128.0MB)       ## maximum permanent generation size

Heap Usage:  ## actual heap memory usage
New Generation (Eden + 1 Survivor Space):  ## new generation (Eden + 1 of the 2 Survivor spaces)
   capacity = 241631232 (230.4375MB)            ## new generation capacity
   used     = 77776272 (74.17323303222656MB)    ## used
   free     = 163854960 (156.26426696777344MB)  ## free
   32.188004570534986% used                     ## usage ratio

Eden Space:  ## Eden space
   capacity = 214827008 (204.875MB)             ## Eden capacity
   used     = 74442288 (70.99369812011719MB)    ## Eden used
   free     = 140384720 (133.8813018798828MB)   ## Eden free
   34.65220164496263% used                      ## Eden usage ratio

From Space:  ## survivor 1
   capacity = 26804224 (25.5625MB)              ## survivor 1 capacity
   used     = 3333984 (3.179534912109375MB)     ## survivor 1 used
   free     = 23470240 (22.382965087890625MB)   ## survivor 1 free
   12.43827838477995% used                      ## survivor 1 usage ratio

To Space:  ## survivor 2
   capacity = 26804224 (25.5625MB)              ## survivor 2 capacity
   used     = 0 (0.0MB)                         ## survivor 2 used
   free     = 26804224 (25.5625MB)              ## survivor 2 free
   0.0% used                                    ## survivor 2 usage ratio

PS Old Generation:  ## old generation
   capacity = 1879048192 (1792.0MB)             ## old generation capacity
   used     = 30847928 (29.41887664794922MB)    ## old generation used
   free     = 1848200264 (1762.5811233520508MB) ## old generation free
   1.6416783843721663% used                     ## old generation usage ratio

Perm Generation:  ## permanent generation
   capacity = 134217728 (128.0MB)               ## perm capacity
   used     = 47303016 (45.111671447753906MB)   ## perm used
   free     = 86914712 (82.8883285522461MB)     ## perm free
   35.24349331855774% used                      ## perm usage ratio

jmap thus lets you view a JVM process's memory allocation and usage, the GC algorithm in use, and other details.

jmap -histo:live {pid} | head -n 10 prints the instance count and total size (in bytes) of every class of object in the process's memory. If a business object's instance count or size looks abnormal, there may be a memory leak or a design error.

jmap -dump:format=b,file=/usr/local/logs/gc/dump.hprof {pid} dumps the current heap to a binary file. In production it is usually better to add -XX:+HeapDumpOnOutOfMemoryError to the JVM parameters, so that the JVM saves a memory image automatically when the application hits an OOM. If you do decide to dump memory manually, note that the dump operation consumes a certain amount of CPU time, memory, and disk resources, so it has some negative impact. Dump files can also be large; you can generally compress them with zip to reduce bandwidth during download. After the dump file is downloaded, back it up to a designated location or delete it to free disk space.
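A minimal launch sketch for the automatic-dump approach (app.jar, the 2 GB heap size, and the dump path are illustrative):

# Dump the heap to /usr/local/logs/gc/ automatically if the JVM hits an OOM
java -Xmx2g \
     -XX:+HeapDumpOnOutOfMemoryError \
     -XX:HeapDumpPath=/usr/local/logs/gc/ \
     -jar app.jar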

4.3 jstack

printf '%x\n' tid    # converts a decimal thread ID to the hexadecimal (native) thread ID

jstack pid | grep tid -C 30 --color    # search the stack dump for that thread, with 30 lines of context

# Use ps to list the threads of a process and how long each has held the CPU
ps -mp 8278 -o THREAD,tid,time | head -n 40

Suppose a Java process has high CPU usage and we want to locate the thread consuming the most CPU.

(1) Use top to find the thread with the highest CPU usage inside the process: top -Hp {pid}. (2) Take that thread's ID, e.g., 6900, and convert it to hexadecimal (jstack prints native thread IDs in hex): printf '%x\n' 6900. (3) Use jstack to print the call stacks and locate that thread: jstack 6418 | grep '0x1af4' -A 50 --color. Note: jstack {pid} alone dumps the stacks of all threads in the process by PID, but since the problem thread is already known, it is better to go straight to it.

# grep parameter description
grep [-acinv] [--color=auto] [-A n] [-B n] 'search string' filename
# -a: process a binary file as if it were text
# -c: print only a count of matching lines
# -i: ignore case
# -n: print line numbers
# -v: print non-matching lines
# --color: highlight the matched keyword in color

4.4 jinfo

jinfo pid
# Description: outputs all the JVM parameters and system properties of the current process

jinfo -flag name pid
# Description: prints the value of the named parameter. Use this to check a specific JVM
# parameter, e.g., whether GC log printing is enabled for the current JVM process.
# View the value of a specific JVM parameter:
jinfo -flag ReservedCodeCacheSize 28461 
jinfo -flag MaxPermSize 28461

# Command: jinfo -flag [+|-]name pid
# Description: enables (+) or disables (-) the named boolean parameter. jinfo can thus modify
# JVM parameters dynamically without restarting the virtual machine, which is especially
# useful in a production environment.

jinfo -flag name=value pid
# Description: modifies the value of the named parameter.

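For example, GC log printing can be checked and toggled on a running JVM (28461 is the example PID used above; PrintGCDetails is a manageable HotSpot flag):

jinfo -flag PrintGCDetails 28461     # query the current value
jinfo -flag +PrintGCDetails 28461    # enable it
jinfo -flag -PrintGCDetails 28461    # disable it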

4.5 jstat

# Garbage collector statistics
jstat -gc pid
# S0C: capacity of survivor space 0        S0U: used in survivor space 0
# S1C: capacity of survivor space 1        S1U: used in survivor space 1
# EC:  Eden capacity                       EU:  Eden used
# OC:  old generation capacity             OU:  old generation used
# MC:  method area capacity                MU:  method area used
# CCSC: compressed class space capacity    CCSU: compressed class space used
# YGC: young generation GC count           YGCT: young generation GC time
# FGC: old generation (full) GC count      FGCT: full GC time
# GCT: total garbage collection time

# Heap usage as percentages, for a process found by name
jstat -gcutil `pgrep -u admin java`

The jstat command shows how much of the heap memory is in use, as well as how many classes are loaded. The general format of the command is:

jstat [option] [vmid] [interval(ms)] [count]

4.6 VisualVM

VisualVM can monitor threads and memory, view each method's CPU time and the memory held by objects, see which objects have been GC'd, and trace an allocation back through the stack (e.g., find which code allocated 100 Strings). VisualVM is easy to use, needs almost zero configuration, and is rich enough in features to cover almost everything the other built-in JDK command-line tools do.

• Memory information
• Thread information
• Dump the heap (local process)
• Dump threads (local process)
• Open a heap dump (heap dumps can be generated with jmap)
• Open a thread dump
• Create an application snapshot (memory information, thread information, and so on)
• Performance analysis: CPU analysis (call time of each method, to find the methods that take longest) and memory analysis (memory occupied by each kind of object, to find the classes that use the most memory)

1) VisualVM can monitor not only local JVM processes but also remote ones, with the help of JMX technology.

2) What is JMX?

JMX (Java Management Extensions) is a framework for embedding management capabilities into applications, devices, systems, and so on. JMX enables flexible development of seamlessly integrated management applications for systems, networks, and services across a range of heterogeneous operating system platforms, system architectures, and network transport protocols.

3) To monitor a remote Tomcat, configure JMX on the remote Tomcat as follows:

# In Tomcat's bin directory, modify catalina.sh and add the following parameters
JAVA_OPTS="-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=9999 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false"
# What these parameters mean:
# -Djava.rmi.server.hostname: the hostname/IP that remote JMX clients should use to reach this server
# -Dcom.sun.management.jmxremote: allow remote management via JMX
# -Dcom.sun.management.jmxremote.port=9999: the port for remote JMX connections
# -Dcom.sun.management.jmxremote.authenticate=false: no authentication; any user can connect
# -Dcom.sun.management.jmxremote.ssl=false: do not use SSL

Save the settings, exit, and restart Tomcat.

4) Add a remote host in VisualVM and connect to the remote Tomcat. A single host may run many JVMs that need monitoring, so next add each JVM to be monitored on that host.

Once the connection succeeds, you can use the same methods as before to monitor the remote Tomcat process just as you would a local JVM process.
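If your VisualVM build supports the --openjmx switch, the connection can also be opened straight from the command line (the host IP below is illustrative; the port is the one configured above):

# Connect VisualVM directly to the remote JMX endpoint
jvisualvm --openjmx 192.168.1.100:9999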