sequence

This article focuses on a case where frequent object generation causes excessive GC time.

Symptoms and analysis

The average gc pause time is nearly 4 seconds, and 13% of gc pauses are over 10 seconds.

[PSYoungGen: 457878K->126656K(489472K)] 1746043K->1453131K(1887744K), 12.1965757 secs] [Times: User =5.59 sys=0.52, real=12.19 secs] 154415.774: [GC (Allocation Failure) Desired survivor size 212860928 bytes, new threshold 1 (max 15) [PSYoungGen: [Times: Times: 3285K -> 3285k] 3285k -> 3285k [Times: 3285k] 3285k -> 3285k [Times: 3285k] User =0.10 sys=0.00, real= 0.17secs] 154416.838: [GC (Allocation Failure) Desired survivor size 235929600 bytes, new threshold 1 (max 15) [PSYoungGen: 341424K-> 14182K (445952K)] 1667955K-> 152434k (1844224K), 1.7996294 secs] [Times: User =0.89 sys=0.03, real= 1.80secs] 154419.456: [GC (Allocation Failure) Desired survivor size 225968128 bytes, new threshold 1 (max 15) [PSYoungGen: 434262K-> 123476K (468480K)] 1761114K->1486938K(1866752K), 23.0844304secs] [Times: User = 10.75sys =0.81, real= 23.09secs] 154442.541: [Full GC (Ergonomics) [PSYoungGen: 121776K->0K(468480K)] [ParOldGen: 1365162K->208108K(1398272K)] 1486938K->208108K(1866752K), [Metaspace: Secs] [Times: user= syssys =5.30, real= 23.59secs] [GC (Allocation Failure) Desired survivor size 217579520 bytes, new threshold 1 (max 15) [PSYoungGen: 200553K->106368K(486400K)] 408662K->314476K(1884672K), 1.0664613 secs] [Times: User =0.39 sys=0.13, real= 1.06secs] [GC (Allocation Failure) Desired survivor size 218103808 bytes, new threshold 1 (max 15) [PSYoungGen: [Times: Times] [Times: Times] [Times: Times] [Times: Times] User =1.12 sys=1.55, real=5.80 secs] 154514.037: [GC (Allocation Failure) Desired survivor size 213909504 bytes, new threshold 1 (max 15) [PSYoungGen: 409910K-> 40920k (489984K)] 691655K->407999K(1888256K), 10.1020217 secs] [Times: User =4.46 sys=0.61, real= 10.11secs] 154563.240: [GC (Allocation Failure) Desired survivor size 213385216 bytes, new threshold 1 (max 15) [PSYoungGen: Secs] [Times: Times: 32860K -> 32860K] 32860K -> 32860k, secs] [Times: 32860K -> 32860k, secs] [Times: 32860K, secs] User =0.04 sys=0.01, real=0.09 secs] 154564.037: [GC (Allocation Failure) Desired survivor size 219676672 bytes, new threshold 1 (max 15) [PSYoungGen: 342944K->153558K(478208K)] 663079K->474022K(1876480K), 3.1948641 secs] [Times: User =0.72 sys=0.69, real=3.19 secs] [GC (Allocation Failure) Desired survivor size 212336640 bytes, new threshold 1 (max 15) [PSYoungGen: 423382K->98528K(484352K)] 743846K->457302K(1882624K), 13.4085860secs] [Times: user= syssys =0.69, real= 13.408secs]Copy the code

After you dump the memory using Jmap, use MAT to share and look at thread_overview

Can see ElasticsearchJestHealthIndicator. DoHealthCheck hold not release a lot of object

Debug and reproduce

Local recurrence

Spring – the context – 4.3.7. RELEASE – sources. The jar! /org/springframework/jmx/export/SpringModelMBean.java

	/**
	 * Switches the {@link Thread#getContextClassLoader() context ClassLoader} for the
	 * managed resources {@link ClassLoader} before allowing the invocation to occur.
	 * @see javax.management.modelmbean.ModelMBean#invoke
	 */
	@Override
	public Object invoke(String opName, Object[] opArgs, String[] sig)
			throws MBeanException, ReflectionException {

		ClassLoader currentClassLoader = Thread.currentThread().getContextClassLoader();
		try {
			Thread.currentThread().setContextClassLoader(this.managedResourceClassLoader);
			returnsuper.invoke(opName, opArgs, sig); } finally { Thread.currentThread().setContextClassLoader(currentClassLoader); }}Copy the code

Spring – the boot – physical – 1.4.5. RELEASE – sources. The jar! /org/springframework/boot/actuate/endpoint/jmx/DataEndpointMBean.java

@ManagedResource public class DataEndpointMBean extends EndpointMBean { /** * Create a new {@link DataEndpointMBean} instance. * @param beanName the bean name * @param endpoint the endpoint to wrap * @param objectMapper the {@link ObjectMapper} used to convert the payload */ public DataEndpointMBean(String beanName, Endpoint<? > endpoint, ObjectMapper objectMapper) { super(beanName, endpoint, objectMapper); } @ManagedAttribute(description ="Invoke the underlying endpoint")
	public Object getData() {
		returnconvert(getEndpoint().invoke()); }}Copy the code

If local IDEA is enabled, JMX Agent can be reproduced

healthEndpoint

Spring – the boot – admin – server – 1.4.6 – sources. The jar! /de/codecentric/boot/admin/registry/StatusUpdateApplicationListener.java

	private long updatePeriod = 10_000L;

	public void startStatusUpdate() {
		if(scheduledTask ! = null && ! scheduledTask.isDone()) {return;
		}

		scheduledTask = taskScheduler.scheduleAtFixedRate(new Runnable() {
			@Override
			public void run() {
				statusUpdater.updateStatusForAllApplications();
			}
		}, updatePeriod);
		LOGGER.debug("Scheduled status-updater task for every {}ms", updatePeriod);

	}
Copy the code

Spring – the boot – admin – server – 1.4.6 – sources. The jar! /de/codecentric/boot/admin/registry/StatusUpdater.java

	public void updateStatusForAllApplications() {
		long now = System.currentTimeMillis();
		for (Application application : store.findAll()) {
			if (now - statusLifetime > application.getStatusInfo().getTimestamp()) {
				updateStatus(application);
			}
		}
	}

	public void updateStatus(Application application) {
		StatusInfo oldStatus = application.getStatusInfo();
		StatusInfo newStatus = queryStatus(application);

		Application newState = Application.create(application).withStatusInfo(newStatus).build();
		store.save(newState);

		if(! newStatus.equals(oldStatus)) { publisher.publishEvent( new ClientApplicationStatusChangedEvent(newState, oldStatus, newStatus)); } } protected StatusInfo queryStatus(Application application) { LOGGER.trace("Updating status for {}", application);

		try {
			@SuppressWarnings("unchecked") ResponseEntity<Map<String, Object>> response = restTemplate.getForEntity( application.getHealthUrl(), (Class<Map<String, Object>>) (Class<? >) Map.class); LOGGER.debug("/health for {} responded with {}", application, response);

			if (response.hasBody() && response.getBody().get("status") instanceof String) {
				return StatusInfo.valueOf((String) response.getBody().get("status"));
			} else if (response.getStatusCode().is2xxSuccessful()) {
				return StatusInfo.ofUp();
			} else {
				return StatusInfo.ofDown();
			}

		} catch (Exception ex) {
			if ("OFFLINE".equals(application.getStatusInfo().getStatus())) {
				LOGGER.debug("Couldn't retrieve status for {}", application, ex);
			} else {
				LOGGER.warn("Couldn't retrieve status for {}", application, ex);
			}
			returnStatusInfo.ofOffline(); }}Copy the code

You can see that admin-server registers a scheduled task that periodically calls /health

Problem analysis

ElasticsearchHealthIndicatorConfiguration

Spring – the boot – physical – 1.4.5. RELEASE – sources. The jar! /org/springframework/boot/actuate/autoconfigure/ElasticsearchHealthIndicatorConfiguration.java

class ElasticsearchHealthIndicatorConfiguration {

	@Configuration
	@ConditionalOnBean(Client.class)
	@ConditionalOnEnabledHealthIndicator("elasticsearch") @EnableConfigurationProperties(ElasticsearchHealthIndicatorProperties.class) static class ElasticsearchClientHealthIndicatorConfiguration extends CompositeHealthIndicatorConfiguration<ElasticsearchHealthIndicator, Client> { private final Map<String, Client> clients;  private final ElasticsearchHealthIndicatorProperties properties; ElasticsearchClientHealthIndicatorConfiguration(Map<String, Client> clients, ElasticsearchHealthIndicatorProperties properties) { this.clients = clients; this.properties = properties; } @Bean @ConditionalOnMissingBean(name ="elasticsearchHealthIndicator")
		public HealthIndicator elasticsearchHealthIndicator() {
			return createHealthIndicator(this.clients);
		}

		@Override
		protected ElasticsearchHealthIndicator createHealthIndicator(Client client) {
			return new ElasticsearchHealthIndicator(client, this.properties);
		}

	}

	@Configuration
	@ConditionalOnBean(JestClient.class)
	@ConditionalOnEnabledHealthIndicator("elasticsearch")
	static class ElasticsearchJestHealthIndicatorConfiguration extends
			CompositeHealthIndicatorConfiguration<ElasticsearchJestHealthIndicator, JestClient> {

		private final Map<String, JestClient> clients;

		ElasticsearchJestHealthIndicatorConfiguration(Map<String, JestClient> clients) {
			this.clients = clients;
		}

		@Bean
		@ConditionalOnMissingBean(name = "elasticsearchHealthIndicator")
		public HealthIndicator elasticsearchHealthIndicator() {
			return createHealthIndicator(this.clients);
		}

		@Override
		protected ElasticsearchJestHealthIndicator createHealthIndicator(
				JestClient client) {
			returnnew ElasticsearchJestHealthIndicator(client); }}}Copy the code

ElasticsearchHealthIndicator

Spring – the boot – physical – 1.4.5. RELEASE – sources. The jar! /org/springframework/boot/actuate/health/ElasticsearchHealthIndicator.java

	@Override
	protected void doHealthCheck(Health.Builder builder) throws Exception { List<String> indices = this.properties.getIndices(); ClusterHealthResponse response = this.client.admin().cluster() .health(Requests.clusterHealthRequest(indices.isEmpty() ?  allIndices : indices.toArray(new String[indices.size()]))) .actionGet(this.properties.getResponseTimeout()); switch (response.getStatus()) {case GREEN:
		case YELLOW:
			builder.up();
			break;
		case RED:
		default:
			builder.down();
			break;
		}
		builder.withDetail("clusterName", response.getClusterName());
		builder.withDetail("numberOfNodes", response.getNumberOfNodes());
		builder.withDetail("numberOfDataNodes", response.getNumberOfDataNodes());
		builder.withDetail("activePrimaryShards", response.getActivePrimaryShards());
		builder.withDetail("activeShards", response.getActiveShards());
		builder.withDetail("relocatingShards", response.getRelocatingShards());
		builder.withDetail("initializingShards", response.getInitializingShards());
		builder.withDetail("unassignedShards", response.getUnassignedShards());
	}
Copy the code

Spring – the boot – physical – 1.4.5. RELEASE – sources. The jar! /org/springframework/boot/actuate/health/ElasticsearchHealthIndicatorProperties.java

@ConfigurationProperties(prefix = "management.health.elasticsearch", ignoreUnknownFields = false)
public class ElasticsearchHealthIndicatorProperties {

	/**
	 * Comma-separated index names.
	 */
	private List<String> indices = new ArrayList<String>();

	/**
	 * Time, in milliseconds, to wait fora response from the cluster. */ private long responseTimeout = 100L; / /... }Copy the code

Indices (/index1,index2/ _STATS); if not, all indices (a potential pit).

ElasticsearchJestHealthIndicator

Spring – the boot – physical – 1.4.5. RELEASE – sources. The jar! /org/springframework/boot/actuate/health/ElasticsearchJestHealthIndicator.java

	@Override
	protected void doHealthCheck(Health.Builder builder) throws Exception {
		JestResult aliases = this.jestClient.execute(new Stats.Builder().build());
		JsonElement root = this.jsonParser.parse(aliases.getJsonString());
		JsonObject shards = root.getAsJsonObject().get("_shards").getAsJsonObject();
		int failedShards = shards.get("failed").getAsInt();
		if(failedShards ! = 0) { builder.outOfService(); }else{ builder.up(); }}Copy the code

Jest sends a /_all/_stats request, and that’s where the problem is. For example, if you want to query all of your stats for elasticSearch, that’s a lot of data. If you want to query all of your stats for elasticSearch, that’s a lot of data. This, combined with the scheduled task /health query, results in a constant generation of new objects and a very frequent YGC, resulting in memory leaks.

summary

Health is a great app to monitor, but you need to pay attention to the frequency and implementation of the app. If you don’t pay attention to the frequency of the app, you need to pay attention to the implementation of the app. If you don’t pay attention to the frequency of the app, you need to pay attention to the implementation.

doc

  • Java Spring error summary