kk-anti-reptile is an anti-crawler component for distributed systems, built on Spring Boot.

System requirements

  • A Spring Boot based project (Spring Boot 1.x or 2.x)
  • Redis is required

How it works

kk-anti-reptile filters requests using the Servlet Filter specification. It instantiates a Filter through Spring Boot's extension-point mechanism and registers it in the Spring container as a FilterRegistrationBean. Spring then registers that bean with the Servlet container, so the Filter sees incoming requests.

Inside the kk-anti-reptile Filter, the different filtering rules are organized as a chain of responsibility. An abstract interface is provided so that callers can extend the chain with their own rules.
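The chain-of-responsibility idea can be sketched in plain Java as follows. This is only an illustration under stated assumptions: the names Rule, RuleChain, SimpleRequest, and EmptyUserAgentRule are hypothetical and are not taken from kk-anti-reptile, whose real interfaces may look different.

```java
import java.util.ArrayList;
import java.util.List;

/** Demo entry point: builds a chain with one caller-supplied rule. */
final class RuleChainDemo {
    public static void main(String[] args) {
        RuleChain chain = new RuleChain().add(new EmptyUserAgentRule());
        System.out.println(chain.allows(new SimpleRequest("10.0.0.1", "Mozilla/5.0")));
        System.out.println(chain.allows(new SimpleRequest("10.0.0.1", "")));
    }
}

/** Minimal stand-in for an HTTP request (hypothetical, not a servlet type). */
final class SimpleRequest {
    final String ip;
    final String userAgent;

    SimpleRequest(String ip, String userAgent) {
        this.ip = ip;
        this.userAgent = userAgent;
    }
}

/** Hypothetical rule abstraction that callers implement to extend the chain. */
interface Rule {
    /** Returns true when the request hits the rule and must be blocked. */
    boolean hits(SimpleRequest request);

    /** Clears state kept for this client, e.g. after a correct verification code. */
    void reset(SimpleRequest request);
}

/** Runs the rules in order and stops at the first hit. */
final class RuleChain {
    private final List<Rule> rules = new ArrayList<>();

    RuleChain add(Rule rule) {
        rules.add(rule);
        return this;
    }

    /** Returns true only if no rule in the chain is hit. */
    boolean allows(SimpleRequest request) {
        for (Rule rule : rules) {
            if (rule.hits(request)) {
                return false;
            }
        }
        return true;
    }

    /** Resets every rule for the client behind this request. */
    void resetAll(SimpleRequest request) {
        for (Rule rule : rules) {
            rule.reset(request);
        }
    }
}

/** Example of a caller-defined rule: block requests without a User-Agent. */
final class EmptyUserAgentRule implements Rule {
    @Override
    public boolean hits(SimpleRequest request) {
        return request.userAgent == null || request.userAgent.trim().isEmpty();
    }

    @Override
    public void reset(SimpleRequest request) {
        // Stateless rule, nothing to reset.
    }
}
```

Keeping each rule behind one small interface means a new rule can be added to the chain without touching the code that walks it.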

If a request does not pass the Filter, it is intercepted, status code 509 is returned, and a verification code input page is output. If the verification code entered is correct, the rule chain is invoked to reset the rules.

There are currently two rules in the rule chain:

ip-rule

ip-rule counts the requests made within the current time window. A request passes if the count is below the configured maximum; otherwise it is rejected. The time window, the maximum number of requests, and an IP address whitelist are all configurable.
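A fixed-window counter of this kind could look roughly like the sketch below. It rests on several assumptions: the class name FixedWindowIpRule and its methods are hypothetical, requests are counted per client IP, and the counters live in an in-memory map purely for illustration, whereas the real component relies on Redis so that counts are shared across instances.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Demo entry point: at most 2 requests per 1000 ms window. */
final class FixedWindowIpRuleDemo {
    public static void main(String[] args) {
        FixedWindowIpRule rule =
                new FixedWindowIpRule(1000L, 2, Collections.singleton("127.0.0.1"));
        long now = System.currentTimeMillis();
        for (int i = 0; i < 3; i++) {
            System.out.println(rule.allow("203.0.113.7", now + i));
        }
    }
}

/** In-memory sketch of a per-IP fixed-window request counter (hypothetical). */
final class FixedWindowIpRule {
    private final long windowMillis;
    private final int maxRequests;
    private final Set<String> whitelist;
    // ip -> {window start in ms, requests counted in that window}
    private final Map<String, long[]> counters = new HashMap<>();

    FixedWindowIpRule(long windowMillis, int maxRequests, Set<String> whitelist) {
        this.windowMillis = windowMillis;
        this.maxRequests = maxRequests;
        this.whitelist = new HashSet<>(whitelist);
    }

    /** Returns true if a request from ip at time nowMillis may pass. */
    synchronized boolean allow(String ip, long nowMillis) {
        if (whitelist.contains(ip)) {
            return true; // whitelisted addresses are never counted
        }
        long[] state = counters.get(ip);
        if (state == null || nowMillis - state[0] >= windowMillis) {
            state = new long[] {nowMillis, 0L}; // start a new window
            counters.put(ip, state);
        }
        if (state[1] < maxRequests) {
            state[1]++;
            return true;
        }
        return false; // count already reached the maximum for this window
    }

    /** Forgets the counter for ip, e.g. after a correct verification code. */
    synchronized void reset(String ip) {
        counters.remove(ip);
    }
}
```

Passing the timestamp in as a parameter keeps the sketch deterministic and easy to test; a production version would read the clock itself and expire old entries.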

ua-rule

ua-rule inspects the User-Agent header carried by a request to determine the operating system, device, and browser. Filtering can be configured along each of these dimensions.
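As a rough illustration of filtering on one of these dimensions, the sketch below classifies the operating system with simple substring checks and blocks a configurable set of families. The names OsFamily and SimpleUaRule are hypothetical, and the substring matching is an assumption made for brevity; the real rule may rely on a full User-Agent parser and also covers device and browser.

```java
import java.util.EnumSet;
import java.util.Locale;
import java.util.Set;

/** Demo entry point: block clients whose operating system cannot be recognized. */
final class SimpleUaRuleDemo {
    public static void main(String[] args) {
        SimpleUaRule rule = new SimpleUaRule(EnumSet.of(OsFamily.UNKNOWN));
        System.out.println(rule.allow("Mozilla/5.0 (Windows NT 10.0; Win64; x64)"));
        System.out.println(rule.allow("python-requests/2.25.1"));
    }
}

/** Coarse operating system families (hypothetical classification). */
enum OsFamily { WINDOWS, MAC, IOS, ANDROID, LINUX, UNKNOWN }

/** Sketch of a User-Agent rule that filters on the operating system dimension. */
final class SimpleUaRule {
    private final Set<OsFamily> blocked = EnumSet.noneOf(OsFamily.class);

    SimpleUaRule(Set<OsFamily> blockedFamilies) {
        blocked.addAll(blockedFamilies);
    }

    /** Guesses the OS family from the raw User-Agent string. */
    static OsFamily detectOs(String userAgent) {
        if (userAgent == null) {
            return OsFamily.UNKNOWN;
        }
        String ua = userAgent.toLowerCase(Locale.ROOT);
        // Order matters: Android agents also mention Linux,
        // and iPhone/iPad agents also mention "like Mac OS X".
        if (ua.contains("android")) {
            return OsFamily.ANDROID;
        }
        if (ua.contains("iphone") || ua.contains("ipad")) {
            return OsFamily.IOS;
        }
        if (ua.contains("windows")) {
            return OsFamily.WINDOWS;
        }
        if (ua.contains("mac os x")) {
            return OsFamily.MAC;
        }
        if (ua.contains("linux")) {
            return OsFamily.LINUX;
        }
        return OsFamily.UNKNOWN;
    }

    /** Returns true if the request's OS family is not in the blocked set. */
    boolean allow(String userAgent) {
        return !blocked.contains(detectOs(userAgent));
    }
}
```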

After a rule is hit

When a crawler or anti-abuse rule is hit, the request is blocked and a verification code is generated. Verification codes come in several combinations. If the client enters the verification code correctly, access can continue.

The verification code takes three forms (Chinese characters, English letters plus digits, and simple arithmetic), and each form comes in two image formats (static image and animated GIF), giving six types at present. All types appear at random and are extremely difficult to recognize with current techniques, which effectively prevents crawlers from harvesting data at scale.
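To make the "simple arithmetic" form concrete, here is a minimal sketch that only produces the question text and checks the answer. The class ArithmeticChallenge is hypothetical, the operand range is an arbitrary assumption, and rendering the question as a static image or GIF is left out entirely.

```java
import java.util.Random;

/** Demo entry point: prints one question and checks a sample answer. */
final class ArithmeticChallengeDemo {
    public static void main(String[] args) {
        ArithmeticChallenge challenge = ArithmeticChallenge.generate(new Random());
        System.out.println(challenge.question);
        System.out.println(challenge.verify(String.valueOf(challenge.answer)));
    }
}

/** Sketch of a simple arithmetic verification code (text only, no image). */
final class ArithmeticChallenge {
    final String question;
    final int answer;

    private ArithmeticChallenge(String question, int answer) {
        this.question = question;
        this.answer = answer;
    }

    /** Builds an addition or subtraction question with single-digit operands. */
    static ArithmeticChallenge generate(Random random) {
        int a = random.nextInt(10); // 0..9
        int b = random.nextInt(10); // 0..9
        if (random.nextBoolean()) {
            return new ArithmeticChallenge(a + " + " + b + " = ?", a + b);
        }
        // Subtract the smaller operand from the larger so the result is never negative.
        int high = Math.max(a, b);
        int low = Math.min(a, b);
        return new ArithmeticChallenge(high + " - " + low + " = ?", high - low);
    }

    /** Returns true if the user's input parses to the expected answer. */
    boolean verify(String input) {
        if (input == null) {
            return false;
        }
        try {
            return Integer.parseInt(input.trim()) == answer;
        } catch (NumberFormatException e) {
            return false; // non-numeric input can never be correct
        }
    }
}
```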

Integration and usage

Back-end integration is simple: reference the kk-anti-reptile Maven dependency and enable kk-anti-reptile in the configuration.

Add the Maven dependency

<dependency>
    <groupId>cn.keking.project</groupId>
    <artifactId>kk-anti-reptile</artifactId>
    <version>1.0.0-SNAPSHOT</version>
</dependency>

Enable kk-anti-reptile in the configuration

anti.reptile.manager.enabled=true

On the front end, add an interceptor at the single place where Ajax requests are sent. When a response with status code 509 is intercepted, open a new window and write the response content into it, then pass the back-end API's baseUrl to that page. The example below uses axios:

import axios from 'axios';
import {baseUrl} from './config';

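axios.interceptors.response.use(
  data => {
    return data;
  },
  error => {
    // error.response is undefined for network errors, so guard before reading status.
    if (error.response && error.response.status === 509) {
      // The body of a 509 response is the HTML of the verification page.
      let html = error.response.data;
      let verifyWindow = window.open("", "_blank", "height=400,width=560");
      verifyWindow.document.write(html);
      verifyWindow.document.getElementById("baseUrl").value = baseUrl;
    }
    // Keep the promise rejected so callers can still handle the failure.
    return Promise.reject(error);
  }
);

export default axios;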

Notes

  • apollo-client requires bootstrap to be enabled

For users of the Apollo configuration center: because the component uses @ConditionalOnProperty internally, add the following sample configuration to application.properties or bootstrap.properties (apollo-client 0.10.0 or later is required). See the Apollo bootstrap documentation for details.

apollo.bootstrap.enabled = true

  • Redisson is required

If the project already uses Redisson, kk-anti-reptile automatically obtains the RedissonClient instance. If not, add the following Redisson connection settings to the configuration file:

spring.redisson.address=redis://192.168.1.204:6379
spring.redisson.password=xxx

Configuration Overview

In a Spring Boot project, every configuration item gets auto-completion hints and a description when you edit the configuration file.

All configuration items are prefixed with anti.reptile.manager. The following are all configuration items and descriptions:
