I run a service called AnyPaper that relies on Baidu Academic. If you are interested, feel free to check it out.

Baidu Academic recently got a redesign. That alone would not have mattered, except that my whole website started to become unresponsive on a regular basis. When I logged in to the server, I found the system CPU pinned at around 100%. What was going on?

First, the site environment:

  • Host: Alibaba Cloud server (2 cores, 4 GB RAM)
  • Operating system: Windows Server 2016
  • Website: built with SpringBoot, run as a single Jar

Tracking down the problem

Since the website had been running stably until then, a bug in my own code could basically be ruled out as the cause of the high CPU usage. My first guess was one of the following:

  1. The system was compromised and a cryptomining program was implanted (after the website program was stopped, the CPU immediately returned to normal, which ruled out a system-level infection);
  2. A vulnerability in the website let someone plant a mining script (older versions of Hutool have Zip Slip vulnerabilities in ZipUtil and FileUtil, and older versions of SpringBoot also seem to carry some security risks);
  3. A very high volume of user requests (in reality, the number of visitors is pitifully small).
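
Once stopping the website has pinned the load on the Java process itself, the next question is which thread is burning CPU and where it is. The following is a minimal sketch using only the standard `java.lang.management` API; the class and method names (`HotThreads`, `busiestThreads`) are hypothetical and not part of AnyPaper. It only reports on the JVM it runs in, so it would have to be called from inside the website process (for example from a scheduled task); JDK tools such as `jstack` give a similar view from outside.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

public class HotThreads {

    /** One report line: thread name, CPU time used so far, and the frame it is currently in. */
    public static final class Entry {
        public final String name;
        public final long cpuNanos;
        public final String topFrame;

        Entry(String name, long cpuNanos, String topFrame) {
            this.name = name;
            this.cpuNanos = cpuNanos;
            this.topFrame = topFrame;
        }
    }

    /** Returns up to `limit` live threads of this JVM, sorted by accumulated CPU time, busiest first. */
    public static List<Entry> busiestThreads(int limit) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<Entry> entries = new ArrayList<>();
        if (!mx.isThreadCpuTimeSupported()) {
            return entries; // this JVM cannot report per-thread CPU time
        }
        for (long id : mx.getAllThreadIds()) {
            long cpu = mx.getThreadCpuTime(id); // -1 if the thread has died or measurement is disabled
            ThreadInfo info = mx.getThreadInfo(id, 1); // capture only the topmost stack frame
            if (cpu < 0 || info == null) {
                continue;
            }
            StackTraceElement[] stack = info.getStackTrace();
            String top = stack.length > 0 ? stack[0].toString() : "<no Java frames>";
            entries.add(new Entry(info.getThreadName(), cpu, top));
        }
        // Busiest thread first
        entries.sort((a, b) -> Long.compare(b.cpuNanos, a.cpuNanos));
        return new ArrayList<>(entries.subList(0, Math.min(limit, entries.size())));
    }

    public static void main(String[] args) {
        for (Entry e : busiestThreads(5)) {
            System.out.printf("%-30s %8d ms  at %s%n", e.name, e.cpuNanos / 1_000_000, e.topFrame);
        }
    }
}
```

Taking a couple of snapshots a few seconds apart and comparing them shows which thread is still accumulating CPU time. A thread stuck in a regex match would be expected to show frames inside `java.util.regex` at the top.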

With that in mind, I first upgraded SpringBoot and Hutool, then started debugging locally. I found that the system hung whenever AnyPaper was used for a literature search. Stepping through with the debugger, I found the program stuck in a regular-expression match (the text being matched is the HTML source of Baidu Academic's search results):

[\s\S]+? target="_blank">([\s\S]+?) \s*

([\s\S]+?) (?: (?:publish'}"\s*title="([^"]+)[\s\S]+?) | (?:([^<]+)[^<]+]+>\s*(\d+)))[\s\S]+? c_abstract">\s+([\s\S]+?) (?:\s*
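
To reproduce a hang like this outside the website, and to keep a single bad match from taking down the whole site, the matcher's input can be wrapped in a `CharSequence` that checks a deadline on every `charAt()` call. This is a common workaround, not necessarily what AnyPaper does, and it only caps the damage; it does not make the pattern cheaper. The minimal sketch below assumes a hypothetical toy pattern with the same shape as the one above (several lazy `[\s\S]+?` runs separated by literals) rather than the real, truncated pattern. `RegexGuard`, `findWithTimeout` and `DeadlineCharSequence` are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexGuard {

    /** Thrown when a match runs past its time budget. */
    public static class RegexTimeoutException extends RuntimeException {
        public RegexTimeoutException(String message) {
            super(message);
        }
    }

    /** Wraps the input so that every charAt() call made by the matcher checks the deadline. */
    public static final class DeadlineCharSequence implements CharSequence {
        private final CharSequence inner;
        private final long deadlineNanos;

        public DeadlineCharSequence(CharSequence inner, long deadlineNanos) {
            this.inner = inner;
            this.deadlineNanos = deadlineNanos;
        }

        @Override
        public char charAt(int index) {
            // nanoTime values must be compared by subtraction, not directly with <
            if (System.nanoTime() - deadlineNanos > 0) {
                throw new RegexTimeoutException("regex exceeded its time budget");
            }
            return inner.charAt(index);
        }

        @Override
        public int length() {
            return inner.length();
        }

        @Override
        public CharSequence subSequence(int start, int end) {
            return new DeadlineCharSequence(inner.subSequence(start, end), deadlineNanos);
        }

        @Override
        public String toString() {
            return inner.toString();
        }
    }

    /** Returns whether the pattern finds a match; throws RegexTimeoutException after timeoutMillis. */
    public static boolean findWithTimeout(Pattern pattern, CharSequence input, long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        Matcher matcher = pattern.matcher(new DeadlineCharSequence(input, deadline));
        return matcher.find();
    }

    public static void main(String[] args) {
        // Hypothetical toy pattern: lazy [\s\S]+? runs separated by literals. The input
        // never contains 'z', so the engine backtracks through a huge number of
        // combinations of 'x' and 'y' positions before it can report "no match".
        Pattern pattern = Pattern.compile("[\\s\\S]+?x[\\s\\S]+?y[\\s\\S]+?z");
        String html = "xy".repeat(2000);
        try {
            System.out.println("matched: " + findWithTimeout(pattern, html, 500));
        } catch (RegexTimeoutException e) {
            System.out.println("gave up: " + e.getMessage());
        }
    }
}
```

Checking the clock on every character adds overhead, so this is better suited to a debugging harness or a last-resort guard than to a hot path.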