preface

Recently I was watching the new TV series “Fight the Sky”, but the advertisement of an episode of THE TV series began 45s in the middle of 90s and ended 15s in the end. It was really outrageous. I have learned before that there are many free VIP parsing interfaces on the Internet, so I decided to start a small website that can remove advertising, so there is a pit pit process.

The road of mining pit

First, this is the final online preview. The core of the article is to crawl the movie link, and then the existing member resolution interface splicing display.

1.).

Video parsing interface + Movie link = ad-free movie.

2. To find the interface

After searching for half an hour, I finally found a qualified interface on the aggregated data. I could get the movie link and release information, which was not too comfortable. The procedure found that the aggregated data interface data was incomplete, many new movies did not return data, and there was a limit on the number of calls, the scheme ended.

3. Nodejs crawler

There are two ideas: 1. Crawl all video links of major video websites and store them in the database, so that users can search for movies in the database when searching. 2. After obtaining the front end search keyword, go to the video website to retrieve it and return it to the front end. The first way of data volume is relatively large, to take into account the site’s anti-crawling measures, it is best to write a script timed crawling to maintain access to the latest data of the video site. The second method of online crawling will have a certain delay. After testing, the delay time is about 1s, so we decided to adopt the second method. Through the analysis of video websites, it is found that each video website has a whole network search interface. This time, the penguin video is climbed and the code is on.

4. Front-end display

There isn’t much of a technology stack, but the requirements change all the time, from native JS to jquery, and eventually even frameworks. . Not much. Here’s the Github address

5. To summarize

Nodejs does a lot of crawler pit, but it’s not a good idea to use tags to crawl data because movies, TV shows, variety shows, sequels, etc have different HTML structures. And in getting the TV series’… ‘Link is also a bit of a pit, because nodeJS breakpoint debugging is not possible, very simple problems have been unable to locate. The first time to write an article, ideas, format are not very good, forgive me. Off work, heh heh, slip away.