Author: Unreal good

Source: Hang Seng LIGHT Cloud Community

Introduction

In our day-to-day work we sometimes need to pull data from a website, such as a trending ranking. But we either can't write code, or writing a crawler feels too cumbersome. Is there an efficient tool that can solve this problem for us?

Tool introduction

Web Scraper is a Chrome extension that lets you scrape data from a web page through a graphical interface.

The extension is free. By creating a Sitemap (a crawl plan), you can extract data from more than 95% of websites (blog lists, Zhihu answers, Weibo comments, and so on).
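A Sitemap is just a JSON document: a name, a start URL, and a tree of selectors. As a rough sketch (the field names follow the Web Scraper import/export format; the concrete values here are illustrative, not from the tutorial):

```python
import json

# Minimal Web Scraper sitemap skeleton (illustrative values).
# "_id" names the sitemap, "startUrl" lists the pages to open,
# and "selectors" describes what to extract from each page.
sitemap = {
    "_id": "example",
    "startUrl": ["https://example.com/list"],
    "selectors": [
        {
            "id": "item",            # one selector per piece of data
            "parentSelectors": ["_root"],
            "type": "SelectorText",
            "selector": "li.item",   # a plain CSS selector
            "multiple": True,        # grab every match, not just the first
            "delay": 0,
        }
    ],
}

# The extension imports and exports this structure as JSON text.
sitemap_json = json.dumps(sitemap, indent=2)
print(sitemap_json)
```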

(Download link: Domestic, overseas)

Usage tutorial

Let's take the Bilibili ranking page as an example and scrape the ranking data. (Address: www.bilibili.com/v/popular/r…)

1. After installing the Web Scraper plug-in, press F12 to open the developer tools and switch to the Web Scraper panel.

2. In the Web Scraper panel, choose Create new sitemap → Create Sitemap to create a crawl task. Fill in a name, and enter the page address in the URL field.

3. Then click Add new selector to add a selector: give it an ID, set the Type to Element, click Select, and pick the page element you want to scrape. You can see the element highlighted in red on the page.

Then select the second item in the list, and you will see all the following items get selected as well. Click Done selecting.

Then click Element preview to confirm that every list item on the page is captured, and tick Multiple. Finally, save the selector.

4. Repeat the steps above to create selectors that grab the rank, title, play count, comment count, cover image, link, and so on. Click Selector graph to view the selector tree used for scraping.

5. Once all the selectors are in place, click Scrape to start scraping. Wait for the scrape to finish, then refresh the data preview to inspect the results.

6. Finally, export the results as a CSV file so you can view and process the scraped data in Excel.
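The exported CSV can also be processed in code instead of Excel. A small sketch with Python's standard csv module (the file contents below are simulated; column names are assumed to match the selector IDs):

```python
import csv
import io

# Simulate a small export; a real file would come from the extension's
# CSV export (column names follow the selector IDs in the sitemap).
exported = io.StringIO(
    "rank,title,play_count\n"
    "1,Some video,1.2M\n"
    "2,Another video,980K\n"
)

# DictReader maps each row to a dict keyed by the header row.
rows = list(csv.DictReader(exported))
for row in rows:
    print(row["rank"], row["title"], row["play_count"])
```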

Conclusion

Finally, here is my crawl script; import it as a sitemap and you can start scraping the data right away.

{"_id":"bilibili"."startUrl": ["https://www.bilibili.com/v/popular/rank/all"]."selectors": [{"id":"bilibili_rank"."parentSelectors": ["_root"]."type":"SelectorElement"."selector":"li.rank-item"."multiple":true."delay":0}, {"id":"-"."parentSelectors": ["bilibili_rank"]."type":"SelectorText"."selector":" div.num"."multiple":true."delay":0."regex":""}, {"id":"- the title"."parentSelectors": ["bilibili_rank"]."type":"SelectorText"."selector":"a.title"."multiple":false."delay":0."regex":""}, {"id":"Play quantity"."parentSelectors": ["bilibili_rank"]."type":"SelectorText"."selector":".detail > span:nth-of-type(1)"."multiple":false."delay":0."regex":""}, {"id":"Comment volume"."parentSelectors": ["bilibili_rank"]."type":"SelectorText"."selector":"span:nth-of-type(2)"."multiple":false."delay":0."regex":""}, {"id":"Introduction Map"."parentSelectors": ["bilibili_rank"]."type":"SelectorImage"."selector":"img"."multiple":false."delay":0}, {"id":Links to "-"."parentSelectors": ["bilibili_rank"]."type":"SelectorText"."selector":"a.title"."multiple":false."delay":0."regex":""}}]Copy the code