Foreword

Last updated: 2020-11-16. Code: github.com/yayxs/node-…

Requirements overview

I plan to build a full-stack personal website. It will host some of my technical articles, the songs I recorded on the National Karaoke app and the videos I posted on Bilibili, and my blog will be integrated into it as well. This article shares the Node crawler used to collect that content. The site is divided into three parts:

  • Music: songs recorded on third-party platforms
  • Video: video clips from third-party platforms
  • Articles: everyday technical writing

In this article we crawl some of that data and, for now, store it in a local database.

Preparation

As the saying goes, a workman who wants to do his work well must first sharpen his tools. The preparation here comes down to two things:

  • Reference documentation
  • Environment and tools

Documentation to read

  • Node.js v12.16.0 documentation

    For Node, I think the Chinese translation of the documentation is fine to read directly.

  • A Chinese-language jQuery reference, where you can look up jQuery APIs online

Why does a crawler need jQuery? Read on.

Environment setup

  • node && nodemon

    npm install -g nodemon

  • axios

    yarn add axios

  • mysql

    yarn add mysql

  • cheerio

    yarn add cheerio

Requirements analysis

The work breaks down into three steps:

  • Get the column's article list: refresh the list page and analyze the API requests it makes
  • Crawl the relevant data and store it
  • Set up the database table

Analyzing those requests shows that the column data comes from this API:

https://timeline-merger-ms.juejin.im/v1/get_entry_by_self?src=web&uid=5cf00b7c6fb9a07eba2c226f&device_id=1580692913721&token=eyJhY2Nlc3NfdG9rZW4iOiJqa3FzYTJaUzB3cTY3VVBoIiwicmVmcmVzaF90b2tlbiI6ImJrcG9LMnAyaUlSUFRvSFUiLCJ0b2tlbl90eXBlIjoibWFjIiwiZXhwaXJlX2luIjoyNTkyMDAwfQ%3D%3D&targetUid=5cf00b7c6fb9a07eba2c226f&type=post&limit=20&order=createdAt

The API can be opened directly in the browser's address bar. So far it returns only 14 articles, because I have written only 14.
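The long URL is easier to read and change when it is assembled from named parts. Below is a minimal sketch using Node's built-in URLSearchParams; the parameter values are copied from the URL above, while the function name buildEntryUrl and the "YOUR_TOKEN" placeholder are hypothetical. The meaning of each parameter is only inferred from its name, and the endpoint itself may no longer respond as it did in 2020.

```javascript
// Assemble the get_entry_by_self URL from named parts (hypothetical helper).
const BASE = "https://timeline-merger-ms.juejin.im/v1/get_entry_by_self";

function buildEntryUrl({ uid, deviceId, token, type = "post", limit = 20, order = "createdAt" }) {
  const params = new URLSearchParams({
    src: "web",
    uid,
    device_id: deviceId,
    // Pass the token in decoded form (ending in "==", not "%3D%3D"); URLSearchParams encodes it
    token,
    targetUid: uid, // the original URL requests the author's own entries, so targetUid equals uid
    type, // "post" appears to select articles
    limit: String(limit), // apparently the page size
    order // sort field
  });
  return `${BASE}?${params.toString()}`;
}

const url = buildEntryUrl({
  uid: "5cf00b7c6fb9a07eba2c226f",
  deviceId: "1580692913721",
  token: "YOUR_TOKEN" // hypothetical placeholder: substitute your own token
});
console.log(url);
```

Keeping the token as a separate argument also avoids hard-coding a personal access token into the request string.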

So let's call this API from Node:

const axios = require("axios");

async function getEntryBySelf() {
  const reqUrl =
    "https://timeline-merger-ms.juejin.im/v1/get_entry_by_self?src=web&uid=5cf00b7c6fb9a07eba2c226f&device_id=1580692913721&token=eyJhY2Nlc3NfdG9rZW4iOiJqa3FzYTJaUzB3cTY3VVBoIiwicmVmcmVzaF90b2tlbiI6ImJrcG9LMnAyaUlSUFRvSFUiLCJ0b2tlbl90eXBlIjoibWFjIiwiZXhwaXJlX2luIjoyNTkyMDAwfQ%3D%3D&targetUid=5cf00b7c6fb9a07eba2c226f&type=post&limit=20&order=createdAt";
  const res = await axios.get(reqUrl);
  // s and m carry the status; d holds the payload ({ total, entrylist })
  const { s, m, d } = res.data;
  if (s === 1 && m === "ok" && d) {
    // The request succeeded
    return d.entrylist;
  } else {
    return "reqErr";
  }
}
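The function above signals failure by returning the string "reqErr", which every caller has to remember to check. One alternative is to validate the response body and throw a descriptive error. A minimal sketch, assuming the body shape used above ({ s, m, d: { total, entrylist } }, with s === 1 and m === "ok" meaning success); pickEntryList and the sample data are hypothetical.

```javascript
// Validate a get_entry_by_self response body and extract the article list (hypothetical helper).
function pickEntryList(body) {
  const { s, m, d } = body || {};
  if (s !== 1 || m !== "ok" || !d || !Array.isArray(d.entrylist)) {
    // Throw instead of returning a sentinel string, so callers can use try/catch
    throw new Error(`get_entry_by_self failed: s=${s}, m=${m}`);
  }
  return { total: d.total, entries: d.entrylist };
}

// Usage with axios (not run here):
// const res = await axios.get(reqUrl);
// const { total, entries } = pickEntryList(res.data);

// Hypothetical sample body, for illustration only
const sample = { s: 1, m: "ok", d: { total: 2, entrylist: [{ title: "a" }, { title: "b" }] } };
console.log(pickEntryList(sample).entries.length);
```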

Using MySQL with Node

We operate the database from Node. This article only briefly covers the data operations; more will follow in later updates, so feel free to follow the author on GitHub.

Database connection

const mysql = require("mysql");

// Connection settings
const config = {
  host: "localhost", // Host address
  user: "root", // Database user
  password: "123456", // Password
  database: "blog" // Database name
};
// Establish a connection
const con = mysql.createConnection(config);
con.connect(err => {
  if (err) {
    console.log("Failed to connect to the database", err);
  }
});
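The mysql driver's query(sql, values, callback) is callback-based, while getEntryBySelf is async. A small Promise wrapper lets both be used with await. This is a minimal sketch: queryAsync is a hypothetical helper, and fakeCon is a stand-in object (also hypothetical) so the wrapper can be tried without a database; in real use you would pass the connection created above.

```javascript
// Promise wrapper around the callback-style query(sql, values, callback) method.
function queryAsync(con, sql, values = []) {
  return new Promise((resolve, reject) => {
    con.query(sql, values, (err, results) => {
      if (err) {
        reject(err); // connection or SQL error
      } else {
        resolve(results); // rows for SELECT, a result object (insertId, affectedRows) for INSERT
      }
    });
  });
}

// Stand-in connection for trying the wrapper without MySQL (hypothetical, illustration only)
const fakeCon = {
  query(sql, values, cb) {
    setImmediate(() => cb(null, { sql, values }));
  }
};

queryAsync(fakeCon, "SELECT ? AS n", [1]).then(r => console.log(r.values)); // [ 1 ]

// Usage with the real connection (not run here):
// const rows = await queryAsync(con, "SELECT COUNT(*) AS n FROM zhuan_lan");
```

Node's util.promisify would also work, but the method must stay bound to the connection, e.g. util.promisify(con.query).bind(con).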

Create the column table

Create a zhuan_lan table in the blog database to hold the Juejin column articles, with fields named roughly the same as the crawled data fields.
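The article does not show the table definition. Below is one possible CREATE TABLE for zhuan_lan, kept as a JavaScript string so it can be run through the same connection. The column names are taken from the INSERT statement in the next section (including its spelling `titlte`); the id column, the types, the lengths, and the character set are assumptions, not the author's actual schema.

```javascript
// One possible schema for zhuan_lan (types, id column, and charset are assumptions).
const createZhuanLanSql = `
CREATE TABLE IF NOT EXISTS zhuan_lan (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  collectionCount INT,
  commentsCount INT,
  originalUrl VARCHAR(255),
  createdAt VARCHAR(64),
  screenshot VARCHAR(255),
  content TEXT,
  titlte VARCHAR(255),
  category VARCHAR(64),
  viewsCount INT,
  summaryInfo TEXT
) DEFAULT CHARSET = utf8mb4`;

// Run it with the connection created earlier (not run here):
// con.query(createZhuanLanSql, err => { if (err) console.log(err); });
console.log(createZhuanLanSql.trim().split("\n")[0]);
```

createdAt is stored as text here so the crawled value can be inserted unchanged; utf8mb4 allows Chinese text and emoji in titles and summaries.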

Inserting into the database

// Insert data into the database
// let testSql = "INSERT INTO zhuan_lan (collectionCount,commentsCount,originalUrl,createdAt,screenshot,content,titlte,category,viewsCount,summaryInfo) VALUES (1, 21212, '212', '212', '212', '212', '212', '221', '2121', '212') ";
const iblogSql =
  "INSERT INTO zhuan_lan (collectionCount,commentsCount,originalUrl,createdAt,screenshot,content,titlte,category,viewsCount,summaryInfo) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)";
// arrData holds the 10 values of one row, in the same order as the columns above;
// the mysql driver escapes them and substitutes them for the ? placeholders
con.query(iblogSql, arrData, (err, res) => {
  if (err) {
    console.log(err);
  } else {
    console.log("Inserted successfully");
  }
});
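The query above inserts one row per call, so saving a whole entrylist means looping over it. A minimal sketch of a single bulk insert instead, relying on the mysql driver's documented behavior of expanding a nested array bound to one `?` into grouped (...), (...) value lists. Assumptions: each crawled entry has properties named like the columns, and the crawled title field is `title` (mapped into the `titlte` column); toRow, buildBulkInsert, FIELD_FOR, and the demo entry are hypothetical.

```javascript
// Build one bulk INSERT for a whole entrylist instead of one query per entry.
const COLUMNS = ["collectionCount", "commentsCount", "originalUrl", "createdAt", "screenshot",
  "content", "titlte", "category", "viewsCount", "summaryInfo"];
// Column name -> entry property name, where they differ (assumed)
const FIELD_FOR = { titlte: "title" };

function toRow(entry) {
  return COLUMNS.map(col => {
    const value = entry[FIELD_FOR[col] || col];
    if (value === undefined) return null; // missing field -> NULL
    // Objects/arrays would be expanded by the driver, so store them as JSON text instead
    return value !== null && typeof value === "object" ? JSON.stringify(value) : value;
  });
}

function buildBulkInsert(entries) {
  const sql = `INSERT INTO zhuan_lan (${COLUMNS.join(",")}) VALUES ?`;
  // The outer array holds the parameters; its single element (the rows) binds to the one `?`
  return { sql, values: [entries.map(toRow)] };
}

// Usage with the connection and entrylist from earlier (not run here; skip empty lists,
// because VALUES with no rows is invalid SQL):
// if (entrylist.length) {
//   const { sql, values } = buildBulkInsert(entrylist);
//   con.query(sql, values, err => console.log(err || "Inserted successfully"));
// }

// Hypothetical entry, for illustration only
const demo = buildBulkInsert([{ title: "Hello", viewsCount: 3, category: { name: "frontend" } }]);
console.log(demo.values[0][0]);
```

One round trip for all rows is usually faster than one query per entry; the trade-off is that a single bad row makes the whole statement fail.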
