I am participating in the Mid-Autumn Festival Creative Submission Contest; see the contest page for details.

Next week is the Mid-Autumn Festival, so I wish you a happy Mid-Autumn Festival in advance.

Today we will use JS to write a program that crawls the first 100 pages of mooncake listings on Jingdong (JD.com) and estimates how much mooncake sales grow each day around the Mid-Autumn Festival.

The data is for reference only and accuracy is not guaranteed.

Thank you for your support. Staying up late to write this was not easy.

Technologies used

  1. Tampermonkey – Google Chrome plugin

  2. JavaScript native DOM manipulation

  3. Fetch request

  4. async/await delays

  5. Express for the data storage API and the statistics API

  6. Node.js for reading and writing the JSON files

  7. Deploy to Tencent Cloud Serverless service

Statistical data presentation

Note that the data for 2021-9-7 is mock data, added so that the thanLastDay field for 2021-9-8 has a previous day to compare against.

Field Description:

{
    "date": "2021-9-8", // date
    "total": "8.9026 trillion", // total sales as of this date
    "thanLastDay": "76.87 million" // increase in total sales over the previous day
}
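To make the thanLastDay field concrete, here is a minimal sketch of how the increase over the previous day could be derived. The names `previousDay`, `thanLastDay` and `totalsByDate` are hypothetical and exist only for this illustration; the sketch assumes dates use the same "YYYY-M-D" form as the field description.

```javascript
// Hypothetical helper: return the day before a "YYYY-M-D" date string.
function previousDay(dateStr) {
  const [year, month, day] = dateStr.split('-').map(Number);
  // Date normalizes out-of-range days, so day 0 becomes the last day
  // of the previous month (and rolls the year back when needed).
  const d = new Date(year, month - 1, day - 1);
  return `${d.getFullYear()}-${d.getMonth() + 1}-${d.getDate()}`;
}

// Hypothetical helper: `totalsByDate` maps a date string to that day's total.
// Returns the whole-number increase, or a placeholder when a day is missing.
function thanLastDay(totalsByDate, dateStr) {
  const today = totalsByDate[dateStr];
  const yesterday = totalsByDate[previousDay(dateStr)];
  if (today === undefined || yesterday === undefined) {
    return 'No data at present';
  }
  return Math.floor(today - yesterday);
}
```

This is why a mock entry for 2021-9-7 is needed: without it, the first real day has nothing to be compared with.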

Now let’s get started

1. Install the Tampermonkey plug-in

If you can access the Chrome Web Store directly, install the extension from the official link below:

Chrome.google.com/webstore/de…

If you cannot access the Chrome Web Store, search for Tampermonkey on Baidu. Many websites describe offline installation methods, but I won't link to them here to avoid infringement.

2. Write a script to crawl Jingdong mooncake data

After the installation succeeds, the Tampermonkey icon is displayed in the upper right corner of the browser.

First open the Jingdong home page, search for mooncakes, and open the product list.

Then open the management panel to reach the script list page, where you can turn a script on or off.

Then click the + sign to create a new script.

I have written a simple script that you can paste in:

// ==UserScript==
// @name         JD mooncake
// @namespace    http://tampermonkey.net/
// @version      0.1
// @description  Crawl 100 pages of product data
// @author
// @match        https://search.jd.com/**
// @icon         https://www.google.com/s2/favicons?domain=jd.com
// @grant        none
// ==/UserScript==

(function() {
    'use strict';

    // Convert the review-count text into a number.
    // A count such as "10万+" means "10 x 10,000 or more".
    function getNumber(str) {
        if (str.includes('万+')) {
            return parseInt(str) * 10000
        }
        return parseInt(str)
    }

    // Wait for the given number of seconds
    function sleep(time) {
        return new Promise((resolve) => {
            setTimeout(resolve, time * 1000)
        })
    }

    async function main() {
        // Wait for the first page data to load
        await sleep(3)
        for (let i = 0; i < 100; i++) {
            // Scroll to the bottom
            window.scrollTo(0, 18000)
            // Wait for the bottom data to load
            await sleep(3)
            // Scroll to the bottom again in case some data has not loaded
            window.scrollTo(0, 18000)
            // Wait for the bottom data to load
            await sleep(2)
            // Calculate the total sales of all products on this page
            await getTotal()
            // Jump to the next page
            document.querySelector('#J_bottomPage > span.p-num > a.pn-next').click()
            // Wait for the next page of data
            await sleep(3)
        }
    }

    async function getTotal() {
        let pageTotal = 0
        document.querySelectorAll('#J_goodsList > ul > li').forEach(el => {
            // Product price
            const price = parseFloat(el.querySelector('.p-price i').innerText)
            // Number of reviews (used here as the sales count)
            const saleNum = getNumber(el.querySelector('.p-commit a').innerText)

            console.log(price, saleNum)

            pageTotal += price * saleNum
        })

        // Send this page's total sales to the storage API
        const res = await fetch('http://localhost:9000/save', {
            method: 'POST',
            headers: {
                'Content-Type': 'application/json',
            },
            body: JSON.stringify({pageTotal})
        })
        const json = await res.json()
        console.log('Success:', json);
    }

    // Run the program
    main()
})();
  • A for loop runs a fixed 100 times, because Jingdong's product list has 100 pages in total
  • Each iteration scrolls to the bottom of the page, because part of the list is loaded asynchronously via Ajax
  • The sleep function waits a fixed time, using async/await syntax
  • After waiting 3 seconds, the script scrolls to the bottom again in case some data has not loaded
  • document.querySelectorAll gets all the items on the page
  • document.querySelector gets the price and review count of each item
  • The page's total sales are accumulated in pageTotal
  • fetch sends a request to the Node.js storage API, which stores the total calculated for the current page for later analysis
  • To run it, go to the Jingdong home page, search for mooncakes, open the search results page, and wait for the script to page through to page 100. Data collection takes a long time, so you can do something else in the meantime.
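The price and count parsing described in these steps can be tried outside the browser. The following is only a sketch: `parseSaleCount` and `pageTotalOf` are hypothetical names, and the input is assumed to be plain text already read from the page. Unlike the script, it uses parseFloat for the count (so a label like "1.5万+" would not be truncated) and skips items whose text cannot be parsed instead of letting NaN spread into the sum.

```javascript
// Convert a count label such as "2000+" or "10万+" into a number.
// 万 means 10,000. Returns 0 when the text has no leading number.
function parseSaleCount(text) {
  const value = parseFloat(text);
  if (Number.isNaN(value)) return 0;
  return text.includes('万') ? Math.round(value * 10000) : value;
}

// Sum price * count over plain objects, skipping items without a price.
function pageTotalOf(items) {
  return items.reduce((sum, { priceText, countText }) => {
    const price = parseFloat(priceText);
    if (Number.isNaN(price)) return sum;
    return sum + price * parseSaleCount(countText);
  }, 0);
}

console.log(parseSaleCount('10万+')); // 100000
console.log(pageTotalOf([
  { priceText: '10', countText: '2000+' },
  { priceText: '', countText: '5' }, // skipped: no price
])); // 20000
```

In the userscript, the two text values would come from the `.p-price i` and `.p-commit a` elements of each list item.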

Now, let’s take a look at the demo

[Juejin does not support video uploads…]

3. Build storage and analysis APIs with Express

The code is as follows:

const express = require('express')
const cors = require('cors')
const path = require('path')
const fs = require('fs')

const app = express()

app.use(express.json())
app.use(express.urlencoded({ extended: true }))

app.use(cors())

// Get the statistics
app.get('/get', (req, res) => {
  const data = []
  // Get the total sales for the given date, in units of 万 (10,000 yuan)
  const getTotal = (date) => {
    const filePath = path.join(__dirname, 'data', `${date}.json`)

    if (!fs.existsSync(filePath)) {
      return 0
    }

    const fileData = JSON.parse(fs.readFileSync(filePath))

    // Use the cached total if one was stored earlier
    if (fileData.total !== undefined) {
      return fileData.total
    }

    // Start from 0 so that every page, including the first, is converted to 万
    const total = fileData.data.reduce((sum, currentValue) => {
      return sum + Math.floor(currentValue) / 10000
    }, 0)
    // Cache the total so it is not recalculated next time
    fileData.total = total // unit: 万
    fs.writeFileSync(filePath, JSON.stringify(fileData))

    return total
  }

  // Get the day before the given date
  const getLastDay = (dateTime) => {
    const date_ob = new Date(dateTime)
    date_ob.setDate(date_ob.getDate() - 1)
    const date = date_ob.getDate()
    const month = date_ob.getMonth() + 1
    const year = date_ob.getFullYear()
    return year + '-' + month + '-' + date
  }

  // All dates that have a statistics file
  const dateList = fs.readdirSync(path.join(__dirname, 'data'))

  // Build the response, including the increase over the previous day
  dateList.forEach(fileName => {
    const date = fileName.replace('.json', '')
    data.push({
      date,
      total: Math.floor(getTotal(date) / 10000) + '亿', // 亿 = 100 million
      thanLastDay: getTotal(getLastDay(date)) !== 0
        ? Math.floor(getTotal(date) - getTotal(getLastDay(date))) + '万' // 万 = 10,000
        : 'No data at present'
    })
  })
  // Sort in descending order by date
  res.send(data.sort((a, b) => new Date(b.date) - new Date(a.date)))
})

// Store one page of sales for the current day (called once per page, 100 times a day)
app.post('/save', (req, res) => {
  // Get the current date
  const date_ob = new Date()
  const date = date_ob.getDate()
  const month = date_ob.getMonth() + 1
  const year = date_ob.getFullYear()
  const today = year + '-' + month + '-' + date

  // File path
  const filePath = path.join(__dirname, 'data', `${today}.json`)

  // Create the storage file if it does not exist yet
  if (!fs.existsSync(filePath)) {
    fs.writeFileSync(filePath, JSON.stringify({ data: [] }))
  }
  // Read the file
  const data = JSON.parse(fs.readFileSync(filePath))
  // Store the total sales of all items on the current page
  data.data.push(req.body.pageTotal)
  // Drop any cached total, because it no longer covers every page
  delete data.total
  // Write to the JSON file
  fs.writeFileSync(filePath, JSON.stringify(data))
  // Return the data
  res.send(data)
})

// The userscript and the API addresses in this article use port 9000
app.listen(9000, function () {
  console.log('Service started successfully: http://localhost:9000')
})

There are two main APIs:

GET - http://localhost:9000/get

It returns the statistics, structured as follows:

[
    {
        "date": "2021-9-8", // date
        "total": "8.8615 trillion", // total sales
        "thanLastDay": "43.38 million" // increase in sales over the previous day
    },
    {
        "date": "2021-9-7",
        "total": "8.8615 trillion",
        "thanLastDay": "No data at present"
    }
]
POST - http://localhost:9000/save

It stores the sales of each page for the current day. The data is written to the data/<current date>.json file:

{"data": [885434000, 692030500, 234544840, 601344769.5, 172129350, 182674704.6, 133972752.6, 205753590, 80450922, 77355786.19999999, 151456533, 110421752, 92058113.7, 303276508, 174283087.7, 271311291.3, 63696476.8, 141753035.7, 338476616.4, 270641094, 86462147, 27128625, 36139929, 45965566.900000006, 72166439.10000001, 192549501, 10540359.4, 69775609.4, 22760644, 18128574.6, 4775594.2, 11293833.100000001, 69100044.5, 18697712.7, 5837212.3, 10642395.6, 12401900.700000003, 7687292.750000001, 5542854.199999999, 6173778.3, 15844723.86, 312611521.7, 322072634.2, 57924578, 365159510, 31830203.6, 37628351.7, 11473636.700000001, 25383806.799999997, 30270479.9, 82777935.4, 71801949, 17886438.4, 76748973.5, 29326328.4, 11953917.4, 5390966.8, 25723722.5, 9660846, 33003014.7, 35118788.5, 11297238.8, 7611442.84, 19172848.34, 6824560, 18840682.700000003, 13633325.1, 61348156.3, 32949962.4, 28584186.1, 25574649.3, 40607000.4, 27084038.700000003, 34280644.35, 13503164.6, 7837763.899999999, 27559845.42, 12587807.8, 11210537.2, 10225227.48, 14791757.24, 14573441.399999999, 5919098.6, 7467049.7, 26552201.6, 6259477.100000001, 7240613.68, 5715078, 5421074.500000001, 6174596.500000001, 12098670, 3628428.2, 5442460.100000001, 6925294.8, 16266156.259999998, 7562844.060000001, 16977870.1, 6701592.3999999985, 6060801, 6081381.699999999]}
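Given a file of this shape, the per-page totals can be added up and converted into the 万 (10,000) and 亿 (100 million) units used by the statistics API. The sketch below is an illustration only; `summarizeDay` is a hypothetical name, and the input in the example is made up rather than real crawl data.

```javascript
// `dayFile` has the shape { data: [pageTotal, pageTotal, ...] }, in yuan.
function summarizeDay(dayFile) {
  // The initial value 0 ensures the first page is treated like every other page.
  const yuan = dayFile.data.reduce((sum, page) => sum + page, 0);
  return {
    yuan,
    wan: Math.floor(yuan / 1e4), // 万: units of 10,000 yuan
    yi: Math.floor(yuan / 1e8),  // 亿: units of 100,000,000 yuan
  };
}

// Made-up example with two pages
console.log(summarizeDay({ data: [150000000, 60000000.5] }));
// { yuan: 210000000.5, wan: 21000, yi: 2 }
```

Passing an explicit initial value to reduce matters when each element is transformed inside the callback; without it, the first element is used as the starting sum unchanged.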
  • The project mainly uses fs.writeFileSync and fs.readFileSync to read and write JSON files
  • The cors() middleware enables cross-origin requests
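The read-modify-write pattern with fs.readFileSync and fs.writeFileSync can be tried on its own, without Express. This is a minimal sketch: `appendPageTotal` is a hypothetical helper, and it writes to a temporary directory so that it does not touch the project's data folder.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Append one page total to <dir>/<date>.json, creating the directory
// and the file on first use.
function appendPageTotal(dir, date, pageTotal) {
  fs.mkdirSync(dir, { recursive: true });
  const filePath = path.join(dir, `${date}.json`);
  const content = fs.existsSync(filePath)
    ? JSON.parse(fs.readFileSync(filePath, 'utf8'))
    : { data: [] };
  content.data.push(pageTotal);
  fs.writeFileSync(filePath, JSON.stringify(content));
  return content;
}

// Usage with a throwaway directory
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), 'mooncake-'));
appendPageTotal(demoDir, '2021-9-8', 885434000);
const result = appendPageTotal(demoDir, '2021-9-8', 692030500);
console.log(result.data.length); // 2
```

The synchronous calls keep the logic simple. Since the crawler sends one page at a time, blocking on each write is unlikely to matter here.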

4. Deploy to the Tencent Cloud Serverless service

Finally, I deployed the Express service to the cloud for everyone to see

  1. Make sure the Express project listens on port 9000 (Tencent Cloud requires port 9000)

  2. Create the scf_bootstrap startup file

#!/bin/sh
npm run start
  3. Log in to the Tencent Cloud Serverless console and click Function Service on the left

  4. Click the New button

  5. Select “Custom Create”

  • Function type: select Web Function.
  • Function name: enter your own function name.
  • Region: choose the deployment region for your function. The default is Guangzhou.
  • Runtime environment: select Nodejs 12.16.
  • Deployment mode: select Code Deployment and upload your local project.
  • Submit method: select Upload Local Folder.
  • Function code: select the local folder that contains the function code.
  6. Click Finish

See the video below for details

[Juejin does not support video uploads…]

After a successful deployment, Tencent Cloud provides an address that can be used to test the service:

service-ehtglv6w-1258235229.gz.apigw.tencentcs.com/release/get

Note:

  1. Tencent Cloud Serverless includes a certain free usage quota; see Tencent Cloud's documentation for details
  2. Serverless does not allow files to be modified, so the /save endpoint reports an error. The solution is to mount a CFS file system, but I did not bother because it costs money.
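If you do mount a writable file system, one way to avoid hard-coding the storage location is to read it from the environment. This is only a sketch under assumptions: the `DATA_DIR` variable, the `resolveDataDir` helper and the `/mnt/cfs/data` path are all hypothetical and are not part of the project or of Tencent Cloud's configuration.

```javascript
const path = require('path');

// Pick the directory used for the daily JSON files.
// DATA_DIR is a hypothetical environment variable; when it is not set,
// fall back to a "data" folder under the given base directory.
function resolveDataDir(env = process.env, baseDir = process.cwd()) {
  return env.DATA_DIR ? path.resolve(env.DATA_DIR) : path.join(baseDir, 'data');
}

// Example: a hypothetical mount point versus the local default
console.log(resolveDataDir({ DATA_DIR: '/mnt/cfs/data' }, '/app'));
console.log(resolveDataDir({}, '/app'));
```

The Express handlers could then build file paths from this directory instead of from `__dirname`.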

GitHub source code:

Github.com/cmdfas/expr…

5. Summary

That completes the whole flow: crawling 100 pages of data per day with Tampermonkey, storing it in JSON files with Express, and calculating the daily difference. Together these fulfill the requirement of tracking mooncake sales on a daily basis.