✨ War Of Resistance Live: Recording the days and nights of the 14-year War of Resistance

✨ Open Source: github.com/kokohuang/W…

✨ Preview: kokohuang.github.io/WarOfResist…

Preface

In today’s restless Internet environment, doing one good deed is not hard; what is hard is doing something meaningful for eight years in a row.

There is such a blogger on Weibo. From July 7, 2012 to September 2, 2020, @anti-Japanese War live broadcast recorded, in pictures and text, the history of the Chinese nation’s all-out War of Resistance from July 7, 1937 to August 15, 1945: 2,980 days without interruption, an average of 12 posts a day, 35,214 posts in total.

At 7:07 am on September 18, 2020, @anti-Japanese War live broadcast resumed updating after half a month of silence. It will go on to record, in pictures and text, the history of the War of Resistance from September 18, 1931 to July 7, 1937.

For the next six years, they are already on their way.

History cannot be forgotten.

As a programmer, what can I do in the face of history?

Beyond admiring @anti-Japanese War live broadcast for persisting all these years, I wanted to do something meaningful that is within my ability.

The project was born with the permission and support of the blogger @anti-Japanese War.

War Of Resistance Live

├── .github/workflows   # Workflow configuration
├── resources           # Weibo data
├── site                # Blog source
└── spider              # Weibo crawler

Blog home page: kokohuang.github.io/WarOfResist…

WarOfResistanceLive is an open source project made up of a Python crawler, a Hexo blog, and a GitHub Actions continuous integration service. It is open source on GitHub and deployed on GitHub Pages. It currently provides the following features:

  • Data is automatically updated on a daily basis
  • View all of the blogger’s Weibo posts to date
  • RSS subscription support
  • Continuous integration based on GitHub Actions
  • ...

Next, I will briefly introduce some of the core logic and implementation of the project.

Python crawler

The crawler used in this project is a simplified, modified implementation based on the weibo-crawler project (for research use only). Thanks to its author, dataABC.

How it works

  • The mobile version of Weibo can be accessed without login verification, which makes most of a blogger’s posts viewable, for example: m.weibo.cn/u/289639010…

  • The browser’s developer tools show that the list of posts is served by the JSON interface https://m.weibo.cn/api/container/getIndex:

    def get_json(self, params):
        """Get JSON data from the web page."""
        url = 'https://m.weibo.cn/api/container/getIndex?'
        # params is sent as the query string; verify=False skips
        # TLS certificate verification
        r = requests.get(url,
                         params=params,
                         headers=self.headers,
                         verify=False)
        return r.json()
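As a rough illustration of how a crawler might page through this interface, the sketch below builds the query parameters for each page and stops when a page comes back empty. The containerid value ("107603" followed by the user ID), the page parameter, the data.cards response layout, the placeholder user ID, and the build_params and iter_pages helpers are all assumptions made for illustration; nothing in this article confirms them. The fetch function is passed in, so a method like get_json could be plugged in.

```python
from typing import Any, Callable, Dict, Iterator, List

JsonDict = Dict[str, Any]


def build_params(user_id: str, page: int) -> JsonDict:
    """Build query parameters for one page of a user's timeline.

    Assumption: the timeline container id is "107603" + user id.
    """
    return {"containerid": "107603" + user_id, "page": page}


def iter_pages(
    fetch_json: Callable[[JsonDict], JsonDict],
    user_id: str,
    max_pages: int,
) -> Iterator[List[JsonDict]]:
    """Yield the list of cards on each page until an empty page is seen."""
    for page in range(1, max_pages + 1):
        payload = fetch_json(build_params(user_id, page))
        # Assumption: posts are returned under payload["data"]["cards"].
        cards = (payload.get("data") or {}).get("cards") or []
        if not cards:
            break
        yield cards


if __name__ == "__main__":
    # A stand-in for get_json that serves two pages of fake data, then nothing.
    fake_pages = {1: [{"id": "a"}, {"id": "b"}], 2: [{"id": "c"}]}

    def fake_fetch(params: JsonDict) -> JsonDict:
        return {"ok": 1, "data": {"cards": fake_pages.get(params["page"], [])}}

    for cards in iter_pages(fake_fetch, user_id="1234567890", max_pages=10):
        print(len(cards))
```

Injecting the fetch function keeps the paging logic testable without touching the network.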

Usage

Install dependencies:

pip3 install -r requirements.txt

Run:

python weibo.py

Notes

  • Crawling too fast easily gets you rate-limited by the system; adding random waits between requests reduces that risk.

  • Not all posts can be fetched anonymously; supplying a cookie makes all data available.
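The two notes above can be sketched roughly as follows: a helper that pauses for a random interval every few pages, and a helper that adds a Cookie header only when one is supplied. The function names, the default intervals, the user agent string, and the cookie value are all hypothetical placeholders, not values taken from this project or from weibo-crawler.

```python
import random
import time
from typing import Callable, Dict, Optional


def build_headers(cookie: Optional[str] = None) -> Dict[str, str]:
    """Return request headers, adding a Cookie header only when one is given."""
    headers = {"User-Agent": "Mozilla/5.0"}  # placeholder user agent
    if cookie:
        headers["Cookie"] = cookie
    return headers


def polite_pause(
    page: int,
    every: int = 5,
    min_seconds: float = 6.0,
    max_seconds: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> float:
    """Sleep for a random interval after every `every` pages.

    Returns the number of seconds waited (0.0 when no pause was due).
    """
    if page % every != 0:
        return 0.0
    delay = random.uniform(min_seconds, max_seconds)
    sleep(delay)
    return delay


if __name__ == "__main__":
    # Replace the real sleep with a no-op so the demo finishes instantly.
    for page in range(1, 11):
        waited = polite_pause(page, sleep=lambda seconds: None)
        if waited:
            print(f"page {page}: would wait {waited:.1f}s")
    print(build_headers(cookie="PLACEHOLDER_COOKIE"))
```

Randomizing the delay rather than using a fixed one makes the request pattern less regular, which is the point of the first note.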

See weibo-crawler for more information.

Hexo

After some comparison, Hexo + Next was chosen as the blog framework for this project.

Hexo is a static blog framework based on Node.js. It is easy to install and use, makes it simple to generate static pages that can be hosted on GitHub Pages, and has a rich selection of themes. For details on installing and using Hexo, see the official documentation: hexo.io/zh-cn/docs/.

So how do you implement RSS subscriptions?

Thanks to Hexo’s rich plugin ecosystem, this is easy to do with hexo-generator-feed.

First, install the plugin in the blog root directory:

$ npm install hexo-generator-feed --save

Next, add the configuration to the _config.yml file in the root directory of the blog:

feed:
  enable: true # Whether to enable the plugin
  type: atom # Feed type: atom or rss2, default atom
  path: atom.xml # Path of the generated file
  limit: 30 # Maximum number of posts in the feed; 0 or false means all posts
  content: true # If true, include the full content of each post
  content_limit: # Length of content shown per post, only effective when content is false
  order_by: -date # Sort by date
  template: # Custom template path

Finally, add the RSS feed entry to the _config.yml file in the theme root directory:

menu:
  RSS: /atom.xml || fa fa-rss # Path of atom.xml and icon settings

In this way, we can add RSS feeds to our blogs. WarOfResistanceLive’s subscription address is:

https://kokohuang.github.io/WarOfResistanceLive/atom.xml
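One way to check that the generated feed is well formed is to parse it and list its entries. The following is a minimal sketch using only the Python standard library; the list_entries helper and the sample document are made up for illustration, and the only thing assumed about the real feed is that it follows the standard Atom namespace.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Standard Atom namespace, needed to look up elements by name.
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}


def list_entries(atom_xml: str) -> List[Tuple[str, str]]:
    """Return (title, updated) pairs for every entry in an Atom document."""
    root = ET.fromstring(atom_xml)
    entries = []
    for entry in root.findall("atom:entry", ATOM_NS):
        title = entry.findtext("atom:title", default="", namespaces=ATOM_NS)
        updated = entry.findtext("atom:updated", default="", namespaces=ATOM_NS)
        entries.append((title, updated))
    return entries


if __name__ == "__main__":
    # Made-up sample standing in for the content of atom.xml.
    sample = """<feed xmlns="http://www.w3.org/2005/Atom">
      <title>Example feed</title>
      <entry>
        <title>First post</title>
        <updated>2020-09-18T07:07:00.000Z</updated>
      </entry>
    </feed>"""
    for title, updated in list_entries(sample):
        print(updated, title)
```

To inspect the real feed, the text downloaded from the subscription address above could be passed to list_entries in place of the sample.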

GitHub Actions continuous integration

GitHub Actions is a continuous integration service that GitHub launched in October 2018. Before that, most of us probably used Travis CI for continuous integration. In my opinion, GitHub Actions is very powerful and far more flexible than Travis CI. It also has a rich marketplace of actions that can be combined to accomplish many interesting things with little effort.

Let’s go over some basic concepts of GitHub Actions:

  • Workflow: one run of the continuous integration process. Workflow files are stored in the .github/workflows directory of the repository, which can contain multiple workflow files.

  • Job: a workflow contains one or more jobs, so a single continuous integration run can complete one or more tasks.

  • Step: a job consists of multiple steps that are carried out one after another to complete the task.

  • Action: each step runs either shell commands or one action, a reusable unit referenced with uses.
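To make the hierarchy concrete, here is a small sketch that models a workflow as nested data: a workflow holds jobs, a job holds steps, and a step either references an action or runs a command. The class names and the describe method are hypothetical and exist only for this illustration; the step names and values are borrowed from this project’s workflow.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Step:
    name: str
    uses: Optional[str] = None  # a reusable action, e.g. "actions/checkout@v2"
    run: Optional[str] = None   # a shell command

    def kind(self) -> str:
        """Classify the step by whether it references an action."""
        return "action" if self.uses else "command"


@dataclass
class Job:
    runs_on: str
    steps: List[Step] = field(default_factory=list)


@dataclass
class Workflow:
    name: str
    jobs: Dict[str, Job] = field(default_factory=dict)

    def describe(self) -> List[str]:
        """List every step as 'job[index] name (kind)' in execution order."""
        lines = []
        for job_id, job in self.jobs.items():
            for index, step in enumerate(job.steps, start=1):
                lines.append(f"{job_id}[{index}] {step.name} ({step.kind()})")
        return lines


if __name__ == "__main__":
    workflow = Workflow(
        name="Spider Bot",
        jobs={
            "build": Job(
                runs_on="ubuntu-latest",
                steps=[
                    Step(name="Checkout Repository", uses="actions/checkout@v2"),
                    Step(name="Run Spider Bot", run="python weibo.py"),
                ],
            )
        },
    )
    print("\n".join(workflow.describe()))
```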

With these basic concepts of GitHub Actions in mind, let’s look at how WarOfResistanceLive’s continuous integration service is implemented. Here is the complete workflow for this project:

# Workflow's name
name: Spider Bot

# Set time zone
env:
  TZ: Asia/Shanghai

# Set the workflow trigger mode
on:
  # Timer trigger, update every 2 hours between 8:00 and 24:00 (https://crontab.guru)
  # Cron uses UTC time, so add 8 hours to get Beijing time
  schedule:
    - cron: "0 0-16/2 * * *"

  # Allow manually triggering Actions
  workflow_dispatch:

jobs:
  build:
    # Use ubuntu-latest as the runtime environment
    runs-on: ubuntu-latest

    # Sequence of steps to be performed
    steps:
      # Check out the repository
      - name: Checkout Repository
        uses: actions/checkout@v2

      # Set up the Python environment
      - name: Setup Python
        uses: actions/setup-python@v2
        with:
          python-version: "3.x"

      # Cache PIP dependencies
      - name: Cache Pip Dependencies
        id: pip-cache
        uses: actions/cache@v2
        with:
          path: ~/.cache/pip
          key: ${{ runner.os }}-pip-${{ hashFiles('./spider/requirements.txt') }}
          restore-keys: |
            ${{ runner.os }}-pip-

      # Install pip dependencies
      - name: Install Pip Dependencies
        working-directory: ./spider
        run: |
          python -m pip install --upgrade pip
          pip install flake8 pytest
          if [ -f requirements.txt ]; then pip install -r requirements.txt; fi

      # Run the crawler script
      - name: Run Spider Bot
        working-directory: ./spider  # Specify the working directory, only for the run command
        run: python weibo.py

      # Get the current system time
      - name: Get Current Date
        id: date
        run: echo "::set-output name=date::$(date +'%Y-%m-%d %H:%M')"

      # commit change
      - name: Commit Changes
        uses: EndBug/add-and-commit@v5
        with:
          author_name: Koko Huang
          author_email: [email protected]
          message: "Latest data has been synchronized (${{steps.date.outputs.date}})"
          add: "./"
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

      # push remote
      - name: Push Changes
        uses: ad-m/github-push-action@master
        with:
          branch: main
          github_token: ${{ secrets.GITHUB_TOKEN }}

      # Set up the Node.js environment
      - name: Use Node.js 12.x
        uses: actions/setup-node@v1
        with:
          node-version: "12.x"

      # Cache NPM dependencies
      - name: Cache NPM Dependencies
        id: npm-cache
        uses: actions/cache@v2
        with:
          path: ~/.npm
          key: ${{ runner.os }}-node-${{ hashFiles('./site/package-lock.json') }}
          restore-keys: |
            ${{ runner.os }}-node-

      # Install NPM dependencies
      - name: Install NPM Dependencies
        working-directory: ./site
        run: npm install

      # building Hexo
      - name: Build Hexo
        working-directory: ./site # Specify the working directory, only for the run command
        run: npm run build

      # Deploy to GitHub Pages
      - name: Deploy Github Pages
        uses: peaceiris/actions-gh-pages@v3
        with:
          github_token: ${{ secrets.GITHUB_TOKEN }}
          publish_dir: ./site/public # Directory to publish
          publish_branch: gh-pages # Name of the remote branch

The Workflow file has many configuration fields, and detailed comments are provided in the configuration file. Next, let’s focus on the following important configurations:

How a workflow is triggered

# Set the workflow trigger mode
on:
  # Timer trigger, update every 2 hours between 7:00 and 23:00 (https://crontab.guru)
  schedule:
    - cron: "0 7-23/2 * * *"

  # Allow manually triggering the workflow
  workflow_dispatch:

We can use the on workflow syntax to configure a workflow to run for one or more events, with both automatic and manual triggers supported. The schedule event triggers a workflow at scheduled times, which are specified in POSIX cron syntax.

The cron syntax has five fields separated by spaces, each representing a unit of time:

┌───────────── minute (0 - 59)
│ ┌───────────── hour (0 - 23)
│ │ ┌───────────── day of the month (1 - 31)
│ │ │ ┌───────────── month (1 - 12 or JAN-DEC)
│ │ │ │ ┌───────────── day of the week (0 - 6 or SUN-SAT)
│ │ │ │ │
│ │ │ │ │
│ │ │ │ │
* * * * *

You can use crontab.guru to generate cron expressions, and it also offers many more examples.
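To see what an hour field such as 0-16/2 from the workflow’s schedule actually matches, and what that means in Beijing time, a single cron field can be expanded by hand. The sketch below is a simplified illustration: expand_field and utc_hours_to_beijing are hypothetical helpers, and only numeric values, ranges, steps, lists and * are handled (names such as JAN or SUN are not).

```python
from typing import List


def expand_field(field: str, low: int, high: int) -> List[int]:
    """Expand one cron field into the sorted list of values it matches.

    Supports "*", single values, "a-b" ranges, "/step" suffixes and
    comma-separated lists.
    """
    values = set()
    for part in field.split(","):
        base, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if base == "*":
            start, end = low, high
        elif "-" in base:
            start_text, end_text = base.split("-")
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(base)
        if not (low <= start <= end <= high) or step < 1:
            raise ValueError(f"invalid cron field: {field!r}")
        values.update(range(start, end + 1, step))
    return sorted(values)


def utc_hours_to_beijing(hours: List[int]) -> List[int]:
    """Shift UTC hours to Beijing time (UTC+8), wrapping past midnight."""
    return [(hour + 8) % 24 for hour in hours]


if __name__ == "__main__":
    utc_hours = expand_field("0-16/2", 0, 23)
    print("UTC hours:    ", utc_hours)
    print("Beijing hours:", utc_hours_to_beijing(utc_hours))
```

For 0-16/2 this gives the even UTC hours from 0 to 16, which become 8, 10, ..., 22 and then 0 (midnight) in Beijing time.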

Alternatively, you can trigger the workflow manually by configuring the workflow_dispatch and repository_dispatch fields.

The on field can also be set to push, so that any push to the repository triggers the workflow. For the full set of trigger options, see GitHub’s documentation on events that trigger workflows.

Sequence of steps

From the configuration file, we can see that a continuous integration run of the project consists of the following steps:

Check out repository --> Set up Python environment --> Cache pip dependencies --> Install pip dependencies --> Run crawler script --> Get current time --> Commit changes --> Push to remote --> Set up Node.js environment --> Cache NPM dependencies --> Install NPM dependencies --> Build Hexo --> Publish to GitHub Pages

The main points of this project’s workflow are:

  • Runtime environment: the entire workflow runs in the ubuntu-latest virtual environment. Other virtual environments, such as Windows Server and macOS, can also be specified.

  • Cache dependencies: caching dependencies speeds up their installation. For details, see GitHub’s documentation on caching dependencies to speed up workflows.

  • Get the current time: the commit message of the later Commit Changes step includes the current time, which is obtained through the step context. We can give a step an id, and later steps can then read that step’s information through steps.<step id>.outputs.

  • Build Hexo: runs hexo generate to generate the static pages.

  • Authentication in the workflow: the commit, push and publish steps all require authentication. GitHub provides a token, GITHUB_TOKEN, that authenticates on behalf of GitHub Actions. GitHub creates this secret automatically for every workflow run, so it does not have to be generated by hand under Settings --> Developer settings --> Personal access tokens; a personal access token is only needed when the built-in token’s permissions are not sufficient. In a step, reference it as ${{ secrets.GITHUB_TOKEN }}.
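As a rough Python equivalent of what the Get Current Date and Commit Changes steps produce together, the sketch below formats a timestamp in Beijing time with the same pattern as date +'%Y-%m-%d %H:%M' and embeds it in the commit message. The commit_message helper is hypothetical, and the fixed UTC+8 offset stands in for the TZ: Asia/Shanghai setting used by the workflow.

```python
from datetime import datetime, timedelta, timezone

# Beijing time as a fixed UTC+8 offset (China does not observe DST).
BEIJING = timezone(timedelta(hours=8))


def commit_message(now: datetime) -> str:
    """Build the commit message from a timezone-aware datetime.

    The timestamp is converted to Beijing time and formatted as
    'YYYY-MM-DD HH:MM', matching the pattern used in the date step.
    """
    stamp = now.astimezone(BEIJING).strftime("%Y-%m-%d %H:%M")
    return f"Latest data has been synchronized ({stamp})"


if __name__ == "__main__":
    print(commit_message(datetime.now(timezone.utc)))
```

In the workflow itself the same hand-off happens through the step context: the date step publishes an output, and the commit step reads it as steps.date.outputs.date.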

More actions can be found on the GitHub Marketplace.

Conclusion

Finally, a quote from the blogger @anti-Japanese War live broadcast:

“We broadcast the War of Resistance not to stir up hatred or other negative emotions, but to gently wake people from forgetfulness. When we always remember the suffering, fear and humiliation our grandparents endured; when we appreciate how our forebears put aside past differences and achieved national reconciliation as the nation stood in peril; and when we see how they went to their deaths calmly and selflessly, giving their lives for the nation, I believe we will think about reality more maturely and rationally.”

Keep history in mind and forge ahead.

Never forget the national humiliation; let us make ourselves strong.