Introduction

When I browse the web, I often visit GitHub Trending, Hacker News, Juejin (掘金), and other tech communities and articles, but going through them one by one is time-consuming and inflexible. Later I discovered a product called Panda, which aggregates information from most corners of the Internet and is genuinely good to use. The only pity is that it carries no Chinese-language sources, which gave me an idea: write a crawler to fetch and display the information from the sites I read regularly.

Once you have an idea, it is just a matter of implementing it. I chose the Flask + React + Redux stack, partly as an opportunity to try React. Among them:

  • Flask is used to provide API services in the background
  • React is used to build the UI
  • Redux is used for data flow management

At present, the project implements its basic functionality. Project source code: GitHub address. The current interface looks like this:


Front-end development

React is used as the View layer and Redux as the Model layer.


We can see that the data flow is one-way:

Store -> View layer -> Action -> Reducer
  ^                                 |
  |                                 |
  +-------- returns the new State --+

Among them:

  • React provides the View layer of the application, which consists of components, including container components and common display components.
  • Redux consists of three parts: Action, Reducer, and Store:
    • An Action is essentially a JS object with at least one required field, `type`, which identifies the action;
    • Middleware is used for operations such as asynchronous actions and API requests, carried out after an action is dispatched and before it reaches the Reducer;
    • A Reducer is a function `(previousState, action) => newState`; it can be understood as the processing center for actions, handling each action, generating the new state, and returning it to the Store;
    • The Store is the state management center of the entire application; container components obtain the states they need from the Store.
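To make the three parts concrete, here is a minimal, self-contained sketch of the same one-way flow in plain JS. The hand-rolled createStore and the SELECT_ITEM action are illustrative stand-ins for the real Redux APIs, not the project's actual code:

```javascript
// Action creator: an action is a plain object with a mandatory `type` field.
const selectItem = (item, id) => ({ type: 'SELECT_ITEM', item, id });

// Reducer: (previousState, action) => newState, a pure function.
function reducer(state = { item: 'Github' }, action) {
  switch (action.type) {
    case 'SELECT_ITEM':
      return { ...state, item: action.item };
    default:
      return state;
  }
}

// Store: holds the state, runs actions through the reducer,
// and notifies subscribers (the View layer) of the new state.
function createStore(reducer) {
  let state = reducer(undefined, { type: '@@INIT' });
  const listeners = [];
  return {
    getState: () => state,
    dispatch(action) {
      state = reducer(state, action);
      listeners.forEach((l) => l());
      return action;
    },
    subscribe: (l) => listeners.push(l),
  };
}

const store = createStore(reducer);
store.subscribe(() => console.log('new state:', store.getState()));
store.dispatch(selectItem('Hacker News', 0));
```

Dispatching the action sends it through the reducer, which returns the new state to the store, which in turn notifies the view: exactly the loop in the diagram above.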

The source code for the front end of the project is in the client directory. Here are some main directories:

├── actions      # actions
├── components   # common display components
├── containers   # container components
├── middleware   # middleware for API requests
├── reducers     # reducers
└── store        # store configuration file

React development

The React part of the development mainly involves Containers and Components:

  • The Container is responsible for receiving state from the Store and dispatching actions.
  • Components live inside a Container. Instead of connecting directly to the Store, they receive props from the parent container, and all operations are performed through callbacks.
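This props-down, callbacks-up contract can be sketched in a few lines of plain JS. The names borrow from this project's Picker component, but the shapes here are illustrative assumptions, not the real React component:

```javascript
// A presentational "component" only sees its props and reports back
// through a callback; it never touches the Store itself.
function Picker({ value, options, onChange }) {
  return {
    label: `${value} (${options.length} options)`,
    pick: (next) => onChange(next), // every operation goes through a callback
  };
}

// The container owns the state and passes data + callbacks down:
let selected = 'Github';
const picker = Picker({
  value: selected,
  options: ['Github', 'Hacker News'],
  onChange: (next) => { selected = next; }, // the real container dispatches here
});
picker.pick('Hacker News');
console.log(selected); // Hacker News
```

In the real project the container's callback dispatches an action instead of mutating a variable, but the division of labor is the same.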

In this project, the prototype of the Container is as follows:


There are two components: a selection component and an information display component, as follows:

Selection component / information display component

These components will be used multiple times.

Next, let's look at the code for the container component (corresponding to app.js); only the key parts are shown:

import React, { Component, PropTypes } from 'react';
import { connect } from 'react-redux';
import Posts from '../../components/Posts/Posts';
import Picker from '../../components/Picker/Picker';
import { fetchNews, selectItem } from '../../actions';
require('./App.scss');

class App extends Component {
  constructor(props) {
    super(props);
    this.handleChange = this.handleChange.bind(this);
  }

  componentDidMount() {
    for (const value of this.props.selectors) {
      this.props.dispatch(fetchNews(value.item, value.boardId));
    }
  }

  componentWillReceiveProps(nextProps) {
    for (const value of nextProps.selectors) {
      if (value.item !== this.props.selectors[value.boardId].item) {
        nextProps.dispatch(fetchNews(value.item, value.boardId));
      }
    }
  }

  handleChange(nextItem, id) {
    this.props.dispatch(selectItem(nextItem, id));
  }

  render() {
    const boards = [];
    for (const value of this.props.selectors) {
      const options = ['Github', 'Hacker News', 'Segment Fault', 'Developer Headlines', 'Bole Toutiao'];
      boards.push({ item: value.item, boardId: value.boardId, options });
    }
    return (
      <div className="mega">
        <div className="desk-container">
          {, i) =>
            <div className="desk" style={{ opacity: 1 }} key={i}>
              <Picker value={board.item} options={board.options} onChange={this.handleChange} />
              {/* ... <Posts /> rendering omitted ... */}
            </div>
          )}
        </div>
      </div>
    );
  }
}

function mapStateToProps(state) {
  return {
    selectors: state.selectors,
  };
}

export default connect(mapStateToProps)(App);

Among them:

  • constructor(props) is the constructor; it is called once, when the component is created;
  • componentDidMount() is called once, after the component has mounted;
  • componentWillReceiveProps() is executed when the component receives new props.

These are React component lifecycle methods. See here for more lifecycle methods.

  • react-redux, as the name suggests, is the library that connects React and Redux, i.e., it connects container components to the Store.
  • mapStateToProps is a function that establishes a mapping from the (external) state object to the props object of the UI component. It subscribes to the state in the Store; whenever the state is updated, it automatically recomputes the UI component's props, which triggers a re-render of the UI component.
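Because mapStateToProps is just a pure function from the Store's state to a props object, it is easy to examine in isolation. The state shape below mirrors this project's selectors array; calling the function by hand is only for illustration, since connect() normally does this on every store update:

```javascript
// Pure mapping: state -> props. No side effects, no subscription logic here.
function mapStateToProps(state) {
  return { selectors: state.selectors };
}

// A state snapshot shaped like this project's store:
const before = {
  selectors: [
    { item: 'Github', boardId: 0 },
    { item: 'Hacker News', boardId: 1 },
  ],
};
console.log(mapStateToProps(before).selectors[0].item); // Github

// After the state changes, connect() recomputes the props from the
// new state, which is what triggers the container's re-render:
const after = {
  selectors: [
    { item: 'Segment Fault', boardId: 0 },
    { item: 'Hacker News', boardId: 1 },
  ],
};
console.log(mapStateToProps(after).selectors[0].item); // Segment Fault
```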

Redux development

As mentioned above, the Redux part of the development mainly covers Action, Reducer, and Store. The Store is the application's state management center: when it receives a new state, it triggers component re-rendering. The Reducer is the application's action processing center: it processes each action, generates the new state, and returns it to the Store.

In this project there are two kinds of actions, one for site selection (Github, Hacker News, and so on) and one for fetching information. Part of the action code is as follows:

export const FETCH_NEWS = 'FETCH_NEWS';
export const SELECT_ITEM = 'SELECT_ITEM';

export function selectItem(item, id) {
  return {
    type: SELECT_ITEM,
    item,
    id,
  };
}

export function fetchNews(item, id) {
  switch (item) {
    case 'Github':
      return {
        type: FETCH_NEWS,
        api: `/api/github/repo_list`,
        method: 'GET',
        id,
      };
    case 'Segment Fault':
      return {
        type: FETCH_NEWS,
        api: `/api/segmentfault/blogs`,
        method: 'GET',
        id,
      };
    // ... cases for the other sites omitted ...
    default:
      return {};
  }
}

As you can see, an action is a normal JS object that must have a type attribute to identify the action.

A Reducer is a function, usually built around a switch statement, that accepts the current state and an action as parameters and returns a new state, for example:

import { SELECT_ITEM } from '../actions';
import _ from 'lodash';

const initialState = [
  { item: 'Github', boardId: 0 },
  { item: 'Hacker News', boardId: 1 },
];

export default function reducer(state = initialState, action = {}) {
  switch (action.type) {
    case SELECT_ITEM:
      return _.sortBy([
        {
          item: action.item,
          boardId:,
        },
        ...state.filter(element =>
          element.boardId !==
        ),
      ], 'boardId');
    default:
      return state;
  }
}

Take a look at store:

import { createStore, applyMiddleware, compose } from 'redux';
import thunk from 'redux-thunk';
import api from '../middleware/api';
import rootReducer from '../reducers';

const finalCreateStore = compose(
  applyMiddleware(thunk, api)
)(createStore);

export default function configureStore(initialState) {
  return finalCreateStore(rootReducer, initialState);
}

Here, applyMiddleware() tells Redux which middleware to use: the thunk middleware for asynchronous operations, and the middleware we wrote ourselves for API requests.
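The middleware shape itself is easy to see in isolation: a Redux middleware is a curried function of the form store => next => action => result. Below is an illustrative logger, with a simplified fake store standing in for Redux so the sketch stays self-contained:

```javascript
// A middleware runs after an action is dispatched and before it
// reaches the reducer; calling next(action) passes it along the chain.
const logger = (store) => (next) => (action) => {
  console.log('dispatching', action.type);
  const result = next(action);
  console.log('state after', store.getState());
  return result;
};

// Minimal fake store and "reducer step" to exercise the middleware:
let state = 0;
const fakeStore = { getState: () => state };
const reducerStep = (action) => {
  if (action.type === 'INC') state += 1;
  return action;
};

logger(fakeStore)(reducerStep)({ type: 'INC' });
console.log(state); // 1
```

The project's API middleware follows this same shape: it intercepts actions that carry an `api` field, performs the request, and forwards the result.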

Back-end development

The back-end development is mainly the crawlers. The current crawlers are relatively simple, basically static-page crawling, mostly HTML parsing and extraction. Accessing sites such as Juejin and Zhihu columns may involve mechanisms such as login verification and anti-crawler protection, which will be tackled in future development.
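The static-page idea is simple enough to sketch in a few lines. This simplified JavaScript example (the real spiders are Python classes such as GitHubTrend) extracts repository links from a saved HTML fragment with a naive regex; both the fragment and the regex are purely illustrative:

```javascript
// Illustrative only: a saved fragment of a trending-style page, and a
// naive regex extraction. Real crawlers should use a proper HTML parser.
const html = `
  <article><h2><a href="/facebook/react">react</a></h2></article>
  <article><h2><a href="/pallets/flask">flask</a></h2></article>
`;
const repos = [...html.matchAll(/<a href="\/([^"]+)">/g)].map((m) => m[1]);
console.log(repos); // [ 'facebook/react', 'pallets/flask' ]
```

Fetch the page, extract the fields, expose them through the API: that is essentially all the current spiders do.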

The backend code is in the server directory:

├── __init__.py
├── app.py       # create the app
├── config.py    # configuration file
├── controllers  # provide the API services
└── spiders      # crawlers for the various sites

The back end provides data to the front end in the form of an API through Flask. Here is the code:

# -*- coding: utf-8 -*-
import flask
from flask import jsonify

from server.spiders.github_trend import GitHubTrend
from server.spiders.toutiao import Toutiao
from server.spiders.segmentfault import SegmentFault
from server.spiders.jobbole import Jobbole

news_bp = flask.Blueprint(
    'news', __name__,
    url_prefix='/api'
)


@news_bp.route('/github/repo_list', methods=['GET'])
def get_github_trend():
    gh_trend = GitHubTrend()
    gh_trend_list = gh_trend.get_trend_list()
    return jsonify(
        message='OK',
        data=gh_trend_list
    )


@news_bp.route('/toutiao/posts', methods=['GET'])
def get_toutiao_posts():
    toutiao = Toutiao()
    post_list = toutiao.get_posts()
    return jsonify(
        message='OK',
        data=post_list
    )


@news_bp.route('/segmentfault/blogs', methods=['GET'])
def get_segmentfault_blogs():
    sf = SegmentFault()
    blogs = sf.get_blogs()
    return jsonify(
        message='OK',
        data=blogs
    )


@news_bp.route('/jobbole/news', methods=['GET'])
def get_jobbole_news():
    jobbole = Jobbole()
    blogs = jobbole.get_news()
    return jsonify(
        message='OK',
        data=blogs
    )

Deployment

This project is deployed with Nginx + Gunicorn + Supervisor, wherein:

  • Nginx is used as a reverse proxy server: it receives connection requests from the Internet, forwards them to the target server on the internal network, and returns the target server's result to the client (such as a browser) that made the request.
  • Gunicorn is an efficient Python WSGI server; we use it to run WSGI (Web Server Gateway Interface) applications, such as this project's Flask application.
  • Supervisor is a process management tool that makes it easy to start, stop, and restart processes.

The files required for project deployment are in the deploy directory:

├── fabfile.py           # automatic deployment script
├── nginx.conf           # general Nginx configuration file
├── nginx_geekvi.conf    # site configuration file
└── supervisor.conf      # Supervisor configuration file

This project uses Fabric for automated deployment, which lets us perform remote operations locally, such as installing software or deleting files, without logging in to the server directly.

Part of the fabfile looks like this:

# -*- coding: utf-8 -*-
import os
from contextlib import contextmanager

from fabric.api import run, env, sudo, prefix, cd, settings, local, lcd
from fabric.colors import green, blue
from fabric.contrib.files import exists

env.hosts = ['[email protected]:12345']
env.key_filename = '~/.ssh/id_rsa'
# env.password = '12345678'

# path on server
DEPLOY_DIR = '/home/deploy/www'
PROJECT_DIR = os.path.join(DEPLOY_DIR, 'react-news-board')
CONFIG_DIR = os.path.join(PROJECT_DIR, 'deploy')
LOG_DIR = os.path.join(DEPLOY_DIR, 'logs')
VENV_DIR = os.path.join(DEPLOY_DIR, 'venv')
VENV_PATH = os.path.join(VENV_DIR, 'bin/activate')

# path on local
PROJECT_LOCAL_DIR = '/Users/Ethan/Documents/Code/react-news-board'


@contextmanager
def source_virtualenv():
    with prefix("source {}".format(VENV_PATH)):
        yield


def build():
    with lcd("{}/client".format(PROJECT_LOCAL_DIR)):
        local("npm run build")


def deploy():
    print(green("Start to Deploy the Project"))
    print(green("=" * 40))

    # 1. Create directory
    print(blue("create the deploy directory"))
    print(blue("*" * 40))
    # (directory creation commands elided in this excerpt)

    # 2. Get source code
    print(blue("get the source code from remote"))
    print(blue("*" * 40))
    with cd(DEPLOY_DIR):
        with settings(warn_only=True):
            run("git clone {}".format(GITHUB_PATH))  # GITHUB_PATH: the repository URL, defined elsewhere

    # 3. Install python virtualenv
    print(blue("install the virtualenv"))
    print(blue("*" * 40))
    sudo("apt-get install python-virtualenv")

    # 4. Install nginx
    print(blue("install the nginx"))
    print(blue("*" * 40))
    sudo("apt-get install nginx")
    sudo("cp {}/nginx.conf /etc/nginx/".format(CONFIG_DIR))
    sudo("cp {}/nginx_geekvi.conf /etc/nginx/sites-enabled/".format(CONFIG_DIR))

    # 5. Install python requirements
    with cd(DEPLOY_DIR):
        if not exists(VENV_DIR):
            run("virtualenv {}".format(VENV_DIR))
        with settings(warn_only=True):
            with source_virtualenv():
                sudo("pip install -r {}/requirements.txt".format(PROJECT_DIR))

    # 6. Config supervisor
    sudo("supervisord -c {}/supervisor.conf".format(CONFIG_DIR))
    sudo("supervisorctl -c {}/supervisor.conf reload".format(CONFIG_DIR))
    sudo("supervisorctl -c {}/supervisor.conf status".format(CONFIG_DIR))
    sudo("supervisorctl -c {}/supervisor.conf start all".format(CONFIG_DIR))

env.hosts specifies the remote server, and env.key_filename specifies the path to the private key, so that we can log in to the server without a password. Adjust the parameters above, such as the server address, user name, port, and project paths, to match your own setup. Note that the front-end resources should be built before deploying, by running the following command in the deploy directory:

$ fab build

Of course, you can also go directly to the client directory and run the command:

$ npm run build

If the build completes without errors, you are ready to deploy, using the following command in the deploy directory:

$ fab deploy


Summary

  • This project uses React + Redux for the front end and Flask for the back end, which is a typical combination; of course, you could also use Node.js for the back end.

  • Front-end development requires a clear picture of how the data flows: Store -> View layer -> Action -> Reducer, with the new state returned to the Store.

  • The back-end development is mainly the crawlers; in this project Flask serves only as the back-end framework, providing API services to the outside.
