• When I'm bored I tend to browse Zhihu, but these days there is little valuable new content and a lot of marketing accounts, promotion, and Zhihu Live. So I figured I would be better off browsing my own favorites. Many excellent answers are forgotten soon after I read them; they sit quietly in my favorites and never get opened again. I don't save things often, but a few years of saving adds up, so going through them can kill plenty of time, and I can even call it "reviewing the old to learn the new". Even after the front-end redesign, though, Zhihu's favorites are not very convenient to use, so I built my own: do it yourself and you will want for nothing.

The result

  • I used a Python crawler to fetch all my favorites, Flask for the back-end API, and Vue.js for the front-end display, with the front and back ends kept separate. The results are as follows.




Desktop screenshot 1

Desktop screenshot 2

Desktop screenshot 3

Mobile screenshot

The crawler

  • At first I assumed there would be plenty of open-source Zhihu crawlers on GitHub to save me the trouble. When I looked, though, most were no longer maintained and Zhihu had been redesigned since, while the newer projects had some nice features but were incomplete. So I wrote my own (using Python 3).
  • For this requirement the crawler logic is simple. On a computer you regularly use, Zhihu lets you log in by POSTing your user name and password directly, with no CAPTCHA. A requests.Session keeps the login state across requests. Following the page=num pagination rule, the crawler walks through all the favorites list pages, extracts the URL of every favorites folder, then requests each folder's Q&A list in turn and parses out the relevant information. Since there isn't much content, I save it straight to JSON files for convenience. And since there aren't many favorites, a single-threaded crawl with the Requests library is enough.
  • The crawler code is below. It generates two JSON files: one holds all the favorites together with the metadata of the questions and answers under each, and the other, url_answer.json, holds the answer text for every question. This way the front end can fetch the former first and then, only when you want to read a particular answer, asynchronously request just that answer from the latter.
  • The requests_cache library takes just two lines of code. If the crawl is interrupted unexpectedly and restarted, pages that were already requested are read straight from the cache database, which saves time and spares you from writing your own retry logic for failed requests.
import os
import json
from bs4 import BeautifulSoup
import requests
from requests.packages.urllib3.exceptions import InsecureRequestWarning
requests.packages.urllib3.disable_warnings(InsecureRequestWarning)
# reference http://stackoverflow.com/questions/27981545/suppress-insecurerequestwarning-unverified-https-request-is-being-made-in-py tho
import requests_cache
requests_cache.install_cache('demo_cache')


Cookie_FilePlace = r'.'
Default_Header = {'User-Agent': "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.135 Safari/537.36",
                  'Host': "www.zhihu.com",
                  'Origin': "http://www.zhihu.com",
                  'Pragma': "no-cache",
                  'Referer': "http://www.zhihu.com/"}
Zhihu_URL = 'https://www.zhihu.com'
Login_URL = Zhihu_URL + '/login/email'
Profile_URL = 'https://www.zhihu.com/settings/profile'
Collection_URL = 'https://www.zhihu.com/collection/%d'
Cookie_Name = 'cookies.json'

os.chdir(Cookie_FilePlace)

r = requests.Session()

#--------------------Prepare--------------------------------#
r.headers.update(Default_Header)
if os.path.isfile(Cookie_Name):
    with open(Cookie_Name, 'r') as f:
        cookies = json.load(f)
        r.cookies.update(cookies)

def login(r):
    print('====== zhihu login =====')
    email = input('email: ')
    password = input("password: ")
    print('====== logging in... ======')
    data = {'email': email, 'password': password, 'remember_me': 'true'}
    value = r.post(Login_URL, data=data).json()
    print('====== result:', value['r'], '-', value['msg'])
    # r == 0 means the login succeeded, so persist the cookies for next time
    if int(value['r']) == 0:
        with open(Cookie_Name, 'w') as f:
            json.dump(r.cookies.get_dict(), f)

def isLogin(r):
    url = Profile_URL
    value = r.get(url, allow_redirects=False, verify=False)
    status_code = int(value.status_code)
    if status_code == 301 or status_code == 302:
        print("Not logged in")
        return False
    elif status_code == 200:
        return True
    else:
        print("Network failure")
        return False

if not isLogin(r):
    login(r)


#------------------------------------------------------------#
url_answer_dict = {}
# Separate dictionary mapping each answer's URL to its text, for the back-end API service;
# it is filled in by getQaDictListFromOneCollection

#-----------------------get collections-------------------------------#
def getCollectionsList():
    collections_list = []
    content = r.get(Profile_URL).content
    soup = BeautifulSoup(content, 'lxml')
    own_collections_url = 'http://' + soup.select('#js-url-preview')[0].text + '/collections'
    page_num = 0
    while True:
        page_num += 1
        url = own_collections_url + '?page=%d' % page_num
        content = r.get(url).content
        soup = BeautifulSoup(content, 'lxml')
        data = soup.select_one('#data').attrs['data-state']
        collections_dict_raw = json.loads(data)['entities']['favlists'].values()
        # an empty page means we are past the last page of favorites
        if not collections_dict_raw:
        # if len(collections_dict_raw) == 0:
            break
        for i in collections_dict_raw:
            # print(i['id'],' -- ', i['title'])
            collections_list.append({
                'title': i['title'],
                'url': Collection_URL % i['id'],
            })
    print('====== prepare Collections Done =====')
    return collections_list

#------------------------------------------------------------#
def getQaDictListFromOneCollection(collection_url = 'https://www.zhihu.com/collection/71534108'):
    qa_dict_list = []
    page_num = 0
    while True:
        page_num += 1
        url = collection_url + '?page=%d' % page_num
        content = r.get(url).content
        soup = BeautifulSoup(content, 'lxml')
        titles = soup.select('.zm-item-title a') # .text ; ['href']
        # no titles on the page means we are past the last page
        if len(titles) == 0:
            break
        votes = soup.select('.js-vote-count') # .text
        answer_urls = soup.select('.toggle-expand') # ['href']
        answers = soup.select('textarea') # .text
        authors = soup.select('.author-link-line .author-link') # .text ; ['href']
        for title, vote, answer_url, answer, author \
        in zip(titles, votes, answer_urls, answers, authors):
            author_img = getAthorImage(author['href'])
            qa_dict_list.append({
                'title': title.text,
                'question_url': title['href'],
                'answer_vote': vote.text,
                'answer_url': answer_url['href'],
                # 'answer': answer.text,
                'author': author.text,
                'author_url': author['href'],
                'author_img': author_img,
            })
            # key the answer text by its URL without the leading '/'
            url_answer_dict[
                answer_url['href'][1:]
            ] = answer.text
            # print(title.text, ' - ', author.text)
    return qa_dict_list

def getAthorImage(author_url):
    url = Zhihu_URL+author_url
    content = r.get(url).content
    soup = BeautifulSoup(content, 'lxml')
    return soup.select_one('.AuthorInfo-avatar') ['src']

def getAllQaDictList():
    """The end result is a nested structure of lists and dictionaries for the front end to parse."""
    all_qa_dict_list = []
    collections_list = getCollectionsList()
    for collection in collections_list:
        all_qa_dict_list.append({
            'ctitle': collection['title'],
            'clist': getQaDictListFromOneCollection(collection['url'])
        })
        print('====== getQa from %s Done =====' % collection['title'])
    return all_qa_dict_list


with open(u'json', 'w', encoding='utf-8') as f:
    json.dump(getAllQaDictList(), f)

with open(u'url_answer.json', 'w', encoding='utf-8') as f:
    json.dump(url_answer_dict, f)
#---------------------utils------------------------------#
# with open('1.html', 'w', encoding='utf-8') as f:
    # f.write(soup.prettify())
# import os
# Cookie_FilePlace = r'.'
# os.chdir(Cookie_FilePlace)
# import json
# dict_ = {}
# with open(u'json', 'r', encoding='utf-8') as f:
    # dict_ = json.load(f)
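
As a rough illustration of the crawl loop used above (keep requesting ?page=N until a page comes back empty) and of how the two output structures relate, here is a minimal offline sketch. The names crawl_collection, fetch_page, and FAKE_PAGES are hypothetical and the data is made up; the real crawler fetches and parses Zhihu pages at this step instead.

```python
import json


def crawl_collection(fetch_page):
    """Collect Q&A entries page by page until a page comes back empty.

    fetch_page(page_num) is a hypothetical stand-in for the
    request-and-parse step; it returns a list of dicts for that page.
    """
    qa_list = []      # metadata for the list view
    url_answer = {}   # answer text keyed by answer URL without the leading '/'
    page_num = 0
    while True:
        page_num += 1
        items = fetch_page(page_num)
        if not items:  # an empty page means we ran past the last one
            break
        for item in items:
            qa_list.append({'title': item['title'],
                            'answer_url': item['answer_url']})
            url_answer[item['answer_url'][1:]] = item['answer']
    return qa_list, url_answer


# Made-up pages standing in for real responses.
FAKE_PAGES = {
    1: [{'title': 'Q1', 'answer_url': '/question/1/answer/10', 'answer': '<p>A1</p>'},
        {'title': 'Q2', 'answer_url': '/question/2/answer/20', 'answer': '<p>A2</p>'}],
    2: [{'title': 'Q3', 'answer_url': '/question/3/answer/30', 'answer': '<p>A3</p>'}],
}

qa_list, url_answer = crawl_collection(lambda n: FAKE_PAGES.get(n, []))
print(json.dumps(qa_list, indent=2))
print(sorted(url_answer))
```

Keeping the answer text out of the list structure is what lets the front end load the (small) list first and fetch each (large) answer only on demand.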

The front end

  • My front-end requirements are modest: a single page that looks clean and makes it easy to find and flip through questions and answers. Also, my HTML and CSS skills are weak, and I even have to Google how to loop over a list in JS, so it had to be simple and easy to work with. I chose the Vue.js front-end framework and, to keep things simple, did not use webpack.
  • Front-end development moves fast, and the sheer number of frameworks and tools can be overwhelming. From my own experience, the first thing is not to be afraid: frameworks and tools exist to solve our problems, which means using them makes development simpler and faster. Many effective frameworks and tools are cheap to learn; once you have the basics and make use of open-source code, many problems can be solved conveniently. Collecting good tools is also a skill worth having: people run into the same problems, so someone may already have built a tool for your pain point.
  • First, the basic page layout comes from a Bootstrap starter template, which saves a lot of trouble. The component model of Vue.js then makes it easy to take open-source UI components and assemble them like building blocks into my page. On awesome-vue I found iView, a UI framework that suits my taste and is simple to use. It only supports Vue 1.x for now, but my application is simple enough that this makes little difference.
  • The HTML code is below. It uses vue-resource to request data asynchronously and bind it to the page. For convenience during development it makes JSONP cross-domain requests directly. The code quality is for reference only. The templates inside the components are hard to read as written; you can copy them out, remove the enclosing single quotes and the escaping of inner single quotes, and run them through an HTML beautifier. Writing them inline this way was an expedient.

       
<html lang="zh-CN">

<!-- view-source:http://v3.bootcss.com/examples/jumbotron-narrow/# -->

<head>
    <meta charset="utf-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta name="viewport" content="width=device-width, initial-scale=1">
    <title>Zhihu personal collection</title>
    <link rel="stylesheet" href="https://cdn.bootcss.com/bootstrap/3.3.7/css/bootstrap.min.css">
    <link rel="stylesheet" href="http://v3.bootcss.com/examples/jumbotron-narrow/jumbotron-narrow.css">
    <link rel="stylesheet" type="text/css" href="http://unpkg.com/iview/dist/styles/iview.css">
</head>

<body>
    <div id="app">
        <div class="container">
            <div class="header clearfix">
                <h3 class="text-muted">Zhihu personal collection</h3>
            </div>
            <div class="jumbotron">
                <h1>Program overview</h1>
                <p class="lead">{{ description }}</p>
                <my-carousel></my-carousel>
            </div>
            <div class="row marketing">
                <div class="col-lg-6">
                    <my-card :collection="collection" v-for="collection in left"></my-card>
                </div>
                <div class="col-lg-6">
                    <my-card :collection="collection" v-for="collection in right"></my-card>
                </div>
            </div>
            <i-button @click="showLeave" style="" long>That's all!</i-button>
            <Modal :visible.sync="visible" :title="modalTitle"> {{ modalMessage }}
                <div v-html="rawHtml" id="inner-content"></div>
            </Modal>

            <footer class="footer">
                <p>&copy; 2017 treelake.</p>
            </footer>
        </div>
        <!-- /container -->
    </div>

    <script type="text/javascript" src="http://v1.vuejs.org/js/vue.min.js"></script>
    <script src="https://cdn.jsdelivr.net/vue.resource/1.2.0/vue-resource.min.js"></script>
    <script type="text/javascript" src="http://unpkg.com/iview/dist/iview.min.js"></script>

    <script>
        Vue.component('my-carousel', {
            template: '<img src="https://n2-s.mafengwo.net/fl_progressive,q_mini/s10/M00/74/B6/wKgBZ1irpQ-Afw_uAAepw3nE8w884.jpeg">' +
                '<img src="https://c4-q.mafengwo.net/s10/M00/21/50/wKgBZ1imrvqAafuJAAeRHcfhBBg66.jpeg?imageMogr2%2Finterlace%2F1">'
        })

        Vue.component('my-ul', {
            template: ' ',
            props: ['items'],
            methods: {
                changeLimit() {
                    // advance the visible window by limitNum items, wrapping back to the start
                    if (this.limitFrom > this.items.length - this.limitNum) {
                        this.limitFrom = 0;
                    } else {
                        this.limitFrom += this.limitNum;
                    }
                    if (this.limitFrom == this.items.length) {
                        this.limitFrom = 0
                    }
                    console.log(this.limitFrom)
                },
                simpleContent(msg) {
                    // Dispatch an event with $dispatch(); it bubbles up the parent chain
                    this.$dispatch('child-msg', msg)
                },
            },
            data() {
                return {
                    limitNum: 5,
                    limitFrom: 0,
                }
            },
            events: {
                'parent-msg': function () {
                    this.changeLimit()
                }
            },
        })

        Vue.component('my-card', {
            template: '{{ collection.ctitle }}',
            props: ['collection'],
            methods: {
                notify: function () {
                    // Broadcast an event with $broadcast(); it propagates down to all descendants
                    this.$broadcast('parent-msg')
                }
            }
        })

        var shuju, answer;
        new Vue({
            el: '#app',
            data: {
                description: '',
                visible: false,
                // ctitle: '',
                allqa: [],
                collection: {
                    'clist': [],
                    'ctitle': '',
                },
                left: [],
                right: [],
                modalMessage: 'The old days are over!',
                modalTitle: 'Welcome!',
                rawHtml: '<a href="https://treeinlake.github.io">treelake</a>'
            },
            methods: {
                show() {
                    this.visible = true;
                },
                showLeave() {
                    this.rawHtml = '';
                    this.modalMessage = 'The old days are over!';
                    this.show();
                }
            },
            events: {
                'child-msg': function (msg) {
                    // for single-file testing: http://localhost:5000/find
                    this.$http.jsonp('/find' + msg.answer_url, {}, {
                        headers: {},
                        emulateJSON: true
                    }).then(function (response) {
                        // success callback: show the answer HTML in the modal
                        answer = response.data;
                        this.rawHtml = answer.answer;
                    }, function (response) {
                        // error callback
                        console.log(response);
                    });
                    this.modalMessage = '';
                    this.modalTitle = msg.title;
                    this.show();
                }
            },
            ready: function () {
                // for single-file testing: http://localhost:5000/collections/
                this.$http.jsonp('/collections', {}, {
                    headers: {},
                    emulateJSON: true
                }).then(function (response) {
                    // success callback
                    shuju = response.data
                    for (var i in shuju) {
                        this.description += (shuju[i].ctitle + ' ');
                        // console.log(shuju[i])
                    }
                    // this.ctitle = shuju[0].ctitle
                    // this.collection = shuju[0]
                    this.allqa = shuju
                    // split the favorites into a left and a right column
                    var half = parseInt(shuju.length / 2) + 1
                    this.left = shuju.slice(0, half)
                    this.right = shuju.slice(half, shuju.length)
                    console.log(this.collection)
                }, function (response) {
                    // error callback
                    console.log(response);
                });
            }
        })
    </script>
    <style>
        #list {
            padding: 10px
        }

        #list li {
            margin-bottom: 10px;
            padding-bottom: 10px;
        }

        .jumbotron img {
            width: 100%;
        }

        .author-badge {
            width: 38px;
            height: 38px;
            border-radius: 6px;
            display: inline-block;
        }

        #inner-content img {
            width: 100%;
        }
    </style>
</body>

</html>
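
The two small pieces of list logic in the page, the paging step in changeLimit and the left/right column split in ready, are easier to check in isolation. Below is a minimal sketch that mirrors them in Python, the language of the crawler and the back end. The names next_offset and split_columns are hypothetical and used only for illustration.

```python
def next_offset(limit_from, limit_num, total):
    """Mirror of changeLimit: move the window start forward, wrapping to 0."""
    if limit_from > total - limit_num:
        limit_from = 0
    else:
        limit_from += limit_num
    if limit_from == total:
        limit_from = 0
    return limit_from


def split_columns(items):
    """Mirror of the split in ready: the left column gets len // 2 + 1 items."""
    half = len(items) // 2 + 1
    return items[:half], items[half:]


# Walk through the window starts for 12 items, 5 at a time.
offset = 0
for _ in range(4):
    print(offset)
    offset = next_offset(offset, 5, 12)

print(split_columns(['a', 'b', 'c', 'd', 'e']))
```

Note that the split always favors the left column: with an even number of favorites the left side ends up two items longer than the right.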

The back-end

  • The back end mainly serves the API, for which the simple, easy-to-use Flask is a good fit. Returning JSONP requires an extra layer of wrapping, but the open-source world is rich enough that a ready-made library, Flask-Jsonpify, handles it in one line. The main logic loads the previously crawled data from local files and then serves it through the API. The /find/<path:answer_url> route looks up the text of an answer by the answer's URL.
  • Finally, Flask also serves the HTML file at the root path, so the page can be opened on a phone by visiting the machine's IP address directly. To keep Flask's template rendering from conflicting with the template syntax of Vue.js, the raw HTML file is returned as-is, bypassing Flask's template rendering.
  • The server code is below. Put it together with the two files above and, once the data has been crawled, run python xxx.py to start the service.
# -*- coding: utf-8 -*-
from flask import Flask
import json
from flask_jsonpify import jsonpify


app = Flask(__name__)

# favorites and Q&A metadata produced by the crawler
collections = []
with open(u'json', 'r', encoding='utf-8') as f:
    collections = json.load(f)

# answer text keyed by answer URL
qa_dict = {}
with open('url_answer.json', 'r', encoding='utf-8') as f:
    qa_dict = json.load(f)
# print(qa_dict['question/31116099/answer/116025931'])

# raw page, read once and returned as-is to bypass Flask's template rendering
index_html = ''
with open('zhihuCollection.html', 'r', encoding='utf-8') as f:
    index_html = f.read()


@app.route('/')
def index():
    return index_html


@app.route('/collections')
def collectionsApi():
    return jsonpify(collections)


# the path converter lets answer_url contain slashes; on the side effects of slashes see http://flask.pocoo.org/snippets/76/
@app.route('/find/<path:answer_url>')
def answersApi(answer_url):
    # look up the text of the answer stored under the given URL path
    return jsonpify({'answer': qa_dict[answer_url]})


@app.route('/test')
def test():
    # return the whole URL-to-answer dictionary
    return jsonpify(qa_dict)


if __name__ == '__main__':
    app.run(host='0.0.0.0')
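
To make the JSONP wrapping concrete: a JSONP response is the JSON payload wrapped in a call to a function whose name the client supplies. The sketch below shows that general idea together with the URL-keyed lookup behind /find. It is not Flask-Jsonpify's actual implementation, and jsonp_body, find_answer, and the sample data are hypothetical.

```python
import json

# Made-up sample of the URL-to-answer dictionary.
qa_dict = {'question/1/answer/10': '<p>A1</p>'}


def jsonp_body(payload, callback=None):
    """Return plain JSON, or JSON wrapped in a callback call for JSONP."""
    text = json.dumps(payload)
    if callback:
        return '%s(%s);' % (callback, text)
    return text


def find_answer(answer_url, callback=None):
    """Look up an answer by its URL path, as the /find route does."""
    return jsonp_body({'answer': qa_dict[answer_url]}, callback)


print(find_answer('question/1/answer/10'))
print(find_answer('question/1/answer/10', callback='handle'))
```

The browser loads such a response through a script tag, so the wrapped call runs the client's callback with the data, which is how the page can read it across origins.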