Don’t waste your time: skim the table of contents and read selectively.

  • 1. Crawl introduction
  • 2. Code presentation
  • 3. Results presentation
  • 4. Benefit sharing

Batch crawl answers

Soul-searching questions

  • How long has it been since you read a book? Do you know which book is most worth reading?
  • When was the last time you saw a movie? Do you know which movie is worth seeing?

Someone will say: I know, just check the Douban ratings and pick books by score. But highly rated books are not necessarily easy to read or understand. Take “Quantum Mechanics”, shown in the figure below: a score of 9.6 and plenty of five-star praise. Sadly, my busy work schedule has kept me from conversing with it at its IQ level (not really).

Quantum Mechanics is a 9.6

Then I thought of Zhihu, the platform where the average user graduated from a 985 university and earns a million yuan a year. Book lists recommended by all that talent can’t be wrong. But search Zhihu for book recommendations and a question often has thousands of answers, so you can’t tell which answer is the good one, while a question with very few answers may not be much of a reference. So Brother Xing thought: what if I crawl all the answers and pick the most-recommended books or movies, to improve myself with precision?


So Brother Xing spent 0.1 seconds thinking of a crawler and 10,000 seconds writing the code, then crawled 5646 answers under a Zhihu question, which is like having 5646 million-yuan earners vote on my book list. With the data in hand, I ranked the titles by frequency, charted the results, and checked which of the top ten books I had and hadn’t read, only to find that the gap between my salary and Zhihu’s million is just a few books.

Less talk, show the code

Here is the core code. It takes a Zhihu question ID as a parameter, crawls that question’s answers in batches of 20, uses a regular expression to extract the book or movie titles wrapped in book-title marks (《》) from each answer, and finally saves each title and its mention count to a CSV file.

  • The code is too long to show in full; if you need the complete source, see the benefit sharing section at the end
# Official account: A Line of Data

All source code, Python material, and AI material can be retrieved from A Line of Data

import re

import pandas as pd
import requests

# Placeholder headers (an assumption); the full source defines the real ones
headers = {'User-Agent': 'Mozilla/5.0'}


def getAnswers(qid):
    """Get all book titles and their mention counts from a question's answers."""
    offset = 0
    book_data = {}
    while True:
        print('Offset =', offset)
        # Zhihu API request: 20 answers per page, starting at offset
        url = "https://www.zhihu.com/api/v4/questions/{}/answers?include=content&limit=20&offset={}&platform=desktop&sort_by=default".format(
            qid, offset)
        res = requests.get(url, headers=headers)
        res.encoding = 'utf-8'
        data = res.json()
        # An empty page means there are no more answers
        if len(data['data']) == 0:
            break
        for line in data['data']:
            # Pull out every title wrapped in book-title marks 《》 (assumed delimiters)
            content = line['content']
            result = re.findall(r'《(.*?)》', content)
            for name in result:
                book_data[name] = book_data.get(name, 0) + 1
        offset += 20

    # Save the contents of the crawl
    pandas_data = []
    for i in book_data.keys():
        new_data = {}
        if i:  # skip empty titles
            new_data['Book Name'] = i
            new_data['frequency'] = book_data[i]
            pandas_data.append(new_data)
    df2 = pd.DataFrame(pandas_data, columns=['Book Name', 'frequency'])
    df2.to_csv("book.csv", encoding="utf_8_sig")
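
The extraction step can be tried offline, without calling the Zhihu API. The following is a minimal sketch, assuming titles are wrapped in Chinese book-title marks 《》. The names `TITLE_PATTERN` and `count_titles` and the sample answers are hypothetical, made up for illustration, and are not part of the original source.

```python
import re
from collections import Counter

# Assumption: titles appear inside Chinese book-title marks 《》
TITLE_PATTERN = re.compile(r'《(.*?)》')


def count_titles(answers):
    """Count how often each title appears across a list of answer strings."""
    counts = Counter()
    for content in answers:
        # findall returns only the captured text between the marks
        for name in TITLE_PATTERN.findall(content):
            if name:  # skip empty matches such as 《》
                counts[name] += 1
    return counts


# Made-up sample answers, shaped like the HTML in the API's content field
sample_answers = [
    '<p>推荐《活着》和《三体》</p>',
    '<p>《活着》值得一读</p>',
]
title_counts = count_titles(sample_answers)
print(title_counts.most_common(1))  # → [('活着', 2)]
```

Like getAnswers, this counts every mention, so a title repeated inside one answer is counted more than once. Wrapping the findall result in set() would instead give one vote per answer.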


Results

A total of 6,434 books were recommended from 5,464 responses, of which “To Live” was recommended 286 times.

Brother Xing can’t help sighing: these million-a-year big shots turn out to be pondering the same problem as Brother Xing, staying alive. It seems I am one step closer to the million-yuan salary. I wonder how many books stand between you and a million a year. If you have read all of the top 10, look at books 11 to 20. If you have read all of those too, you can go straight to the list of 6343 books at the end of this article. Brother Xing has confirmed that these books can be read for free on WeChat Reader, so there is no need to hunt for PDF versions.
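
The top-10 and 11-to-20 lists can be sliced out of the saved frequencies with a sort. Here is a minimal sketch using only the standard library, assuming the CSV layout that getAnswers writes (an unnamed index column, then 'Book Name' and 'frequency'). The function `top_books` and the sample rows are hypothetical and invented for illustration; they are not real crawl results.

```python
import csv
import io


def top_books(csv_text, start, stop):
    """Return (title, frequency) pairs ranked start+1..stop, most frequent first."""
    rows = csv.DictReader(io.StringIO(csv_text))
    pairs = [(row['Book Name'], int(row['frequency'])) for row in rows]
    # list.sort is stable, so titles with equal counts keep their file order
    pairs.sort(key=lambda pair: pair[1], reverse=True)
    return pairs[start:stop]


# Invented sample in the same layout as book.csv
sample_csv = """,Book Name,frequency
0,Book A,3
1,Book B,10
2,Book C,7
3,Book D,1
"""

print(top_books(sample_csv, 0, 2))  # → [('Book B', 10), ('Book C', 7)]
print(top_books(sample_csv, 2, 4))  # → [('Book A', 3), ('Book D', 1)]
```

To run it on the real output, read the file with open("book.csv", encoding="utf_8_sig").read() so the byte-order mark written by to_csv is stripped, then call top_books(text, 0, 10) and top_books(text, 10, 20).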

Benefits to share

  • What? You have read the entire top 20? Then reply [complete edition book list] to the official account and see the real gap between you and the million-a-year big shots
  • What? You don’t want to modify the code and need the full version? Click [read the text] to get it
  • What? You want more content crawled? Add me as a friend at [data_ecology]; you ask, I deliver

Have you noticed? This is the closest you have ever been to a million-yuan salary, and this is as far as Brother Xing can help. If you liked it, tap [good-looking]. Next, Brother Xing will package this program into an application so you can crawl Zhihu however you like.