Data Science Club

Chinese data scientist community

Author: John, formerly of Morgan Stanley and eBay.

It’s very simple, and in this article we will reveal the principle behind it.

What do Zhihu’s godlike replies have in common? Let’s take a look.

Do you see the pattern? They are short and pithy, and they have lots of upvotes. So all we have to do is crawl the answers that have many upvotes and few words. Two simple steps get us there: first crawl Zhihu answers, then filter them. Isn’t that easy?

Crawling Zhihu answers

The first step is to get answers from Zhihu. There are so many answers on Zhihu that crawling them all at once would take a long time, so we pick a few topics and crawl only their content. The following function crawls one page of content for a given topic:

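import json

import pymongo
import requests


def get_answers_by_page(topic_id, page_no):
    """Fetch one page of answers for a topic and store it in MongoDB."""
    offset = page_no * 10  # offset of the first item on this page; used when building the URL
    url = "<topic_url>"  # placeholder: replace with the URL for this topic
    headers = {
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36",
    }
    # verify=False disables TLS certificate verification
    r = requests.get(url, verify=False, headers=headers)
    content = r.content.decode("utf-8")
    data = json.loads(content)
    is_end = data["paging"]["is_end"]  # True when this is the last page
    items = data["data"]
    client = pymongo.MongoClient()  # connects to the default local MongoDB
    db = client["zhihu"]
    if len(items) > 0:
        db.answers.insert_many(items)
        # record which page of which topic has been saved
        db.saved_topics.insert_one({"topic_id": topic_id, "page_no": page_no})
    return is_end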

The get_answers_by_page function takes two arguments: the first is the id of the topic, and the second is the number of the page to crawl. It returns is_end, which tells the caller whether this was the last page.
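Because the function returns is_end, a whole topic can be crawled by calling it page after page until it reports the last one. The following is only a sketch under stated assumptions: crawl_topic, fake_fetch, the topic id string, and the max_pages safety cap are hypothetical names introduced here, not part of the original code. The fetcher is passed in as a parameter so the loop can be tried without network access or MongoDB.

```python
from typing import Callable


def crawl_topic(topic_id: str,
                fetch_page: Callable[[str, int], bool],
                max_pages: int = 1000) -> int:
    """Call fetch_page(topic_id, page_no) for pages 0, 1, 2, ... until it
    reports the last page. Returns the number of pages requested."""
    pages = 0
    for page_no in range(max_pages):  # max_pages is an assumed safety cap
        pages += 1
        is_end = fetch_page(topic_id, page_no)
        if is_end:
            break
    return pages


# Stand-in for get_answers_by_page: pretends the topic has three pages.
def fake_fetch(topic_id: str, page_no: int) -> bool:
    return page_no >= 2


print(crawl_topic("hypothetical_topic_id", fake_fetch))  # prints 3
```

In real use, get_answers_by_page would be passed in place of fake_fetch; adding a short pause between requests is probably wise.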

A few fields in the crawled data are worth noting; they are highlighted in yellow in the figure below.



These fields have the following meanings:

  • question.title – the title of the question
  • content – the content of the answer
  • voteup_count – the number of upvotes
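As a rough illustration of where these fields live, the sketch below pulls them out of one crawled item. The nesting under target follows the filtering pipeline used in this article (target.content, target.voteup_count). That question also sits under target, the helper name summarize, and the sample item are assumptions made for illustration only.

```python
def summarize(item: dict) -> dict:
    """Extract the three fields of interest from one crawled item.

    Missing fields fall back to empty or zero values instead of raising.
    """
    target = item.get("target", {})
    return {
        "title": target.get("question", {}).get("title", ""),  # assumed nesting
        "content": target.get("content", ""),
        "voteup_count": target.get("voteup_count", 0),
    }


# Hypothetical item shaped like the documents the crawler stores.
sample = {
    "target": {
        "type": "answer",
        "question": {"title": "What is recursion?"},
        "content": "See: recursion.",
        "voteup_count": 1234,
    }
}

print(summarize(sample))
```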

These fields are used in the next step, filtering the answers.

Filtering answers

After crawling the data, we sift through the results. We filter the answers with MongoDB’s Aggregation Pipeline (see the Aggregation Pipeline Quick Reference at https://docs.mongodb.com/manual/meta/aggregation-quick-reference/ for how to use it). The code is as follows:

import pymongo

client = pymongo.MongoClient()
db = client["zhihu"]
items = db.answers.aggregate([
    # keep only items that are answers
    {"$match": {"target.type": "answer"}},
    # at least 1,000 upvotes
    {"$match": {"target.voteup_count": {"$gte": 1000}}},
    # answer length in Unicode code points
    {"$addFields": {"answer_len": {"$strLenCP": "$target.content"}}},
    # at most 50 characters
    {"$match": {"answer_len": {"$lte": 50}}},
])

The code above keeps every answer that has at least 1,000 upvotes and is at most 50 characters long, which gives us the short, pithy, godlike replies.
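For readers without a MongoDB instance at hand, the same three conditions can be checked in plain Python. This is a sketch, not the article's code: is_godlike, select_godlike, and the sample items are hypothetical, and the thresholds are copied from the pipeline. Python's len on a str counts Unicode code points, which matches what $strLenCP measures.

```python
def is_godlike(item: dict, min_votes: int = 1000, max_len: int = 50) -> bool:
    """Mirror the pipeline: an answer, with enough upvotes, that is short."""
    target = item.get("target", {})
    if target.get("type") != "answer":
        return False
    if target.get("voteup_count", 0) < min_votes:
        return False
    # len() on a str counts code points, like $strLenCP
    return len(target.get("content", "")) <= max_len


def select_godlike(items: list) -> list:
    """Return only the items that pass all three conditions."""
    return [item for item in items if is_godlike(item)]


# Hypothetical items shaped like the crawled documents.
short_hit = {"target": {"type": "answer", "voteup_count": 1500, "content": "//TODO"}}
too_long = {"target": {"type": "answer", "voteup_count": 1500, "content": "x" * 51}}
not_answer = {"target": {"type": "article", "voteup_count": 1500, "content": "hi"}}
few_votes = {"target": {"type": "answer", "voteup_count": 999, "content": "hi"}}

print(select_godlike([short_hit, too_long, not_answer, few_votes]))
```

Only the first sample item passes all three checks.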

The above is only the core of the code. Because space is limited, reply “Zhihu” in the backend of the Data Science Club public account to get the download address for the complete code.

The godlike replies

The code is done, so let’s run it. Yesterday happened to be Programmers’ Day, so we picked out some of the most godlike replies about programmers. Here is what came out: 75 funny jokes.

1

Q: What are the most common “lies” that programmers tell?

A: //TODO

2

Q: What’s it like to keep your GitHub contribution graph green for 365 days?

A: I once kept it fully green for more than 200 days but neglected my girlfriend, and I have been green ever since.

3

Q: How do you refute the idea that programmers are useless without computers?

A: No, no, no. Many programmers are useless in front of computers.

4

Q: What would happen if one day everyone spoke in a computer language?

A:

�d}� R�0:�v�? .

5

Q: I suddenly want to open a programmer-themed restaurant called “Programmers’ Dishes”, with dishes named after keywords from various languages. Any advice? Does it have a future?

A: A big Hello World dish called “Braised Product Manager” is sure to fill the room.

6

Q: What is recursion?

A: The definition and scope of “political content not suitable for public discussion” is itself “political content not suitable for public discussion”.

7

Q: How should the most basic programming term “bug” be translated?

A: Shit. Your program screwed up again.

8

Q: What’s fun about programming?

A: Man’s sense of achievement comes from two things: creation and destruction.

9

Q: How do you refute the idea that programmers are useless without computers?

A: Honestly, if you can talk to a woman like this, do you want to fuck her?

10

Q: As a programmer, what math problems have tormented you?

A: While reading a paper, a single “obviously” kept me deriving all afternoon.

11

Q: What gear do rich programmers have?

A: A girlfriend…

12

Q: Which god do you worship to keep your code bug free?

A: Worship the Yongzheng Emperor, who dealt with the Eighth Prince (“ba a ge”, which sounds like “bug”).

13

Q: For poor Chinese children, is going to a good university and studying IT the only way to rise into the middle class?

A: Yes, there are four ways to write code and finance in the code circle

14

Q: Why do programmers like to carry computer bags wherever they go, even if there is no computer in them?

A: Because they don’t have any other bags.

15

Q: Talk is cheap. Show me the code.

A: Cut the bullshit and put the code in.

16

Q: Why do programmers’ girlfriends and wives tend to be much more attractive than the programmers themselves? Or have programmers already become a blue-chip stock in the dating market?

A: I agree that programmers’ girlfriends are good-looking: ask any 10 programmers who their girlfriend is, and 9 will answer Yui Aragaki.

17

Q: Why do some people prefer to buy mechanical keyboards instead of facial masks for themselves?

A:

I don’t make a living off my face.

It’s my hard-earned cash. I can spend it any way I want.

18

Q: What inscription would suit a programmer couple’s wedding rings?

A: 0 error 0 warning

19

Q: Do IT engineers feel uncomfortable when they are called “code farmers”?

A: At least we’re still human; our product managers and designers are dogs…

20

Q: Why did a salesman (30 years old) invite me, a programmer (24 years old), to a nearby Starbucks?

A: Based on my years of experience, he probably has a great idea and just needs a programmer to implement it.

21

Q: How can I find a girlfriend who likes programmers?

A: It comes down to fate. With so many users on Zhihu, the fact that you follow me is fate.

22

Q: How does a programmer’s girlfriend celebrate a programmer’s boyfriend’s birthday?

A: Tell him the interface is ready.

23

Q: How did you find a girlfriend while working as a programmer?

A: That the asker has been a programmer for so long and still likes girls is already commendable.

24

Q: What does a programmer need to prepare for a career change to running a barbecue stall? What are the advantages and disadvantages?

A: You see, you don’t even know the advantages and disadvantages of running your own barbecue, so you still need a product manager.

25

Q: What ticks programmers off?

A: I walked by his computer and said, “Oh, you’re writing a bug again!”

26

Q: One of my teachers said that Java is for large software and C# is for small and medium software. Is it true?

A: Java has a talent for turning small and medium software into large software.

27

Q: Why are programmers paid so much in 2014?

A: The hourly wage is not very high.

28

Q: Are most programmers complaining about low pay?

A: Well, who complains about high pay?

29

Q: What should a single programmer do when he solves a technical problem and has no girl to show off or brag to?

A: Now you see why so many programmers write tech blogs.

30

Q: Do Chinese programmers prefer “jackets + jeans + sneakers”? If so, why the trend?

A: Do you want to show the program?

31

Q: What tools do you feel have greatly increased your productivity as an IT practitioner?

A: Being single.

32

Q: Why do I think programmers seem mostly inarticulate?

A:

Just think of us as having low EQ.

That way you’re happy,

and we’re happy, too.

33

Q: In China, the oldest programmer is only about 40 years old. What can Chinese programmers do in the future?

A: It’s for the same reason that no one born in the 90s has made it past 30.

34

Q: How do I reply to a text message from a programmer: “Hello world”?

A: Hello nerd.

35

Q: How can you tell that an IT guy likes a girl?

A: When he goes out of his way to get close to you despite his long-ingrained habit of staying silent.

36

Q: Why shouldn’t programmers be able to fix computers?

A: Does Fan Bingbing need to be able to fix the TV?

37

Q: My colleague said that he was the best C++ programmer in China. How can I make him realize that he is not that good?

A: To be honest, I am no slouch either: my C++ level ranks number 0 in China.

38

Q: Why do all the icons shake when you delete apps on the iPhone?

A: The third-party apps are shaking with fear; the built-in ones are just shaking to show off.

39

Q: A revolver is loaded with one bullet. One shot at your head is worth 100,000 yuan, two shots are worth 100 million yuan, three shots are worth 200 million yuan, four shots are worth 400 million yuan, five shots are worth 1.6 million yuan.

A: As long as it doesn’t hit anything vital, I’m telling you, I can keep shooting until our A Station goes public!!!!

40

Q: At the current rate of processor performance doubling every year, will iPhone processors soon catch up with and even surpass desktop processors?

A: When I was young, I always thought that in two years I would be as old as my brother who was two years older than me.

41

Q: What is the least of Zhihu’s benefits for you?

A: Killing time without feeling guilty.

42

Q: What are some anti-human technological inventions or designs?

A: When the computer can’t connect to the Internet, it runs a diagnosis and then reminds me to connect to the Internet.

43

Q: Why don’t designers want to be called artists?

A: As long as the salary is high, you can call me auntie.

44

Q: Why do people think NetEase Cloud Music is the conscience of the industry?

A: One day it suddenly sent me a message saying that the lyrics I had asked for were found.

45

Q: Why are there no self-destructing drone weapons? Have terrorists used them?

A: You mean missiles?

46

Q: Since my thoughts are mine, why can’t I control my negative emotions sometimes?

A: The operating system will not allow users to access, modify, or delete core system files, because this will damage the system and cause abnormal operation.

47

Q: Lu Xun is great, but is he just one of the world’s top ten writers?

A: Why should writers pay for rankings made by illiterates?

48

Q: What technologies have reached a bottleneck and haven’t had a major breakthrough in a long time?

A: Boiling water.

49

Q: What do you think of some people’s preference for downloading software from the official website?

A: Have you ever been hit by the Baidu family bucket?

50

Q: Why do many people buy laptops to play games instead of more powerful desktops?

A: Because I can’t afford a house…

51

Q: How much did listening to headphones for the first time shock you?

A: They don’t shock you the first time you listen to them, but when you switch back to regular headphones, they do.

52

Q: Is Chrome really a battery hog?

A: No. I’m using Chrome right now, and after all this time my laptop battery is still at 50%

53

Q: What is the experience of using Windows on a MacBook?

A: It’s like suddenly you have a soft underbelly and you lose your armor.

54

Q: What is it like to use Apple products for everything in your home?

A: There was a phone call and the whole house rang.

55

Q: Why didn’t you buy the iPhone X?

A: The contradiction between the growing need for a better life and the reality of poverty.

56

Q: Why are some people willing to pay thousands of yuan for an iPhone, but not tens of yuan for legitimate iPhone software and games?

A: Because they can’t download an iPhone.

57

Q: Is there an App with an amazing name?

A: Water meter assistant… It’s for deliveries…

58

Q: Why did you buy a portable hard drive?

A: Even if the conditions are good, you should make your women comfortable

59

Q: How do I use an iPad to remotely shut down a PC?

A: Throw it at the PC power button.

60

Q: How do you comment on Apple’s launch event on September 7, 2016?

A: I watched three launches for the new MacBook Pro in half a year…

61

Q: How do you evaluate Internet Explorer?

A: A browser for downloading other browsers. —– A year later —– Anything below IE8 sucks; doing front-end work makes me want to cry.

62

Q: What if my parents tell me to save money for a house, but I want to buy an Apple computer?

A: If you can really save 500,000 yuan for a house in 3 years, would you be short of the 17,000 yuan for a computer, big brother?

63

Q: What kind of junk phone apps are there?

A: SMS blocker! Then it tells you it intercepted a text message. I’m sure 99% of you will click on it again to see the intercepted text messages!

64

Q: What is the biggest headache when you finish a complete PPT?

A: How to hide your ability from your boss.

65

Q: What can Vim do that Emacs can’t?

A: Help the poor children in Uganda…

66

Q: Why do Apple users choose Apple?

A: Because people who don’t use Apple aren’t Apple users.

67

Q: What are some of the classic lies in the computer world?

A: Windows is searching online for a solution.

68

Q: Will the wired mouse be replaced by the wireless mouse?

A: I don’t think the wired mouse will be replaced in Internet cafes.

69

Q: What are some of the classic lies in the computer world?

A: I have read and agree to the terms

70

Q: What do computer science students say?

A: It’s running fine on my computer

71

Q: What do you think of Baidu’s official blog publicly refuting rumors about Robin Li’s family?

A:

“Chinese people are not that sensitive to privacy and are willing to trade privacy for convenience.”

– Robin Li

72

Q: How do you chat when you meet Jack Ma on the plane?

A: Hello Jack, my name is Jackson.

73

Q: How do you understand Jack Ma’s saying that in eight years houses will be like green onions?

A: Hurry and buy green onions, the price of green onions is going up!!

74

Q: What do you think of Jack Ma’s saying that “killing the landlord does not mean you can become rich”?

A: He means “Don’t kill me”

75

Q: What do you think of Baidu quietly dimming the color of the advertising prompts that it promised to rectify after the Wei Zexi incident?

A: Please don’t smear Baidu. I do front-end development: after a long time, a web page’s CSS fades.

◆ ◆ ◆  ◆ ◆

Follow the Data Science Club public account and reply “Zhihu” in its backend to get the complete code for this article.
