• How I used Python to find interesting people to follow on Medium
  • Radu Raicea
  • The Nuggets translation Project
  • Permanent link to this article: github.com/xitu/gold-m…
  • Translator: Park – ma
  • Proofreader: Mingxing47

Cover image: Old Medium Logo

Medium hosts a huge amount of content, a large number of users, and countless posts. When you try to find interesting users to follow, it is easy to feel overwhelmed.

My definition of an interesting user is someone from your network who is active and regularly writes comments that the Medium community appreciates.

I went through the latest posts from the users I follow to see who was responding to them. My thinking was that if someone responds to a user I follow, they probably share my interests.

The process was tedious, and it reminded me of the most valuable lesson I learned during my last internship:

Any tedious task can and should be automated.

I want my automation to be able to do the following:

  1. Get all the users I follow (my Followings list)
  2. Get the latest posts from each user
  3. Get all the comments on every post
  4. Filter out comments that are more than 30 days old
  5. Filter out comments that have fewer than a minimum number of recommendations
  6. Get the username of the author of each remaining comment

Let’s get started

I first looked at Medium’s API and found it limited. It offers me too few features. Through it, I can only get information about my own account, not about other users.

On top of that, the last update to Medium’s API was more than a year ago, and there are no recent signs of development.

I realized that I could only rely on plain HTTP requests to get my data, so I turned to the Chrome developer tools.

The first goal was to get my Followings list (the users I follow).

I opened the developer tools and went to the Network tab. I filtered out everything except XHR to see where Medium gets my Followings list from. I refreshed my profile page, but nothing interesting happened.

What if I clicked the Followings button on my profile? Success!

I found the link that returns the list of users I follow.

At this link, I found a very large JSON response. It was well-formed JSON, except for a string of characters at the beginning of the response: ])}while(1);</x>

I wrote a function that strips this prefix and converts the JSON into a Python dictionary.

import json

def clean_json_response(response):
    # Keep only what follows the ])}while(1);</x> prefix, then parse it as JSON.
    return json.loads(response.text.split('])}while(1);</x>')[1])
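
To see what this cleaning step does without sending a request, here is a minimal sketch that runs on a hard-coded sample. The `strip_prefix` helper and the `fake_response` stand-in are hypothetical names, not part of the original script, and the prefix is assumed to be exactly `])}while(1);</x>`.

```python
import json
from types import SimpleNamespace

# Prefix assumed to precede Medium's JSON responses.
PREFIX = '])}while(1);</x>'

def strip_prefix(text, prefix=PREFIX):
    """Hypothetical helper: drop the prefix if present, else return text unchanged."""
    if text.startswith(prefix):
        return text[len(prefix):]
    return text

# Stand-in for a requests.Response; only the .text attribute is used here.
fake_response = SimpleNamespace(
    text=PREFIX + '{"success": true, "payload": {"user": {"userId": "d540942266d0"}}}'
)

parsed = json.loads(strip_prefix(fake_response.text))
user_id = parsed['payload']['user']['userId']
print(user_id)
```

Unlike splitting on the prefix, this variant also accepts a response that arrives without the prefix, which may be convenient if the format ever changes.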

I’ve found an entry point, so let’s start coding.

Get all the users from my Followings list

In order to query the endpoint, I need my user ID (I already knew it, but I’m doing this for educational purposes).

While looking for a way to get the user ID, I found that you can append ?format=json to a Medium URL to get a JSON response for that page. I tried it on my profile page.

Look, there’s my user ID.

])}while(1);</x>{"success":true,"payload":{"user":{"userId":"d540942266d0","name":"Radu Raicea","username":"Radu_Raicea",...

I wrote a function to extract the user ID from a given username. Again, I used the clean_json_response function to remove the unwanted string at the beginning of the response.

I also defined a constant called MEDIUM that stores the base string every Medium URL contains.

import requests

MEDIUM = 'https://medium.com'

def get_user_id(username):

    print('Retrieving user ID...')

    # Request the profile page in JSON form, e.g. https://medium.com/@name?format=json
    url = MEDIUM + '/@' + username + '?format=json'
    response = requests.get(url)
    response_dict = clean_json_response(response)
    return response_dict['payload']['user']['userId']
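
The URL must contain no stray spaces for the request to work. As a small illustration, the sketch below builds the same URL and escapes the username first. The `build_profile_json_url` helper is a hypothetical name, and escaping with `quote` is an extra precaution that the original function does not take.

```python
from urllib.parse import quote

MEDIUM = 'https://medium.com'

def build_profile_json_url(username):
    """Hypothetical helper: URL of a user's profile page in JSON form."""
    # quote() percent-encodes characters that are not safe in a URL path segment.
    return MEDIUM + '/@' + quote(username, safe='') + '?format=json'

profile_url = build_profile_json_url('Radu_Raicea')
print(profile_url)
```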

With the user ID, I queried the /_/api/users/<user_id>/following endpoint to get the list of usernames from my Followings list.

When I did this in the developer tools, I noticed that the JSON response contained only eight usernames. Strange!

When I clicked on “Show more people”, I found the missing usernames. It turns out that Medium paginates the Followings list.


Pagination works by specifying limit (the number of elements per page) and to (the first element of the next page), so I had to find a way to get the ID of the next page.

At the end of the JSON response from /_/api/users/<user_id>/following, I saw an interesting key-value pair.

..."paging":{"path":"/_/api/users/d540942266d0/followers","next":{"limit":8,"to":"49260b62a26c"}}},"v":3,"b":"31039-15ed0e5"}

From here, it’s easy to write a loop that collects all the usernames from my Followings list.

def get_list_of_followings(user_id):

    print('Retrieving users from Followings...')

    next_id = False
    followings = []
    while True:

        if next_id:
            # Not the first page of the Followings list
            url = (MEDIUM + '/_/api/users/' + user_id
                   + '/following?limit=8&to=' + next_id)
        else:
            # First page of the Followings list
            url = MEDIUM + '/_/api/users/' + user_id + '/following'

        response = requests.get(url)
        response_dict = clean_json_response(response)
        payload = response_dict['payload']

        for user in payload['value']:
            followings.append(user['username'])

        try:
            # If the "to" key cannot be found, we are at the end of the list
            # and the lookup raises an exception.
            next_id = payload['paging']['next']['to']
        except (KeyError, TypeError):
            break

    return followings
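
The cursor-following loop can be tried offline. The sketch below replaces the HTTP request with a hypothetical in-memory `fetch_page` function. The page contents and usernames are invented, and only the `value` and `paging.next.to` keys mirror the response shown above.

```python
# Hypothetical in-memory pages keyed by the "to" cursor; None marks the first page.
FAKE_PAGES = {
    None: {'value': [{'username': 'alice'}, {'username': 'bob'}],
           'paging': {'next': {'limit': 2, 'to': 'page2'}}},
    'page2': {'value': [{'username': 'carol'}],
              'paging': {}},  # no "next" key: this is the last page
}

def fetch_page(cursor):
    """Stand-in for the HTTP request; returns the payload for a given cursor."""
    return FAKE_PAGES[cursor]

def collect_usernames(fetch):
    usernames = []
    cursor = None
    while True:
        payload = fetch(cursor)
        usernames.extend(user['username'] for user in payload['value'])
        # Follow the cursor until the payload no longer advertises a next page.
        cursor = payload.get('paging', {}).get('next', {}).get('to')
        if cursor is None:
            break
    return usernames

all_usernames = collect_usernames(fetch_page)
print(all_usernames)
```

Passing the fetch function in as a parameter keeps the loop logic separate from the network call, which makes it easy to test.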

Get the latest posts from each user

Once I had the list of users I follow, I wanted to get their latest posts. I could do that by requesting https://medium.com/@<username>/latest?format=json.

So I wrote a function that takes a list of usernames and returns a Python list containing the post IDs of the latest posts from all of those users.

def get_list_of_latest_posts_ids(usernames):

    print('Retrieving the latest posts...')

    post_ids = []
    for username in usernames:
        url = MEDIUM + '/@' + username + '/latest?format=json'
        response = requests.get(url)
        response_dict = clean_json_response(response)

        try:
            posts = response_dict['payload']['references']['Post']
        except (KeyError, TypeError):
            # No 'Post' entry in the response: treat it as no posts
            posts = []

        if posts:
            for key in posts.keys():
                post_ids.append(posts[key]['id'])

    return post_ids
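
For reference, the extraction step can be sketched on a hand-written dictionary. The `extract_post_ids` helper and the sample IDs are hypothetical. Only the payload, references, Post, and id keys come from the function above.

```python
# Invented sample with the same nesting the function above reads.
sample_response = {
    'payload': {
        'references': {
            'Post': {
                'abc123': {'id': 'abc123'},
                'def456': {'id': 'def456'},
            }
        }
    }
}

def extract_post_ids(response_dict):
    """Hypothetical helper: post IDs from one parsed /latest response."""
    # Chained .get() calls fall back to an empty dict when a key is missing.
    posts = response_dict.get('payload', {}).get('references', {}).get('Post', {})
    return [post['id'] for post in posts.values()]

print(extract_post_ids(sample_response))
print(extract_post_ids({'payload': {'references': {}}}))
```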

Get all the comments on each post

With the list of posts, I extracted all the comments through https://medium.com/_/api/posts/<post_id>/responses.

This function takes a Python list of post ids and returns a Python list of comments.

def get_post_responses(posts):

    print('Retrieving the post responses...')

    responses = []

    for post in posts:
        url = MEDIUM + '/_/api/posts/' + post + '/responses'
        response = requests.get(url)
        response_dict = clean_json_response(response)
        responses += response_dict['payload']['value']

    return responses

Filter the comments

At first, I wanted to keep only comments that reached a minimum number of claps. But I realized this might not be a good indicator of how much the community appreciates a comment, since a single user can clap for the same comment multiple times.

Instead, I filter by the number of recommendations. It measures much the same thing as claps, but a user cannot recommend the same comment more than once.

I wanted this minimum to be adjustable, so I passed it around in a variable called recommend_min.

The following function takes a comment and the recommend_min variable. It checks whether the comment has at least that many recommendations.

def check_if_high_recommends(response, recommend_min):
    # True when the comment has at least recommend_min recommendations
    return response['virtuals']['recommends'] >= recommend_min

I would also like to get the latest comments. So I use this function to filter out comments that are more than 30 days old.

from datetime import datetime, timedelta

def check_if_recent(response):
    limit_date = datetime.now() - timedelta(days=30)
    # createdAt is divided by 1000 to convert milliseconds to seconds
    creation_epoch_time = response['createdAt'] / 1000
    creation_date = datetime.fromtimestamp(creation_epoch_time)

    return creation_date >= limit_date
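
Because the check depends on the current time, it is awkward to verify by hand. The sketch below is a hypothetical variant, `is_recent`, that takes the current time as a parameter so the result is reproducible. It assumes, as the function above does, that createdAt is a Unix timestamp in milliseconds, and it uses timezone-aware UTC datetimes, which is a choice of this sketch.

```python
from datetime import datetime, timedelta, timezone

def is_recent(created_at_ms, now, max_age_days=30):
    """Hypothetical variant of check_if_recent with an injectable 'now'."""
    # The timestamp is in milliseconds; fromtimestamp expects seconds.
    created = datetime.fromtimestamp(created_at_ms / 1000, tz=timezone.utc)
    return created >= now - timedelta(days=max_age_days)

# Fixed reference time so the example always gives the same result.
now = datetime(2018, 1, 31, tzinfo=timezone.utc)
ten_days_ago_ms = int((now - timedelta(days=10)).timestamp() * 1000)
forty_days_ago_ms = int((now - timedelta(days=40)).timestamp() * 1000)

print(is_recent(ten_days_ago_ms, now))
print(is_recent(forty_days_ago_ms, now))
```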

Get the username of each comment author

With the filters in place, I use the following function to apply them and grab the user IDs of the authors.

def get_user_ids_from_responses(responses, recommend_min):

    print('Retrieving user IDs from the responses...')

    user_ids = []

    for response in responses:
        recent = check_if_recent(response)
        high = check_if_high_recommends(response, recommend_min)

        if recent and high:
            user_ids.append(response['creatorId'])

    return user_ids
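
One thing to be aware of is that a user who wrote several qualifying comments appears several times in the returned list. If that is not wanted, a sketch along the following lines removes the repeats. The `unique_authors` helper and the sample data are hypothetical, and the recency check is left out so that the example is deterministic.

```python
# Invented sample comments; only creatorId and virtuals.recommends are used.
sample_responses = [
    {'creatorId': 'u1', 'virtuals': {'recommends': 12}},
    {'creatorId': 'u2', 'virtuals': {'recommends': 3}},
    {'creatorId': 'u1', 'virtuals': {'recommends': 20}},
    {'creatorId': 'u3', 'virtuals': {'recommends': 10}},
]

def unique_authors(responses, recommend_min):
    """Hypothetical helper: IDs of authors with a qualifying comment, no repeats."""
    qualifying = (
        r['creatorId'] for r in responses
        if r['virtuals']['recommends'] >= recommend_min
    )
    # dict.fromkeys keeps the first occurrence of each ID, in order.
    return list(dict.fromkeys(qualifying))

print(unique_authors(sample_responses, 10))
```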

A user ID is useless when you want to visit someone’s profile, so I wrote a function that queries the /_/api/users/<user_id> endpoint to get the username.

def get_usernames(user_ids):

    print('Retrieving usernames of interesting users...')

    usernames = []

    for user_id in user_ids:
        url = MEDIUM + '/_/api/users/' + user_id
        response = requests.get(url)
        response_dict = clean_json_response(response)
        payload = response_dict['payload']

        usernames.append(payload['value']['username'])

    return usernames

Let’s combine all the functions

After all the functions are done, I create a pipeline to get my list of recommended users.

def get_interesting_users(username, recommend_min):

    print('Looking for interesting users for %s...' % username)

    user_id = get_user_id(username)

    usernames = get_list_of_followings(user_id)

    posts = get_list_of_latest_posts_ids(usernames)

    responses = get_post_responses(posts)

    users = get_user_ids_from_responses(responses, recommend_min)

    return get_usernames(users)

The script is finally complete! To test it, just call the pipeline.

interesting_users = get_interesting_users('Radu_Raicea', 10)
print(interesting_users)

Image credit: Know Your Meme

Finally, I added an option to store the result and timestamp in a CSV file.

import csv

def list_to_csv(interesting_users_list):
    # Append one row per run; newline='' lets the csv module handle line endings
    with open('recommended_users.csv', 'a', newline='') as file:
        writer = csv.writer(file)

        # Put the timestamp in the first column, followed by the usernames
        now = datetime.now().strftime('%Y-%m-%d %H:%M:%S')
        interesting_users_list.insert(0, now)

        writer.writerow(interesting_users_list)

interesting_users = get_interesting_users('Radu_Raicea', 10)
list_to_csv(interesting_users)

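
To preview the row that ends up in the file, the sketch below writes to an in-memory buffer instead of recommended_users.csv. The `build_row` helper is hypothetical and the usernames are invented. Unlike the function above, it builds a new list rather than inserting the timestamp into the caller's list.

```python
import csv
import io
from datetime import datetime

def build_row(users, now):
    """Hypothetical helper: timestamp followed by usernames, as a new list."""
    return [now.strftime('%Y-%m-%d %H:%M:%S')] + list(users)

users = ['alice', 'bob']
buffer = io.StringIO()  # in-memory stand-in for recommended_users.csv
csv.writer(buffer).writerow(build_row(users, datetime(2018, 1, 31, 12, 0, 0)))

print(buffer.getvalue())
```

Leaving the input list untouched means the same list can still be printed or reused after it has been saved.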

The source files for this project can be found here.

If you don’t know Python yet, read TK’s Python tutorial: Learning Python: From Zero to Hero.

If you have suggestions for other criteria that make a user interesting, please leave a comment below!

In summary…

  • I wrote a Python script for Medium.
  • The script returns a list of users who are active and have posted interesting comments on the latest posts of the users you follow.
  • You can take any user from that list and run the script again with their username instead of yours.

Check out my primer on open source licenses and how to add them to your project!

For more updates, follow me on Twitter.

The Nuggets Translation Project is a community that translates high-quality technical articles from the Internet, drawing on English articles shared on Nuggets. The content covers Android, iOS, front-end, back-end, blockchain, products, design, artificial intelligence, and other fields. If you want to see more high-quality translations, please follow the Nuggets Translation Project, its official Weibo account, and its Zhihu column.

