This article is published by the NetEase Cloud community with the authorization of its author, Zhang Gengyuan.



Ever since our company's Yixin official account added a "today's menu" lookup, I have been in the habit of checking the menu at each canteen window before a meal and then deciding where to eat.

But the more I used this feature, the more inconvenient I found it. Currently, looking up the menu through the official account takes the following steps:

  1. Open Yixin

  2. Open the NetEase Genie official account

  3. Tap Convenient Services

  4. Tap Today's Menu

  5. Wait for the account to reply with the link to today's menu

  6. Tap the link to view today's menu

For a routine performed at least twice a day, this process is passive and rather cumbersome. Steps 5 and 6 in particular need network access; when the phone's connection is unstable (weak WiFi or 4G signal, which is common while riding or waiting for the elevator), either step can hang and the lookup fails. Some colleagues, for various reasons, also do not follow the NetEase Genie official account and so cannot check today's menu at all.

So I wondered: is there an easier way, one that actively pushes each day's menu to the phone so that it can be viewed with a single tap?

  1. Tap the push notification

  2. View today's menu

With the idea in mind, I set about building it.

This can be broken down into three steps:

  1. Data capture

  2. Data processing

  3. Data push

More on that below.

Data capture

To crawl the menu data, you first need to know where it is served from. I have never developed for Yixin official accounts, but in my experience an article published by an official account, whether on WeChat or on Yixin, is usually just a plain HTTP page. Finding the source of the daily menu data therefore comes down to finding the pattern in the addresses of these pages.

There are many ways to capture a phone's network requests. The most convenient would be to run a tool such as tcpdump in the background on the phone and then open today's menu in Yixin to capture the traffic. However, my phone is a non-jailbroken iPhone, and the iOS sandbox makes that impossible.

In the end I ran mitmproxy on a computer on the same network and set the phone's HTTP proxy to that computer's address. After opening the today's menu link in Yixin, mitmproxy shows the phone's HTTP requests, including the URL of the today's menu page that we want to crawl.

The URL of today's menu follows this pattern:

http://numenplus.yixin.im/singleNewsWap.do?companyId=1&materialId=${id}

There is only one variable, ${id}, a positive integer that appears to be the ID of the article. The daily menus we want to crawl are all served from these links, along with other articles and advertisements published by the NetEase Genie official account. The page can be fetched with a plain HTTP GET, with no extra handling.
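To make the pattern concrete, here is a minimal sketch that builds the URL for a given ID and recovers the ID from a captured URL. The helper names `menu_url` and `extract_material_id` are hypothetical and introduced only for illustration; only the URL pattern itself comes from the capture described above.

```python
from urllib.parse import parse_qs, urlencode, urlparse

BASE_URL = "http://numenplus.yixin.im/singleNewsWap.do"


def menu_url(article_id, company_id=1):
    """Build the article URL for one materialId (hypothetical helper)."""
    query = urlencode({"companyId": company_id, "materialId": article_id})
    return "%s?%s" % (BASE_URL, query)


def extract_material_id(url):
    """Return the materialId of a captured URL, or None if it does not match."""
    parsed = urlparse(url)
    if parsed.netloc != "numenplus.yixin.im":
        return None
    if not parsed.path.endswith("singleNewsWap.do"):
        return None
    values = parse_qs(parsed.query).get("materialId")
    if not values or not values[0].isdigit():
        return None
    return int(values[0])


print(menu_url(42))
print(extract_material_id(menu_url(42)))
```

A filter like `extract_material_id` is one way to pick the interesting requests out of a long proxy log.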

I looked for a pattern in how the IDs of the today's menu articles are generated but did not find one. My guess is that the IDs are generated on the Yixin backend and that the mobile client cannot obtain them directly. So I simply crawl every ID in turn and check the content: if the article is today's menu it is processed, otherwise it is ignored. That gives us the data source for today's menu.

I am most familiar with Python, so I implemented it in Python:

import logging
import requests

LOG = logging.getLogger(__name__)


def http_get(url, timeout=3):
    # Return the response on HTTP 200, otherwise None.
    try:
        # timeout must be passed by keyword: the second positional
        # argument of requests.get() is params.
        res = requests.get(url, timeout=timeout)
    except requests.RequestException:
        LOG.exception("Failed to GET: %s" % url)
        return None
    if res.status_code != 200:
        return None
    return res


def fetch(start, step=300):
    last_id = start
    for i in range(start, start + step):
        url = ("http://numenplus.yixin.im/singleNewsWap.do?"
               "companyId=1&materialId=%d" % i)
        response = http_get(url)
        if not response:
            continue

        # handle menu data here
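The `# handle menu data here` placeholder is where each page would be checked and processed. As a rough sketch of the crawl-every-ID approach, the loop below keeps only the pages that look like a menu. The names `scan_ids`, `fetch_page` and `looks_like_menu` are hypothetical, the fetcher is passed in as a function so the sketch runs without network access, and returning `last_id` as a starting point for a later run is an assumption about how that variable might be used, not the project's actual code.

```python
def scan_ids(start, step, fetch_page, looks_like_menu):
    """Try every ID in [start, start + step) and collect the menu pages.

    fetch_page(article_id) returns the page text, or None on failure.
    Returns (menus, last_id): a dict of ID -> text, and the highest ID
    that returned a page.
    """
    menus = {}
    last_id = start
    for article_id in range(start, start + step):
        text = fetch_page(article_id)
        if text is None:
            continue  # missing or failed page: skip it
        last_id = article_id
        if looks_like_menu(text):
            menus[article_id] = text
    return menus, last_id


# Usage with an in-memory stand-in for the HTTP layer.
pages = {100: "advert", 102: "\u4eca\u65e5\u83dc\u5355 ..."}
menus, last_id = scan_ids(
    100, 5, pages.get, lambda text: "\u4eca\u65e5\u83dc\u5355" in text)
print(sorted(menus), last_id)
```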

Data processing

Data processing involves two main tasks:

  1. As mentioned above, checking whether the crawled article is today's menu

  2. Parsing the HTML content to extract the menu information we want

The first task is relatively simple and can be handled with a keyword match using a regular expression. For example, if the article contains the words "Today's menu", we treat it as today's menu.

The second task is a little more involved. We need to extract the text from the crawled HTML source and then derive the date, breakfast, lunch, and dinner information for the menu. This can also be done with slightly more complex regular expressions.

To parse the HTML I use BeautifulSoup, a well-known third-party Python library for handling HTML content that is very easy to use:

Get the menu content

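from bs4 import BeautifulSoup as BS


def _parse(content):
    # Return the parsed document, or None for error pages and parse failures.
    try:
        bs = BS(content, "html.parser")
    except Exception:
        LOG.exception("Failed to parse content: %s" % content)
        return None
    if bs.find_all(class_="m-error"):
        return None
    return bs


def _handle_menu(bs):
    # find() returns None instead of raising when the element is missing.
    content = bs.find(id="divCNT")
    if content is None:
        LOG.warning("Failed to get content")
    return content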

This is what the content looks like before HTML parsing

This is what it looks like after parsing, with all the HTML tags stripped out

Check if it’s today’s menu

import re


def _is_menu(text):
    # \u4eca\u65e5\u83dc\u5355 => Today's menu
    # Python 3's re module interprets \uXXXX escapes inside a raw pattern.
    return bool(re.search(r"\u4eca\u65e5\u83dc\u5355", text))

Extract menu date

import datetime
import re


def _handle_date(content):
    # \u6708 => month, \u65e5 => day
    res = re.findall(r"(\d+)\u6708(\d+)\u65e5", content.text)
    if not res:
        LOG.warning("Failed to parse date")
        return None
    month, day = (int(i) for i in res[0])
    # The menu text carries no year, so the current year is assumed.
    year = datetime.datetime.now().year
    return datetime.datetime(year, month, day)

Extract menu contents for breakfast, lunch and dinner

def _menu_to_text(content):
    # \u65e9\u9910 => Breakfast
    # \u4e2d\u9910 => Lunch
    # \u665a\u9910 => Dinner
    # \u591c\u5bb5 => Midnight snack

    text = content.get_text()
    res = re.findall(r"\u65e9\u9910([\s\S]+)\u4e2d\u9910([\s\S]+)"
                     r"\u665a\u9910([\s\S]+)\u591c\u5bb5",
                     text)
    if not res:
        LOG.warning("Failed to match menu")
        return None
    # BREAKFAST, LUNCH and SUPPER are key constants assumed to be
    # defined elsewhere.
    menu = {}
    menu[BREAKFAST] = res[0][0]
    menu[LUNCH] = res[0][1]
    menu[SUPPER] = res[0][2]
    return menu
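To see what these patterns extract, here is a self-contained Python 3 sketch that runs the same three regular expressions against a made-up sample. The sample text and dish names are invented for illustration, and the plain string keys stand in for the `BREAKFAST`, `LUNCH` and `SUPPER` constants, whose values the snippets above do not show.

```python
import re

# Invented sample of the text left after stripping the HTML tags.
SAMPLE = (
    "\u4eca\u65e5\u83dc\u5355 5\u670812\u65e5\n"  # "Today's menu", May 12
    "\u65e9\u9910\nporridge\n"                     # breakfast heading
    "\u4e2d\u9910\nnoodles\n"                      # lunch heading
    "\u665a\u9910\nrice\n"                         # dinner heading
    "\u591c\u5bb5\ndumplings\n"                    # midnight snack heading
)

# 1. Keyword check: is this article a menu at all?
is_menu = re.search("\u4eca\u65e5\u83dc\u5355", SAMPLE) is not None

# 2. Date: the digits before the month and day characters.
month, day = (int(i) for i in re.findall(r"(\d+)\u6708(\d+)\u65e5", SAMPLE)[0])

# 3. Meals: the text between consecutive meal headings.
sections = re.findall(
    r"\u65e9\u9910([\s\S]+)\u4e2d\u9910([\s\S]+)"
    r"\u665a\u9910([\s\S]+)\u591c\u5bb5",
    SAMPLE,
)[0]
menu = {
    name: body.strip()
    for name, body in zip(("breakfast", "lunch", "supper"), sections)
}
print(is_menu, month, day, menu)
```

Because `[\s\S]+` is greedy, this pattern works cleanly when each meal heading appears only once in the text; a page that repeats a heading would need a stricter pattern.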

Data push

With crawling and processing of today's menu solved, what remains is how to push the menu content to the phone.

After some research, the usable third-party push services on iOS include Pushover, Pushbullet, Boxcar, and Amazon SNS.

Amazon SNS was ruled out first because it does not provide a ready-made client. Pushover looked like the best option, but it charges a one-time $5 license fee per mobile client, so it was ruled out as well. All things considered, I chose Pushbullet: it is full-featured, free, well documented, and supported on all platforms.

Following the API documentation provided by Pushbullet, sending a push is a single HTTP POST request:

def send_notification(subject, content, channel=PUSHBULLET_CHANNEL):
    # PUSHBULLET_API, PUSHBULLET_TOKEN and PUSHBULLET_CHANNEL are
    # configuration constants.
    try:
        res = requests.post(
            "%s/pushes" % PUSHBULLET_API,
            headers={"Access-Token": PUSHBULLET_TOKEN},
            data={"title": subject,
                  "body": content,
                  "type": "note",
                  "channel_tag": channel},
            timeout=30)
    except requests.RequestException:
        LOG.exception("Failed to send notification")
    else:
        if res.status_code != 200:
            LOG.warning("Error when pushing notification")
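To show what actually goes over the wire, here is a minimal standard-library sketch that builds the same kind of request without sending it. The function name `build_push_request` is hypothetical, the token and channel tag are placeholders, and the base URL is an assumed value for `PUSHBULLET_API`, which the snippet above references but does not define.

```python
import json
import urllib.request

# Assumed value; the snippet above only references PUSHBULLET_API.
PUSHBULLET_API = "https://api.pushbullet.com/v2"


def build_push_request(title, body, channel_tag, token):
    """Build (but do not send) the POST request for a note push."""
    payload = {
        "type": "note",
        "title": title,
        "body": body,
        "channel_tag": channel_tag,
    }
    return urllib.request.Request(
        "%s/pushes" % PUSHBULLET_API,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Access-Token": token,
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values, for illustration only.
req = build_push_request("Lunch", "noodles", "my-channel", "TOKEN")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Passing the request to `urllib.request.urlopen` would send it; building it separately makes the payload easy to inspect before any notification goes out.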

Here is what the pushed menu looks like on the phone:

It also works on PC/Mac:

Putting the code snippets above together gives a small project that crawls and pushes today's menu. The final runnable code is here (it can also send the menu by email):

https://g.hz.netease.com/hzzhanggy/what2eat2day_ntes

Automation

With crawling and pushing both written, the last step is to automate the whole process, so that all we have to do is glance at the push notification before each meal.

In practice this means turning the crawl and the push into scheduled tasks. I use systemd timers for this:

run.sh, a wrapper that runs the script inside the virtualenv

#!/bin/bash
BASE=/home/stanzgy/workspace/what2eat2day_ntes
$BASE/.venv/bin/python $BASE/fetch.py "$@"

menu_fetch.service, the service unit that fetches today's menu

[Unit]
Description=Fetch NetEase menu today

[Service]
Type=oneshot
ExecStart=/home/stanzgy/workspace/what2eat2day_ntes/run.sh f

[Install]
WantedBy=multi-user.target

menu_fetch.timer, the timer unit that triggers the fetch

[Unit]
Description=Fetch NetEase menu everyday

[Timer]
OnCalendar=Mon-Fri *-*-* 10:00:00
Unit=menu_fetch.service

[Install]
WantedBy=multi-user.target

The timer configuration for pushing today's menu is similar to the above; only the command-line arguments differ, so it is omitted here. The final result looks like this:
