If you have a Python program running, you might want to know how the code actually performs. This program may be in production or just on your local machine. You’ll want to know where the running program is spending its time, and if there are any “hot spots” that should be investigated further for improvement. You may be dealing with a misbehaving production system, and you may want to profile it in an unobvious way without further impacting production performance or requiring code changes. What’s a good way to do that? This article will discuss py-Spy, a tool that allows you to profile already running Python programs.

Deterministic and sampling profilers

In previous articles, I wrote about two deterministic profilers, cProfile and Line_profiler. These profilers are useful when you develop code and want to analyze parts of the code or the entire process. Since they are deterministic, they will tell you exactly how many times a function (or in line_profiler, a line) is executed, and how much time it takes relative to your other code to execute. Because these profilers run during observation, they are somewhat slowed down by the fact that they must do bookkeeping and calculations as the program executes. Modifying the code or restarting the code with the profiler enabled is often not an option for production code.

This is where a sampling profiler can help. The sampling profiler looks at an existing process and uses various techniques to determine what the running process is doing. You can try it yourself manually. For example, on Linux, you can use the pstack (or gstack ) command to see what your process is doing. On the Mac, you can perform the echo “thread backtrace all” | LLDB -p < pid > to see similar things. The output will be the stack of all threads in your process. This applies to any process, not just Python programs. For your Python program, you will see the underlying C functions, not your Python functions. In some cases, checking the stack a few times in this way may tell you if your process is stuck, or where it is slow, as long as you can translate your own code. But doing so only provides a single sample of time. Since the process is continuously executing, your sample may change every time you run a command (unless it is blocked or you just happen to be very lucky).

The sampling profiler and the tools around it take snapshots of the system multiple times over a period of time and then give you the ability to look at that data and see where your code is slow.

py-spy

Py-spy uses system calls (process_VM_readv on Linux,vm_read on OSX,ReadProcessMemory on Windows) to get the call stack, This information is then converted into the Python function calls you see in the source code. It samples multiple times per second, so it has a good chance of seeing the various states your program will be in over time. For speed, it is written in Rust.

Incorporating py-Spy into your project is simple and can be installed via PIP. To show you how to use it, I’ve created some sample code to parse and observe how py-Spy tells us about a running Python process. You can easily duplicate these steps if you want to follow suit.

First, I set up a new virtual environment using py-env and the pyenv-VirtualEnv plug-in for this project. You can do this, or you can set up a virtual environment with your favorite tool.

# tails Python version you prefer Pyenv install 3.8.7 # make our virtualenv (with above version) pyenv virtualenv 3.8.7 py-spy # activate it Pyenv activate py-spy # install py-spy PIP install py-spy # make sure we pick up the commands  in our path pyenv rehashCopy the code

That’s all. We now have the tools we can use. If you run py-Spy, you can see the common usage.

$py-spy py-spy 0.3.4 Sampling profiler for Python programs USAGE: py-spy <SUBCOMMAND> OPTIONS: -h, --help Prints help information -V, --version Prints version information SUBCOMMANDS: record Records stack trace information to a flamegraph, speedscope or raw file top Displays a top like view of functions consuming CPU dump Dumps stack traces for a target program to stdout help Prints this message or the help of the given subcommand(s)Copy the code

A case in point

To demonstrate Py-Spy, I wrote a simple long-running process that would consume streaming prices from cryptocurrency exchanges and generate a record per minute (also known as a bar chart). The bar chart contains information about the past minute. The bar chart includes the highest, lowest, and final prices, volume, and volume weighted average price (VWAP). For now, the code only records these values, but can be extended to update a database. While it’s simple, it’s a useful example because cryptocurrencies are traded around the clock and will give us real-world data to work with.

I am using the Coinbase Pro API for Python to access the data from the WebSocket feed. Here is a rough draft with some debugging code (and two ways to generate VWAP, one less efficient (the _vWAp method) and one more efficient). Let’s see if py-Spy reveals how much time this code takes.

This code will eventually generate a thread for the WebSocket client. The asyncio loop sets a timer for the next minute boundary to tell the client to record the bar data. It will run until you kill it (for example, with Ctrl-C).

#! /usr/bin/env python import argparse import functools import datetime import asyncio import logging import arrow import cbpro class BarClient(cbpro.WebsocketClient): def __init__(self, **kwargs): Super ().__init__(**kwargs) self._bar_volume = 0 self._weighted_price = 0.0 self._ticks = 0 self._bar_high = None self._bar_low = None self.last_msg = {} self._pxs = [] self._volumes = [] def next_minute_delay(self): delay = (arrow.now().shift(minutes=1).floor('minutes') - arrow.now()) return (delay.seconds + delay.microseconds/1e6) def _vwap(self): if len(self._pxs): wp = sum([x*y for x,y in zip(self._pxs, self._volumes)]) v = sum(self._volumes) return wp/v def on_message(self, msg): if 'last_size' in msg and 'price' in msg: last_size = float(msg['last_size']) price = float(msg['price']) self._bar_volume += last_size self._weighted_price += last_size * price self._ticks += 1 if self._bar_high is None or price > self._bar_high: self._bar_high = price if self._bar_low is None or price < self._bar_low: self._bar_low = price self._pxs.append(price) self._volumes.append(last_size) logging.debug("VWAP: %s", self._vwap()) self.last_msg = msg logging.debug("Message: %s", msg) def on_bar(self, loop): if self.last_msg is not None: if self._bar_volume == 0: self.last_msg['vwap'] = None else: self.last_msg['vwap'] = self._weighted_price/self._bar_volume self.last_msg['bar_bar_volume'] = self._bar_volume self.last_msg['bar_ticks'] = self._ticks self.last_msg['bar_high'] = self._bar_high self.last_msg['bar_low'] = self._bar_low last = self.last_msg.get('price') if last: last = float(last) self._bar_high = last self._bar_low = last logging.info("Bar: %s", Self. last_msg) self._bar_volume = 0 self._weighted_price = 0.0 self._ticks = 0 self._pxs.clear() self._volumes. // reschedule loop.call_at(loop.time() + self.next_minute_delay(), functools.partial(self.on_bar, loop)) def main(): argparser = argparse.ArgumentParser() argparser.add_argument("--product", default="BTC-USD", help="coinbase product") argparser.add_argument('-d', "--debug", action='store_true', help="debug logging") args = argparser.parse_args() cfg = {"format": "%(asctime)s - %(levelname)s - %(message)s"} if args.debug: cfg["level"] = logging.DEBUG else: cfg["level"] = logging.INFO logging.basicConfig(**cfg) client = BarClient(url="wss://ws-feed.pro.coinbase.com", products=args.product, channels=["ticker"]) loop = asyncio.get_event_loop() loop.call_at(loop.time() + client.next_minute_delay(), functools.partial(client.on_bar, loop)) loop.call_soon(client.start) try: loop.run_forever() finally: loop.close() if __name__ == '__main__': main()Copy the code

Run this example

To run this code, you need to install some additional modules. The CBPro module is a simple Python wrapper around the Coinbase APIs. Arrow is a good library for doing date and time logic.

pip install arrow cbpro
Copy the code

Now, you can run your code with a debug log and expect to see some information, depending on how busy the exchange is.

./coinbase_client.py -d 2021-03-14 17:20:12, 828-debug-using selector: KqueueSelector -- Subscribed! -- 2021-03-14 17:20:13, 059-debug - Message: {'type': 'subscriptions', 'channels': [{'name': 'ticker', 'product_ids': [' bTC-USD ']}]} 2021-03-14 17:20:13, 060-debug-vWAp: 60132.57Copy the code

Dissect the example

Now, let’s review the commands for py-spy. First, using the dump command will give us a quick stack view, translated into Python functions.

Here’s a quick caveat: If you’re on a Mac, you’ll need to run py-Spy as Sudo. On Linux, it depends on your security Settings. Also, since I’m using PyEnv, I need to use the -e flag to pass my environment to Sudo so that it can find the correct Python interpreter and py-Spy script in the path. I used the ps command in my shell to get the ID of my process (in my case, 97520).

py-spy dump

sudo -E py-spy dump -p 97520 Process 97520: / Users/MCW /. Pyenv/versions/py - spy/bin/python/coinbase_client. Py - d python v3.8.7 (/ Users/MCW. Pyenv/versions / 3.8.7 / bin/python3.8) Thread 0 x113206dc0 (idle) : "MainThread" select (selectors.py:558) _run_once (asyncio/base_events.py:1823) run_forever (asyncio/base_events.py:570) main (coinbase_client.py:107) <module> (coinbase_client.py:113) Thread 0x700009CAA000 (idle): "Thread-1" read (ssl.py:1101) recv (ssl.py:1226) recv (websocket/_socket.py:80) _recv (websocket/_core.py:427) recv_strict (websocket/_abnf.py:371) recv_header (websocket/_abnf.py:286) recv_frame (websocket/_abnf.py:336) recv_frame  (websocket/_core.py:357) recv_data_frame (websocket/_core.py:323) recv_data (websocket/_core.py:310) recv (websocket/_core.py:293) _listen (cbpro/websocket_client.py:84) _go (cbpro/websocket_client.py:41) run (threading.py:870) _bootstrap_inner (threading.py:932) _bootstrap (threading.py:890)Copy the code

You can see that there are two threads running. One is reading data, and the other is running select in the loop. This is only useful for profiling when our program gets stuck. One nice feature though is that if you give it the –locals option, it will show any local variables, which is very helpful for debugging!

py-spy top

The next command to try is top.

sudo -E py-spy top -p 97520
Copy the code

This leads to an interface very similar to the Unix top command. As your program runs and py-Spy collects samples, it will show you how long it takes. Here’s my screenshot after 30 seconds.

py-spy top output

Py – spy records

Finally, you can use py-Spy to record data for later analysis or output. There is a raw format, speedScope format, and a Flamegraph output. You can specify how long you want to collect data (in seconds), or just let it collect data until you exit the program with Ctrl-C. For example, this command will generate a useful little SVG file, Flamegraph, that you can interact with in a Web browser.

sudo -E py-spy record -p 97520 --output py-spy.svg
Copy the code

You can also export the data in SpeedScope format and then upload it to the SpeedScope Web tool for further analysis. This is a great tool to interactively see how your code performs.

I encourage you to run this code yourself and play with the SpeedScope tool and SVG output, but here are two screenshots that help explain how it works. The first view is the overall SVG output. If you hover your mouse over the cell, the functional details are displayed. As you can see, most of the time is spent in the WebSocket client _listen method. But the on_message method is shown to the right of it (specified by arrow)

Py – spy SVG output

If we click on it, we get a detailed view.

Py-spy SVG verbose output

In my case, I see that the _vWAp method that is not needed in my list understanding and recording shows up quite high in the configuration file. I can easily remove this method (and the additional prices and quantities I track) because VWAP can be calculated with just one running product and total quantity (as I’ve already done in the code). It is also interesting to see how long it takes to record when the script is running in debug mode.

conclusion

With that said, I encourage you to try using Py-Spy in some of your code. If you were trying to predict where your code would spend its time, how correct would you be? Did you find anything that surprised you? It might be possible to compare the output of py-Spy to a deterministic profiler like Line_profiler.

I hope this overview of Py-Spy has been helpful and that you can deploy the tool when diagnosing performance problems in your own code.

The postProfiling Python code with py-spyappeared first onwrighters.io.