
How can we improve R&D efficiency? Do you rely on separate local development and test environments, or on full end-to-end testing? This series describes the history and evolution of Lyft's development environment and offers lessons on how to build an efficient development environment for large-scale microservices. This is the second article in a four-part series. Translated from: Scaling productivity on microservices at Lyft (Part 2): Optimizing for fast local development[1]

This is the second article in a series on how Lyft is scaling its development practices in the face of a growing number of developers and services.

  • Part 1: History of development and test environments
  • Part 2: Optimizing for fast local development (this article)
  • Part 3: Extending our service mesh to staging using overrides
  • Part 4: Deployment control based on automated acceptance tests

This article focuses on how we brought a great development experience to the laptop for super-fast iteration.

Lacking an inner dev loop

When changing code, developers can split the process into an inner dev loop and an outer dev loop. The inner dev loop is a rapid iteration cycle: make a code change, then test whether it works. Ideally, developers run the inner loop many times while building a feature, with most of the time spent editing code and with tests running quickly, within about 10 seconds. The outer dev loop typically involves pushing code changes to a remote Git branch, running tests in CI, and getting the code reviewed before deploying the changes. One pass through the outer loop usually takes at least 10 minutes, and ideally only a small amount of additional time is spent addressing code review comments.

As mentioned in the previous article, running the inner dev loop required synchronizing code changes to a remote virtual machine environment called Onebox that each developer ran. These environments were notoriously unreliable, took a long time to start up, and often needed to be rebuilt, and users were frequently frustrated that their inner dev loop was blocked by environment issues. Between tweaking the environment and synchronizing code changes, the process came to resemble an outer dev loop, and developers often fell back to the real outer loop, running tests in CI for each iteration.

Run one service at a time

So we set out to build a simple, fast inner dev loop. The core shift was to move from a fully integrated environment with Onebox (many services) to an isolated environment running only one service and its tests. This new isolated environment runs on the developer's laptop, restoring the inner dev loop described above: users simply edit code and run tests, with no additional steps in between. We try to keep most tests dependent on only a single service. We also built the ability to launch a single service on a laptop and send it test requests.

We decided to run service code directly on macOS, without containers or VMs. From past experience we knew that running code in a container is not a free abstraction: while it makes setting up the execution environment easier, it causes user confusion and extra debugging challenges when there are problems with the container network or file system mounts. Running natively also gets better IDE support than running in a container. We still use containers in some cases, such as for running datastores or for services that only run on Linux.

Set up the laptop environment

The biggest cost of running code natively on macOS is having to configure and maintain the environment on every developer's laptop. To offset this, we invested in tooling to get new Lyft developers up and running quickly.

Back-end services are written in Python and Go (with a few exceptions), and front-end services are written in Node, each with its own GitHub repository and dependency set.

Python

For Python, we build a virtual environment (venv) for each service, and we developed a tool to help create and manage these venvs. The tool distributes the specific supported versions of Python and performs some operating system setup, such as installing shared libraries through Homebrew, configuring SSL to use the correct certificates, and enforcing use of our internal PyPI repository.

When the user runs the command to build the venv, it will:

  • Read the service metadata to select the correct Python version
  • Create a new venv
  • Install the dependencies defined in requirements.txt via pip

Once a venv is built, it must be activated (added to $PATH). We use aactivator[2] to automatically activate the venv whenever the user enters the service directory and deactivate it when they leave. The venvs we create are immutable (pip install is disabled); when requirements.txt changes, a new venv is built. This guarantees that the dependencies in the venv exactly match the requirements.txt file and what will be deployed. Previously built venvs are cached, so if a user reverts a change to requirements.txt, the previously built version is reused. We also support creating mutable venvs, to make it easy to try out new dependencies without triggering a full rebuild, and creating venvs for internal Python libraries, so their tests can be run locally.
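The caching behavior described above can be sketched in a few lines. This is a minimal illustration, not Lyft's actual tool: the cache location and the idea of keying the venv on a hash of requirements.txt plus the Python version are assumptions made for the example.

```python
import hashlib
import subprocess
import sys
from pathlib import Path

# Hypothetical cache location for built venvs.
CACHE_DIR = Path.home() / ".cache" / "service-venvs"

def venv_cache_key(requirements: Path, python_version: str) -> str:
    """Key the venv on the exact dependency set and interpreter version,
    so reverting requirements.txt reuses a previously built venv."""
    digest = hashlib.sha256()
    digest.update(python_version.encode())
    digest.update(requirements.read_bytes())
    return digest.hexdigest()[:16]

def ensure_venv(requirements: Path, python_version: str = "3.8") -> Path:
    """Build (or reuse from cache) an immutable venv matching requirements.txt."""
    venv_dir = CACHE_DIR / venv_cache_key(requirements, python_version)
    if venv_dir.exists():
        return venv_dir  # cache hit: requirements unchanged (or reverted)
    subprocess.run([sys.executable, "-m", "venv", str(venv_dir)], check=True)
    pip = venv_dir / "bin" / "pip"
    subprocess.run([str(pip), "install", "-r", str(requirements)], check=True)
    return venv_dir
```

Because the cache key is derived from the file contents, any edit to requirements.txt produces a new directory, which is what makes the built venvs effectively immutable.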

Go

For Go, setup is simple. When users install the Go runtime, the installer sets a few environment variables (such as the proxy for downloading dependencies), after which they can use go run or go test. Thanks to the excellent Go Modules[3] toolchain, all dependencies are automatically downloaded and linked each time these commands run.

Node

For Node, we use a custom wrapper around nodeenv[4] that downloads and installs the correct Node and npm versions for the service based on its metadata, saving users from having to manually install a version manager like nvm[5] and switch to the correct Node version when working on different services.
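A wrapper like this can be sketched as a small helper that reads the service's metadata and builds the nodeenv invocation. The metadata file name and schema here are made up for illustration; `--node` and `--npm` are real nodeenv flags.

```python
import json
from pathlib import Path

def nodeenv_command(metadata_file: Path, target_dir: str = ".nodeenv") -> list:
    """Build the nodeenv invocation pinning the Node and npm versions
    declared in the service's metadata file (hypothetical schema)."""
    meta = json.loads(metadata_file.read_text())
    return [
        "nodeenv",
        f"--node={meta['node_version']}",
        f"--npm={meta['npm_version']}",
        target_dir,
    ]
```

The wrapper would then run this command (e.g. via subprocess) and activate the resulting environment, so the user never has to think about which Node version a service needs.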

For all but a few services, developers can use the environments described above to run service code directly on a laptop. Some services depend on libraries that are only supported on Linux; for this very small subset, developers can easily download Docker images built by our CI system, mount the local code directory, and run the service's tests. This flow is more cumbersome, but still allows rapid iteration.

Run the service

It's important for developers to be able to iterate quickly against a fully functioning service, so we worked hard to enable this use case on laptops. We created tooling to coordinate starting the service, sending test requests, and proxying any requests the service makes. To ensure data isolation, the datastores a service uses are started locally each time with fresh data. Scripts run at startup to create tables and insert any data needed for testing; each team maintains its own test data set so that its features can be properly exercised.

Here is an example of the steps required to run the service:

  • Run environment checks (e.g. tools installed correctly, necessary Git repositories checked out, required ports free)
  • Activate the virtual environment (Python and Node services)
  • Start datastores (e.g. DynamoDB, Elasticsearch, Postgres)
  • Start the proxy app (more on that later)
  • Run the datastore seeding scripts
  • Run the service

Having developers run all of this manually was tedious and error-prone, so we needed tooling to orchestrate these checks and manage the necessary processes from declarative configuration; we chose Tilt[6] for this. While Tilt is typically used to test code in a Kubernetes cluster, we currently use it purely for local workflow management. Each service has a Tiltfile[7] specifying the steps that must run before the service starts. Tiltfiles are written in Starlark, the Python dialect used by Bazel, which gives service owners great flexibility. We provide common functions (such as ensure_venv() and launch_dynamodb()), so a service's Tiltfile consists mostly of calls to these predefined functions.
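A service Tiltfile along these lines might look like the following. This is a hypothetical sketch: the load path, the service name, and the script paths are invented, and ensure_venv/launch_dynamodb stand in for shared helpers like those described above; local_resource is Tilt's built-in for running local commands and long-lived processes.

```python
# Tiltfile (Starlark) -- hypothetical sketch of a service's local workflow.
load('./common/Tiltfile', 'ensure_venv', 'launch_dynamodb')

ensure_venv()        # build/activate the service venv
launch_dynamodb()    # start the local datastore

# Seed the datastore, re-running whenever the fixtures change.
local_resource(
    'seed-data',
    cmd='python scripts/seed_tables.py',
    deps=['scripts/seed_tables.py', 'fixtures/'],
)

# Run the service itself; Tilt restarts it when source files change.
local_resource(
    'my-service',
    serve_cmd='python -m my_service.app',
    deps=['my_service/'],
    resource_deps=['seed-data'],
)
```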

To start the service, the user runs tilt up in a terminal; Tilt parses the Tiltfile and creates an internal execution plan that runs all checks and processes in the order specified. Tilt ships a local web UI that shows the status of everything running. Users can click a tab in the web UI to see the log output of each process, making it easy to track running processes and debug any errors from the logs.

Once the service is running, it automatically reloads when the user edits code in the IDE. This is a big win for shortening the inner loop, because the user doesn't even need to trigger any action to reload the service.

Process requests to other services

Lyft consists of a large network of services, and almost every service calls at least one other service in that network. There are two main ways to handle requests made by a local service:

  1. Construct and return a mocked response
  2. Forward the request to a real remote environment

We support both in a very flexible way, using tools we developed in-house.

A few years ago, Lyft built a proxy app as a tool to help mobile developers decouple their workflows from the back-end service teams. It is an Electron application that acts as a proxy between the mobile app's calls and the staging environment APIs. Developers connect the mobile app to a proxy server, which gives each user a unique URL. By default, the proxy forwards all requests to staging. Users can choose to override a particular call and return fully mocked data, or to modify a field in the staging response. This lets mobile developers test app changes while the back-end API is still under development. The setup looks like this:

The biggest advantage this proxy app brings over tools such as Charles Proxy[8] is its deep integration with Lyft's interface definition language (IDL[9]), implemented with protocol buffers[10]. In the IDL we specify the request and response structures for back-end service endpoints. In the proxy app, users write responses in a TypeScript code editor (the Monaco Editor[11], which powers VS Code) that integrates with the IDL to provide an IDE-like experience, with type checking and auto-completion of the mocked response structure. The code interface also allows complex interactions, such as copying a field from the request into the response. The app also displays readable request and response bodies for all requests passing through the proxy, letting users visualize everything the mobile app sends.

When we built the tooling to run back-end services locally, the proxy app was a perfect fit for handling the requests a local service makes to other services. We reused its capabilities to forward requests to the staging environment or return mock data, depending on the user's intent. The setup looks like this:
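The proxy's forward-or-mock decision can be sketched as a small routing function. This is an illustrative model, not Lyft's implementation: the Override type and the method/path matching rule are assumptions for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Override:
    """A user-defined override: mock this endpoint instead of forwarding it."""
    method: str
    path: str
    mock_response: Dict

def route_request(method: str, path: str,
                  overrides: List[Override],
                  forward: Callable[[str, str], Dict]) -> Dict:
    """Return mock data if the user overrode this endpoint,
    otherwise forward the call to the staging environment."""
    for o in overrides:
        if o.method == method and o.path == path:
            return o.mock_response   # answered locally with mocked data
    return forward(method, path)     # proxied to the real environment
```

In the real app, `forward` would be an HTTP call to staging, and `mock_response` would come from the user's TypeScript editor session rather than a static dict.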

Making requests to the local service

Next, we needed a tool that lets users write and send requests directly to services running locally. This wasn't common in the Devbox/Onebox era, when most test requests came from mobile clients, so we had to come up with new solutions. There are many tools for constructing API requests, such as curl[12] and Postman[13]. However, at Lyft we need to support several RPC transport formats, including gRPC, JSON over HTTP, and protobuf over HTTP, and no existing tool could seamlessly handle these formats, leverage our IDL definitions, and make requests easy to compose.

The proxy app was again the best fit: we added the ability to write requests in the TypeScript code editor and send them to the local service at the push of a button, once more leveraging the IDL integration to auto-complete URL paths and request body fields.
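One way to picture the multi-transport requirement is a small encoder that normalizes a user-authored request into what each transport expects on the wire. This is purely illustrative; in the real tool, IDL-generated message classes would handle protobuf/gRPC serialization.

```python
import json
from typing import Dict, Tuple

def encode_request(transport: str, body: Dict) -> Tuple[Dict, bytes]:
    """Return (headers, payload) for the given transport format.
    Only JSON-over-HTTP is fully shown; binary transports are stubbed."""
    if transport == "json":
        return ({"Content-Type": "application/json"},
                json.dumps(body).encode())
    if transport in ("grpc", "protobuf"):
        # In practice the IDL-generated message classes would serialize
        # the body; here we only mark the content type.
        return ({"Content-Type": "application/x-protobuf"}, b"<serialized>")
    raise ValueError(f"unknown transport: {transport}")
```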

Results

Since we rolled the tool out across the company earlier this year, feedback has been overwhelmingly positive. Developers like running tests on their laptops, in their IDEs, without needing any remote environment. Creating a new Onebox environment usually took about an hour; now the laptop environment is always ready to run tests, and starting a service locally with Tilt takes just a few minutes.

As developers spend more time testing their services, we are also seeing a shift in behavior. When running a service locally, users send requests directly to the service's API rather than testing through the mobile app and public APIs. This builds familiarity with the service API and narrows the scope of debugging when things go wrong.

While cost was not the primary driver of the project, the savings ended up being significant, because we no longer need to provision a powerful AWS instance for each developer to support Onebox. Having users run services on their own laptops genuinely reduced the total compute required.

Future work

What is described above is the first iteration of the tool. We have a lot of exciting ideas about what to do next. Here are some of them:

Support for Apple silicon

We are excited to soon begin shipping new MacBook Pros with M1 chips to developers. Early benchmarking shows these machines provide a huge performance boost right out of the box, even under emulation! Some extra work to make sure everything runs natively will help us reap the full performance benefits.

Routing requests from staging services to the local service

Currently, users must call the locally running service directly; they cannot call the client-facing API and have the request flow through staging services before being routed to the local service. We plan to enable this soon, which will let users test a full end-to-end flow (such as running the mobile app) against new back-end features, while still keeping a fast inner dev loop for local development.

Improved request-sending UI

The code interface for writing requests in the proxy app is flexible and powerful, but constructing requests correctly can still be challenging for new users. We want to build a more Postman[13]-like custom UI for this use case while keeping the powerful code interface. We also plan to create an API platform where users can easily discover and experiment with any Lyft service.

Remote development environment

There have been exciting developments in fully remote development environments, such as GitHub Codespaces. As these solutions mature, we will certainly watch closely to see whether they fit our use cases.

The next article in this series will show how we safely deploy the code in a PR to the staging environment and test it there.

References:
[1] Scaling productivity on microservices at Lyft (Part 2): Optimizing for fast local development: eng.lyft.com/scaling-pro…
[2] aactivator: github.com/Yelp/aactiv…
[3] Using Go Modules: go.dev/blog/using-…
[4] Node.js virtual environment: github.com/ekalinin/no…
[5] Node Version Manager: github.com/nvm-sh/nvm
[6] Tilt: tilt.dev/
[7] Write Your First Tiltfile: docs.tilt.dev/tiltfile_au…
[8] Charles Web Debugging Proxy: www.charlesproxy.com/
[9] IDL: en.wikipedia.org/wiki/IDL_(p…
[10] Protocol Buffers: developers.google.com/protocol-bu…
[11] Monaco Editor: microsoft.github.io/monaco-edit…
[12] curl: curl.se/
[13] Postman: www.postman.com/

Hello, my name is Yu Fan. I used to do R&D at Motorola, and I now work at Mavenir in a technical role. I have long been interested in communications, networking, back-end architecture, cloud native, DevOps, CI/CD, blockchain, AI, and other technologies.

My WeChat official account is DeepNoMind.