Editor’s note: As the current number one package ecosystem, NPM contains millions of packages, but behind the ecosystem boom is ever-growing installation time. In this post, Ant Group NPM engineer Zero Branch describes how to achieve NPM installation in seconds. We hope you enjoy it.

I am Zero Branch, from the Swift team in Alipay's Experience Technology Department. I am a veteran Node.js developer and a Rust fan, a core developer of Chair and egg.js, and the author of Tegg. Currently, I am responsible for TNPM (the NPM client used internally at Alibaba and Ant).

NPM is currently the number one package ecosystem, containing millions of packages, but behind the ecosystem boom are installations that keep getting slower. I will share with you how we achieved an NPM installation that finishes in seconds.

This talk is divided into three sections.

  • Performance comparison: uses a benchmark to visually illustrate our installation performance.
  • Optimization: walks you through our optimization journey, which will hopefully bring you some inspiration.
  • Development path: describes where our implementation stands today and what its final state will be.

Performance comparison

The benchmark uses the internal NPM package @alipay/smallfish. This package has 2,000 dependencies, and 38 MB of data is transferred over the network during installation. While 38 MB may seem small, it expands to more than 400 MB once unpacked.

The test machine is an M1 Mac mini (8 cores, 16 GB RAM) on a gigabit wired network. The test environment starts with no NPM cache (neither metadata nor tarballs). The registry accessed is an internal private mirror, which guarantees low response latency.

Take a look at the numbers. NPM took 1 minute 7 seconds to install. Yarn has optimized this, bringing the time down to 36 seconds. PNPM and CNPM adopt a similar symlink-based optimization, reducing the time to 19 seconds. NPM Rapid was the fastest, completing the installation in just 6 seconds, nearly 10 times faster than NPM.

Optimization

Before we start optimizing, we need to see where NPM's performance bottlenecks are. They fall into three main areas.

  • HTTP requests: Installing a package requires two HTTP requests: one to the registry to fetch the metadata for all versions of the package, and another to download the TGZ file for the specific version (a sketch of these two requests follows this list).
  • IO operations: Downloaded TGZ files must be un-gzipped and un-tarred, and untarring generates a large number of small files. In addition, NPM installation also performs newline fixes on bin files, chmod calls, symlinking, and other operations.
  • Disk footprint: Once smallfish is installed, it takes up 445 MB of space, and every project that installs it takes up that much space again.
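To make the first point concrete, here is what those two requests look like against the public registry API (lodash is used purely as an example; internal registries follow the same shape):

```js
// 1. Fetch the packument, which lists every published version of the package.
//    The abbreviated format omits readmes and other fields not needed for
//    installation, so it is much smaller.
const meta = await fetch('https://registry.npmjs.org/lodash', {
  headers: { accept: 'application/vnd.npm.install-v1+json' },
}).then((r) => r.json());

// 2. Download the tarball for whichever version the resolver picked.
const { tarball } = meta.versions['4.17.21'].dist;
const tgz = Buffer.from(await (await fetch(tarball)).arrayBuffer());
```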

In this talk I will introduce the three core techniques we used to optimize.

  • Reduce HTTP requests through aggregation.
  • Add an intermediate layer to reduce file writes and greatly improve installation performance.
  • Use caching to reduce disk usage and solve NPM's black-hole problem.

Aggregation

We used npmgraph to generate smallfish's dependency graph. It looks like a nebula, complex and with a staggering number of dependencies, which is why generating the dependency tree is slow.

We also observed the dependency tree being generated: requests for indirect dependencies can only be issued after the direct dependencies they belong to have been resolved, so resolution becomes a recursive chain of requests, as the sketch below shows. This further drags out dependency tree generation.
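A minimal sketch of that recursion (illustrative only; a real resolver matches semver ranges instead of just taking the latest dist-tag):

```js
// Each wave of metadata requests can only start after the parent's metadata
// has arrived, so resolution takes as many network round-trips as the
// dependency graph is deep.
async function resolveTree(name, seen = new Set()) {
  if (seen.has(name)) return;
  seen.add(name);
  const meta = await fetch(`https://registry.npmjs.org/${name}`)
    .then((r) => r.json());
  const latest = meta['dist-tags'].latest;
  const deps = Object.keys(meta.versions[latest].dependencies ?? {});
  // Indirect dependencies are only discovered here, one round-trip later.
  await Promise.all(deps.map((dep) => resolveTree(dep, seen)));
}
```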

We instead generate the dependency tree by sending the project's package.json to the server and running @npmcli/arborist there, hijacking arborist's HTTP access to the registry so it goes directly into our registry service. The tree generation process is further accelerated with an in-memory cache, a distributed cache, and the database.
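On the server side, the resolver is @npmcli/arborist itself, the same library npm uses internally. A minimal sketch, with our service plumbing and caches left out (the registry URL is a placeholder, and the node field names follow the Arborist node API as I recall it):

```js
const Arborist = require('@npmcli/arborist');

// Given a directory containing the uploaded package.json, compute the ideal
// dependency tree in memory, without writing node_modules to disk.
async function buildTree(projectDir) {
  const arb = new Arborist({
    path: projectDir,
    registry: 'https://registry.example.com', // placeholder for our registry service
  });
  const tree = await arb.buildIdealTree();
  // Flatten the tree into { name, version, resolved } records for the client.
  return [...tree.inventory.values()].map((node) => ({
    name: node.name,
    version: node.version,
    resolved: node.resolved,
  }));
}
```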

By aggregating 2,000+ HTTP requests into one, we reduced dependency tree generation from 15 seconds to 3 seconds, an 80% reduction.

The intermediate layer

By using an intermediate layer to reduce IO, NPM installation speed really takes off.

Profiling with strace first, we can see that NPM issues the write syscall 200,000 times during installation, at an average of 76 μs per call and a total of more than 15 seconds. That 200,000 is the number we need to attack.

There are two reasons for so many writes. The first is the sheer number of dependencies: since NPM is the largest ecosystem, that is not a problem we can solve away.

CNPM and PNPM use symlinks to avoid writing duplicate dependencies, which removes part of the write volume. But dependency graphs keep getting more complex, and optimizations that worked in the past are no longer enough. We need a more effective tool.

Going deeper, the second reason is how 2,000 dependencies swell into 200,000 writes. Take smallfish itself: the package contains 6 files, 5 directories, and 1 bin. Writing it out requires 6 fs.writeFile calls, 5 fs.mkdir calls, 1 fs.symlink call, and 1 chmod call.

And that is just the JS level: each of those writeFile calls triggers multiple write syscalls underneath. This is the API that most needs optimizing.
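To see where the multiplier comes from, here is roughly what naive extraction does for one such package (the pkg shape and binDir are hypothetical; real clients drive this from the tar stream):

```js
const fs = require('node:fs/promises');
const path = require('node:path');

// One package with 5 directories, 6 files and 1 bin costs
// 5 mkdir + 6 writeFile + 1 symlink + 1 chmod = 13 fs calls,
// and each writeFile alone is open + write(s) + close in syscalls.
// Multiplied across ~2,000 packages, 200,000 writes stop being surprising.
async function extractNaively(pkg, dest, binDir) {
  for (const dir of pkg.dirs) {
    await fs.mkdir(path.join(dest, dir), { recursive: true });
  }
  for (const [name, body] of Object.entries(pkg.files)) {
    await fs.writeFile(path.join(dest, name), body);
  }
  // Expose the package's bin, as npm does under node_modules/.bin.
  await fs.symlink(path.join(dest, pkg.bin), path.join(binDir, pkg.name));
  await fs.chmod(path.join(dest, pkg.bin), 0o755);
}
```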

Going further down, let's look at the structure of a tar archive. A tar consists of a sequence of entries, each containing basic information about a file, such as permissions, size, modification time, and file name, plus, most importantly, the file body itself. Could files be accessed directly out of the tar?
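They can. The format is simple enough to index in a single pass: every entry is a 512-byte header followed by the body, padded to a 512-byte boundary. A sketch over an uncompressed tar buffer, using the POSIX header field offsets:

```js
// Build a map from file name to { offset, size } so any file's body can be
// read straight out of the archive, with no extraction step at all.
function indexTar(buf) {
  const index = new Map();
  let pos = 0;
  while (pos + 512 <= buf.length) {
    // The name occupies bytes 0-99 of the header, NUL-padded.
    const name = buf.toString('utf8', pos, pos + 100).replace(/\0.*$/s, '');
    if (!name) break; // two zero blocks mark the end of the archive
    // The size is an octal string at bytes 124-135.
    const size = parseInt(
      buf.toString('utf8', pos + 124, pos + 136).replace(/\0/g, ' '), 8);
    index.set(name, { offset: pos + 512, size });
    pos += 512 + Math.ceil(size / 512) * 512; // body is padded to 512 bytes
  }
  return index;
}

// Reading one file directly:
//   const { offset, size } = index.get('package/package.json');
//   const body = buf.subarray(offset, offset + size);
```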

In addition, tar has an important feature: append. It is easy to add files at the end of a tar, so we can merge two tars together. The process for writing both packages then becomes:

  1. fs.createFile: creates the shared file
  2. fs.appendFile: writes the first package
  3. fs.appendFile: writes the second package

This reduces the IO count from 26 to 3.
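In Node terms, a sketch of the three steps (a real implementation must also strip each tar's trailing end-of-archive blocks before appending, which is omitted here):

```js
const fs = require('node:fs/promises');

// Write two downloaded (un-gzipped) package tars into one shared bucket
// file: three fs calls in place of dozens of per-file writes.
async function writePackages(bucket, tarA, tarB) {
  await fs.writeFile(bucket, Buffer.alloc(0)); // 1. create the shared file
  await fs.appendFile(bucket, tarA);           // 2. append the first package
  await fs.appendFile(bucket, tarB);           // 3. append the second package
}
```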

But the most important question remains: installation is fast now, yet what gets installed is unusable. A tar is not a JS file at all. You can't require JS from it directly, can't work with it in a shell, and can't edit it in an IDE. Every existing habit is broken. Next we need to solve reading and writing the tar.

First, we need to find the right layer at which to solve this problem. The process of requiring a file from JS looks like this:

  1. Initiate a file read with the fs.readFile API.
  2. The JS method constructs a uv_req_t request structure in libuv.
  3. libuv calls the read method in libc.
  4. read makes a system call into the kernel to read the file from the file system.

In the community, Yarn PnP stores packages as zip archives. It hijacks Node's require method to read from the zip, and supports reading inside the IDE through IDE plugins. But plenty of community code traverses node_modules via the fs API, and developers work on dependencies in the shell, so this approach does a lot of damage to existing usage habits.
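For a sense of what "hijacking require" means, the PnP-style approach patches Node's module loader, roughly like this (Module._load is an internal, undocumented Node API, and both archive helpers are hypothetical):

```js
const Module = require('node:module');

const originalLoad = Module._load;

// Route module loads that live inside an archive to a custom loader,
// and let everything else fall through to Node's default behavior.
Module._load = function (request, parent, isMain) {
  if (isInArchive(request, parent)) {          // hypothetical lookup
    return loadFromArchive(request, parent);   // hypothetical loader
  }
  return originalLoad(request, parent, isMain);
};
```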

Whether we hijack at the JS API layer or the C API layer, we are unlikely to preserve the existing way of working.

Any problem in computer science can be solved by adding another layer of indirection.

So we looked at where this layer of indirection could be added, and the problem comes down to the file system.

FUSE is Filesystem in Userspace; that is, it lets us implement a file system in user mode. As shown in the figure, a hello program takes over all file system operations under the /tmp/fuse directory. The ls -l /tmp/fuse command in the figure triggers file system operations such as access and opendir, and all of them are handled inside hello, which controls what files appear in the directory and what their contents are.

So, combined with the metadata in the tar, we can implement a file system through FUSE and keep all existing usage habits intact.
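Putting the pieces together, the FUSE handlers can serve a virtual node_modules straight from the tar index built earlier. The handler shapes below follow community Node FUSE bindings such as fuse-native, but treat the exact signatures as illustrative; index, archive, and listEntries come from (or extend) the earlier indexing sketch:

```js
const handlers = {
  // Directory listings come from the names recorded in the tar headers.
  readdir(dirPath, cb) {
    cb(0, listEntries(index, dirPath)); // hypothetical helper over the index
  },
  // File metadata is served from the header fields, never from disk.
  getattr(filePath, cb) {
    const entry = index.get(filePath);
    if (!entry) return cb(-2); // -ENOENT
    cb(0, { size: entry.size, mode: entry.mode, mtime: entry.mtime });
  },
  // Reads are plain offset math into the archive buffer.
  read(filePath, fd, buffer, length, position, cb) {
    const { offset, size } = index.get(filePath);
    const end = offset + Math.min(position + length, size);
    const slice = archive.subarray(offset + position, end);
    slice.copy(buffer);
    cb(slice.length); // number of bytes served straight out of the tar
  },
};
```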

Tar does have a drawback, though. As mentioned above, it is easy to append a file to the end of a tar, but files in node_modules can also be modified, and modification is not so easy to implement on top of tar.

Now that the problem of reading has been solved, we need to solve the problem of writing.

The other technique I'll cover is OverlayFS, which merges multiple file systems into one. For example, if the Lower directory is read-only and the Upper directory is read-write, we can overlay Upper onto Lower to create a directory that reads and writes normally. Changes to files land in the Upper directory and never touch the Lower directory.
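The read half of overlay semantics is tiny when sketched in user space (real OverlayFS does this inside the kernel):

```js
const fs = require('node:fs/promises');
const path = require('node:path');

// A read consults the writable upper layer first and falls back to the
// read-only lower layer, so the merged view behaves like one directory.
async function readMerged(upper, lower, rel) {
  try {
    return await fs.readFile(path.join(upper, rel)); // modified copy wins
  } catch {
    return await fs.readFile(path.join(lower, rel)); // pristine lower layer
  }
}
```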

That covers all the core technologies used by NPM Rapid. Based on them we built NPM FS: the bottom layer uses tar as the storage format to support high-speed writes, and the top layer uses overlay technology to provide read-write access while keeping existing usage habits.

With this intermediate layer, switching the underlying storage from many small files to tar cut file-writing time from 30 seconds to 3 seconds, a 90% reduction.

Caching

With installation speed fixed, we still had to fix the last problem, disk space: NPM takes up so much space after installation that the node_modules black hole fully deserves its name.

NPM uses a global cache of tarballs to speed up downloads and avoid downloading the same package twice, but every install still pays the full decompression cost.

PNPM uses hard links to reduce the write volume, but a hard link means every project references the same physical file. If two projects depend on the same package and one of them modifies it while debugging, the other project is affected, with unexpected consequences.

Another feature of overlay mentioned above is copy-on-write (COW): when a file in the lower layer is modified, it is first copied up to the Upper directory and modified there. This lets us back every project on the machine with one shared global cache.
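The write half is the copy-up, again sketched in user space (the kernel implementation is what NPM FS actually relies on):

```js
const fs = require('node:fs/promises');
const path = require('node:path');

// The first modification copies the file out of the shared read-only cache
// (lower) into the project-local upper layer and writes there; every other
// project keeps seeing the untouched cached copy.
async function writeWithCow(upper, lower, rel, data) {
  const upperPath = path.join(upper, rel);
  await fs.mkdir(path.dirname(upperPath), { recursive: true });
  try {
    await fs.access(upperPath); // already copied up in an earlier write?
  } catch {
    await fs.copyFile(path.join(lower, rel), upperPath); // copy-up
  }
  await fs.writeFile(upperPath, data);
}
```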

With COW-based caching, we achieve a cache that is global yet isolated between projects.

Development path

NPM lit up the JS ecosystem. We have given back to that ecosystem through CNPM, serving as a mirror in China and speeding up NPM installation there. Now we want to give back once more: to rid NPM of its reputation for being slow and bloated, and to inject more vitality into the ecosystem.

In this round of technical exploration, we first built NPM FS to speed up installation. Along the way we also realized that although the original TNPM symlink layout improved installation speed by reducing writes, the symlinks caused many compatibility problems with community packages. NPM FS solves the write problem, so we can return to NPM's own directory structure and lower the cost of community compatibility.

Next, we will support NPM workspaces so that NPM Rapid mode is fully feature-aligned with NPM. Hopefully, NPM Rapid will one day be integrated into NPM itself, so that all JS developers can enjoy its speed.