PNPM: Performant NPM

PNPM is now used in many projects at ByteDance. For example, the TikTok FE team shown in the figure below uses it, and the latest version of the Monorepo tool developed by our team also uses PNPM as its underlying dependency manager by default.

PNPM is significantly faster than yarn and NPM, the two most commonly used package managers. According to the official benchmark data, PNPM is about twice as fast as NPM/Yarn in some combined scenarios:

This article introduces some of PNPM’s dependency management optimizations, compares its Monorepo support with Yarn Workspace, and covers some of PNPM’s current drawbacks as well as some things it plans to do in the future.

Dependency management

This section looks at the optimization techniques that set PNPM’s dependency management apart from other package managers.

Hard link mechanism

Any introduction to PNPM has to start with how it optimizes dependency installation. The benchmark chart above shows a significant performance improvement.

So how does PNPM achieve such a big boost? It relies on a file system mechanism called a hard link, which allows the same file to be reached through different paths. PNPM keeps package files in a global store directory, and the files in a project’s node_modules are hard links to the files in that store.

For example, suppose the project has a 1MB dependency named a. With PNPM, a appears to take up 1MB in the node_modules directory and another 1MB in the global store directory (2MB in total). But because a hard link lets the same 1MB of disk space be addressed from two different locations, a actually takes up only 1MB, not 2MB.
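The hard link behavior described above can be observed directly with Node.js’s fs module. This is a minimal sketch, not PNPM’s code: the directory names (store, project) and the file a.js are made up for illustration, and everything is created in a temporary directory.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Create a scratch directory holding a fake "store" and a fake project.
const root = fs.mkdtempSync(path.join(os.tmpdir(), 'hardlink-demo-'));
const storeFile = path.join(root, 'store', 'a.js');
const projectFile = path.join(root, 'project', 'node_modules', 'a', 'a.js');
fs.mkdirSync(path.dirname(storeFile), { recursive: true });
fs.mkdirSync(path.dirname(projectFile), { recursive: true });

// Write the file once, into the store.
fs.writeFileSync(storeFile, 'module.exports = "a";\n');

// Hard-link it into the project: a second name for the same data on disk.
fs.linkSync(storeFile, projectFile);

const storeStat = fs.statSync(storeFile);
const projectStat = fs.statSync(projectFile);

// Both paths refer to the same inode, so the bytes are stored only once.
const sameInode = storeStat.ino === projectStat.ino;
// nlink counts how many directory entries point at this file.
const linkCount = storeStat.nlink;

console.log({ sameInode, linkCount, size: storeStat.size });
```

Hard links only work within one file system, which is why the store and the project are created under the same temporary directory here.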

The Store directory

The previous section mentioned that the store directory holds the files that dependencies are hard-linked to, so this section takes a quick look at it.

By default, the store directory is placed under ${os.homedir}/.pnpm-store, as the following code shows:

const homedir = os.homedir()
if (await canLinkToSubdir(tempFile, homedir)) {
  await fs.unlink(tempFile)
  // If the project is on the same drive as the OS home directory,
  // then the store is placed in the home directory
  return path.join(homedir, relStore, STORE_VERSION)
}

Of course, the user can also set the store directory location in .npmrc, but in general users rarely need to be aware of the store directory.

Because of this mechanism, a dependency (at a given version) that is used by many projects ideally needs to be installed only once.

With NPM or Yarn, a dependency used in multiple projects is installed again in each of them.

If a dependency already exists in the store directory, it is hard-linked directly from the store, which avoids the time a second installation would take. If it does not exist in the store, it is downloaded once.
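The “download once, link everywhere” flow can be sketched as follows. This is only an illustration under simplifying assumptions: fakeDownload is a hypothetical stand-in for a registry download, each package is reduced to a single file, and the store layout (name@version.js) is invented for the example rather than taken from PNPM.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

let downloads = 0;

// Hypothetical stand-in for a registry download; it just writes a file.
function fakeDownload(name, version, destFile) {
  downloads += 1;
  fs.writeFileSync(destFile, `// ${name}@${version}\n`);
}

// Link name@version into a project, downloading only on a store miss.
function install(storeDir, projectDir, name, version) {
  const storeFile = path.join(storeDir, `${name}@${version}.js`);
  if (!fs.existsSync(storeFile)) {
    fs.mkdirSync(storeDir, { recursive: true });
    fakeDownload(name, version, storeFile);
  }
  const target = path.join(projectDir, 'node_modules', name, 'index.js');
  fs.mkdirSync(path.dirname(target), { recursive: true });
  if (!fs.existsSync(target)) fs.linkSync(storeFile, target);
  return target;
}

const root = fs.mkdtempSync(path.join(os.tmpdir(), 'store-demo-'));
const store = path.join(root, 'store');
const first = install(store, path.join(root, 'project-a'), 'axios', '1.0.0');
const second = install(store, path.join(root, 'project-b'), 'axios', '1.0.0');

// Both projects share the single downloaded copy.
console.log({ downloads, shared: fs.statSync(first).ino === fs.statSync(second).ino });
```

The second install call finds the file already in the store and only creates a hard link, so the download counter stays at one.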

Of course, you might have a question here: If you install many, many different dependencies, does the Store directory get bigger and bigger?

The answer is yes, of course. To address this, PNPM provides a dedicated command, documented at: pnpm store | pnpm.

This command offers a subcommand, pnpm store prune, which removes packages that are no longer referenced by any project. For example, axios@1.0.0 is referenced by a project, but some change causes the project to update the package to 1.0.1. The 1.0.0 version of axios in the store then becomes an unreferenced package, and it can be removed from the store by running pnpm store prune.

It is recommended to run this command occasionally, but not too often: an unreferenced package may be referenced by some project again one day, and keeping it in the store saves downloading it again.
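One way to picture what “unreferenced” means is through hard link counts: a store file that nothing else links to has a link count of 1. The sketch below prunes on that basis. It is an assumption-laden illustration, not PNPM’s implementation; pruneStore and the flat store layout are hypothetical.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Remove store files that no project links to any more (link count of 1).
function pruneStore(storeDir) {
  const removed = [];
  for (const entry of fs.readdirSync(storeDir)) {
    const file = path.join(storeDir, entry);
    if (fs.statSync(file).nlink === 1) {
      fs.unlinkSync(file);
      removed.push(entry);
    }
  }
  return removed;
}

const root = fs.mkdtempSync(path.join(os.tmpdir(), 'prune-demo-'));
const storeDir = path.join(root, 'store');
const projectDir = path.join(root, 'project');
fs.mkdirSync(storeDir);
fs.mkdirSync(projectDir);

// Two versions sit in the store, but the project only links to 1.0.1.
fs.writeFileSync(path.join(storeDir, 'axios@1.0.0'), 'old');
fs.writeFileSync(path.join(storeDir, 'axios@1.0.1'), 'new');
fs.linkSync(path.join(storeDir, 'axios@1.0.1'), path.join(projectDir, 'axios'));

const removed = pruneStore(storeDir);
console.log(removed);
```

Only the 1.0.0 file is removed, because the 1.0.1 file is still linked from the project directory.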

node_modules structure

The PNPM website has a classic article describing PNPM’s node_modules structure: Flat node_modules is not the only way | pnpm.

That article describes the file structure of node_modules under PNPM. For example, if you install a dependency called express in your project using PNPM, you end up with two directory structures in node_modules:

node_modules/express/...
node_modules/.pnpm/express@<version>/node_modules/xxx

The first path, node_modules/express, contains the following files:

▾ express
    ▸ lib
      History.md
      index.js
      LICENSE
      package.json
      Readme.md

In fact, this directory is just a soft link (symlink) to the second directory, similar to a shortcut, so Node ultimately finds the contents in the .pnpm directory when it resolves the path.

The .pnpm directory is called the virtual store directory. The dependencies of express are laid out flat under .pnpm/express@<version>/node_modules/, so express can require them without creating deeply nested dependency levels.

In addition to ensuring that Node.js can resolve dependency paths, this layout largely ensures that a package is kept together with its dependencies.

PNPM distinguishes strictly between different versions of a dependency. If a dependency in the project depends on a specific version of a peer dependency (peerDeps), the variants are kept strictly separate in the virtual store directory. For details on how PNPM handles this, see the article at pnpm.io/how-peers-a…

In summary, PNPM’s node_modules structure is essentially a mesh plus flat directory structure, built mainly on soft links (symlinks).

Symlink and Hard Link mechanisms

The files in node_modules are hard links to files in the global store directory, while dependency references are resolved through symlinks that point to the corresponding location in the virtual store directory (the .pnpm directory).

When the two work together, if a project depends on bar@<version> and foo@<version>, the resulting node_modules structure might look something like this:

node_modules
├── bar -> ./.pnpm/bar@<version>/node_modules/bar
├── foo -> ./.pnpm/foo@<version>/node_modules/foo
└── .pnpm
    ├── bar@<version>
    │   └── node_modules
    │       └── bar -> <store>/bar
    │           ├── index.js
    │           └── package.json
    └── foo@<version>
        └── node_modules
            └── foo -> <store>/foo
                ├── index.js
                └── package.json

The bar and foo directories in node_modules are symlinks to the real dependencies in the .pnpm directory, and the files of those real dependencies are hard links into the global store directory.
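The two link types can be combined in a few lines of Node.js to reproduce this layout in miniature. This is a sketch under assumptions, not PNPM’s installer: the version 1.0.0, the store file name, and the directory names are made up, and the package is reduced to a single index.js.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

const root = fs.mkdtempSync(path.join(os.tmpdir(), 'layout-demo-'));
const storeFile = path.join(root, 'store', 'foo-index.js');
const project = path.join(root, 'project');
// Real package location inside the virtual store (.pnpm).
const realFoo = path.join(
  project, 'node_modules', '.pnpm', 'foo@1.0.0', 'node_modules', 'foo'
);
// What the project sees at the top level of node_modules.
const topLevelFoo = path.join(project, 'node_modules', 'foo');

fs.mkdirSync(path.dirname(storeFile), { recursive: true });
fs.mkdirSync(realFoo, { recursive: true });
fs.writeFileSync(storeFile, 'module.exports = "foo";\n');

// 1. Hard link: store file -> file inside .pnpm.
fs.linkSync(storeFile, path.join(realFoo, 'index.js'));
// 2. Symlink: node_modules/foo -> directory inside .pnpm.
//    The 'junction' type only matters on Windows and is ignored elsewhere.
fs.symlinkSync(realFoo, topLevelFoo, 'junction');

// Following the symlink lands in .pnpm; the file there is the store file.
const resolved = fs.realpathSync(topLevelFoo);
const sameFile =
  fs.statSync(path.join(topLevelFoo, 'index.js')).ino === fs.statSync(storeFile).ino;

console.log(path.relative(fs.realpathSync(project), resolved), sameFile);
```

Reading node_modules/foo/index.js therefore goes through one symlink (directory level) and one hard link (file level) before reaching the bytes in the store.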

Compatibility

Are mechanisms like hard links and symlinks compatible with all systems?

In fact, hard links work without problems on mainstream systems (Unix/Windows), but symlinks may have some compatibility problems on Windows. PNPM provides a corresponding solution:

On Windows, PNPM uses a feature called junctions instead of symlinks, which is more compatible there.

You may also wonder why PNPM uses hard links rather than implementing everything with symlinks.

In fact, dependencies in the store directory could also be reached via soft links. Node.js itself provides a flag called --preserve-symlinks for symlink support. However, this flag does not actually support symlinks well, which led the author to abandon that approach and use hard links instead:

For details, see the issue discussion at github.com/nodejs/node…

Monorepo support

PNPM is a very good fit for the Monorepo scenario: thanks to its design, many critical or even fatal problems are solved quite effectively.

Workspace support

For monorepo projects, PNPM provides workspace support. For details, see pnpm.io/workspaces/…

Pain points to solve

The problems monorepos are most often criticized for are generally dependency structure problems. Two common ones are phantom dependencies and NPM doppelgangers, both of which are illustrated well by this figure from Rush:

The following sections introduce these two problems one by one.

Phantom dependencies

A phantom dependency is, simply put, a package that was never declared in package.json but that the user can still reference.

This is usually caused by the node_modules structure. For example, suppose you use Yarn to install a project’s dependencies, one of which is foo, and foo in turn depends on bar. Yarn flattens the installed node_modules (as NPM has done since v3), so foo and bar end up at the same level under node_modules. According to Node.js’s module resolution rules, the user can then require bar as well as foo.

package.json -> foo (bar is a dependency of foo)
node_modules
  /foo
  /bar -> 👻 phantom dependency

So bar here is a phantom dependency. If some later version of foo no longer depends on bar, or depends on a different version of bar, the code that requires bar will throw an error.

The above is only a simple example, but judging by some monorepo (lerna + Yarn) projects I have seen at ByteDance, this is actually a common phenomenon, and some packages even rely on this incomplete way of importing directly to reduce package size.

Another scenario arises in lerna + Yarn Workspace projects. Because Yarn provides a hoist mechanism (the dependencies of some lower-level subprojects are promoted to the top-level node_modules), phantom dependencies become even more common: a lower-level subproject can require a dependency that was never added to its own node_modules and use the copy in the top-level node_modules instead.

PNPM’s node_modules structure avoids this problem: only the dependencies declared in package.json appear at the top level of node_modules, so an undeclared package such as bar cannot be required from project code.
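The difference between a flat layout and a strict one can be checked with Node.js’s own resolver. The sketch below builds both layouts in a temporary directory and asks require.resolve whether bar is reachable from the project root. The package names (phantom-demo-foo, phantom-demo-bar), the version, and the helper functions are hypothetical, and the strict layout is simplified: bar is placed directly next to foo inside .pnpm rather than in its own versioned directory.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Write a minimal package (package.json + index.js) into dir.
function writePackage(dir, name) {
  fs.mkdirSync(dir, { recursive: true });
  fs.writeFileSync(path.join(dir, 'package.json'), JSON.stringify({ name, version: '1.0.0' }));
  fs.writeFileSync(path.join(dir, 'index.js'), `module.exports = ${JSON.stringify(name)};\n`);
}

// True if `name` resolves when required from code living in `fromDir`.
function canResolve(name, fromDir) {
  try {
    require.resolve(name, { paths: [fromDir] });
    return true;
  } catch (err) {
    if (err.code === 'MODULE_NOT_FOUND') return false;
    throw err;
  }
}

const root = fs.mkdtempSync(path.join(os.tmpdir(), 'phantom-demo-'));

// Flat (hoisted) layout: foo and its dependency bar sit side by side.
const flat = path.join(root, 'flat');
writePackage(path.join(flat, 'node_modules', 'phantom-demo-foo'), 'phantom-demo-foo');
writePackage(path.join(flat, 'node_modules', 'phantom-demo-bar'), 'phantom-demo-bar');

// Strict layout: only foo is exposed at the top level; bar lives in .pnpm.
const strict = path.join(root, 'strict');
const virtual = path.join(
  strict, 'node_modules', '.pnpm', 'phantom-demo-foo@1.0.0', 'node_modules'
);
writePackage(path.join(virtual, 'phantom-demo-foo'), 'phantom-demo-foo');
writePackage(path.join(virtual, 'phantom-demo-bar'), 'phantom-demo-bar');
fs.symlinkSync(
  path.join(virtual, 'phantom-demo-foo'),
  path.join(strict, 'node_modules', 'phantom-demo-foo'),
  'junction'
);

const results = {
  flatSeesBar: canResolve('phantom-demo-bar', flat),     // phantom dependency
  strictSeesBar: canResolve('phantom-demo-bar', strict), // blocked
  strictSeesFoo: canResolve('phantom-demo-foo', strict), // declared dependency
  fooSeesBar: canResolve('phantom-demo-bar', path.join(virtual, 'phantom-demo-foo')),
};
console.log(results);
```

In the flat layout the project can resolve bar even though it never declared it. In the strict layout only foo itself can, because bar is visible only from foo’s real location inside .pnpm.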

It’s worth noting that PNPM also provides an option to hoist dependencies and organize node_modules in the Yarn style. The author named it --shamefully-hoist, i.e. the “hoist of shame”.

NPM doppelgangers

This problem, too, is bound up with hoisting: many dependencies may end up installed repeatedly.

For example, suppose a package has the dependencies lib_a, lib_b, lib_c and lib_d, where lib_a and lib_b depend on one version of a shared package ([email protected]), while lib_c and lib_d depend on a different version of it ([email protected]).

The early NPM dependency structure would look like this:

- package
  - package.json
  - node_modules
    - lib_a
      - node_modules <- [email protected]
    - lib_b
      - node_modules <- [email protected]
    - lib_c
      - node_modules <- [email protected]
    - lib_d
      - node_modules <- [email protected]

This inevitably results in many dependencies being installed repeatedly, which is why hoisting and flattening of dependencies were introduced:

- package
  - package.json
  - node_modules
    - [email protected]
    - lib_a
    - lib_b
    - lib_c
      - node_modules <- [email protected]
    - lib_d
      - node_modules <- [email protected]

However, only one version of the dependency can be hoisted this way; hoisting both versions would cause a conflict. As a result, the other versions are still installed repeatedly, which costs performance when using NPM and Yarn.

In PNPM, a dependency always lives in the store directory and is hard-linked from there. Each distinct version of a dependency is installed only once, so this problem is eliminated entirely.
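The copy counts for the three layouts can be worked out with a small calculation. This is only a counting sketch: the shared package name and its versions are hypothetical placeholders, and the functions model the layouts described above rather than any package manager’s real algorithm.

```javascript
// Each dependent package and the version of the shared library it needs.
// All names and versions here are hypothetical.
const dependents = {
  lib_a: 'shared@1.0.0',
  lib_b: 'shared@1.0.0',
  lib_c: 'shared@2.0.0',
  lib_d: 'shared@2.0.0',
};

// Nested layout: every dependent carries its own copy.
function nestedCopies(deps) {
  return Object.keys(deps).length;
}

// Hoisted layout: one version moves to the top level and is shared;
// dependents that need any other version keep a private copy.
function hoistedCopies(deps, hoisted) {
  const privateCopies = Object.values(deps).filter((v) => v !== hoisted).length;
  return 1 + privateCopies;
}

// Store-based layout: one copy on disk per distinct version.
function storeCopies(deps) {
  return new Set(Object.values(deps)).size;
}

const counts = {
  nested: nestedCopies(dependents),
  hoisted: hoistedCopies(dependents, 'shared@1.0.0'),
  store: storeCopies(dependents),
};
console.log(counts); // { nested: 4, hoisted: 3, store: 2 }
```

Hoisting removes one duplicate but leaves the second version installed twice, while the store-based layout keeps exactly one copy per version.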

Scenarios where PNPM is currently not applicable

As mentioned earlier, the main problem with PNPM at present is that symlinks (soft links) have compatibility problems in some scenarios. See the discussion at github.com/nodejs/node…

In it, the author lists some scenarios where symlinks do not currently work with Node.js, expresses the hope that Node.js will provide a link mode that does not rely on symlinks, and mentions the scenarios where PNPM cannot be used today because of symlinks:

  • Electron applications cannot use PNPM
  • Applications deployed on Lambda cannot use PNPM

When using PNPM at ByteDance, I have run into cases where a basic Node.js library did not support symlinks and PNPM therefore did not work, but such libraries tend to add support in later updates.

Some things PNPM plans to do in the future

Independence from Node.js

For details, see github.com/pnpm/pnpm/d…

  • PNPM can be installed without the NodeJS runtime.
  • You can use different versions of NodeJS via PNPM to do dependency installations, similar to what NVM provides.

The feature is currently in beta; see the package at www.npmjs.com/package/@pn… For managing different Node.js versions, see the env subcommand: pnpm.io/cli/env

Write some modules with Rust

For details, see github.com/pnpm/pnpm/d… In this discussion, the author says he would like to provide Rust CLI wrappers for some of the PNPM subcommands to improve performance.

So far this has not made much progress, but I still like the author’s idea. His reasoning is: “If PNPM doesn’t do this, other tools will, and PNPM will eventually be left behind.”

The author is still learning Rust. For the repository of the Rust CLI wrapper, see github.com/pnpm/pn.

Conclusion

Monorepo tools that use PNPM for dependency management, such as Rush, are already widely used in the open source community. The monorepo tool developed by our group at ByteDance is also built on PNPM as its dependency manager, and it has been adopted in a large number of projects.

As a “rising star” among package managers, PNPM uses its author’s distinctive design to solve many of the pain points that existing package managers such as NPM and Yarn inherited from the design of node_modules. The author is also very active in improving PNPM’s features and planning its future direction, and I look forward to it getting better and better.