Moment For Technology

Deep thoughts on modern package managers: why do I now recommend PNPM over NPM/YARN?

Posted on Oct. 4, 2022, 1:39 p.m. by Maria Harvey
Category: Front end  Tag: Front end

This article introduces PNPM, an outstanding package manager. It currently has 9.8K stars on GitHub and is now fairly mature and stable. It grew out of the NPM/YARN model, but it fixes potential bugs in NPM/YARN, greatly improves performance and extends the scenarios it can be used in. Here is the mind map for this article:

What is PNPM?

The official PNPM document says this:

Fast, disk space efficient package manager

So PNPM is, at heart, a package manager just like NPM/YARN, but it has two killer advantages:

  • Package installation speed is very fast;
  • Very efficient use of disk space.

It is also very simple to install. How simple? Just this:

npm i -g pnpm

Feature overview

1. Fast installs

How fast does PNPM install packages? Take React as an example:

As the chart shows, PNPM (the yellow bars) installs packages 2-3 times faster than NPM/YARN in most scenarios.

Readers familiar with Yarn may ask: doesn't Yarn have a PnP (Plug'n'Play) install mode? PnP does away with node_modules and resolves dependencies straight from Yarn's cache through a generated lookup file, which saves file I/O and speeds up installation. (See reference [7] for details.)

Next, let's take a look at the benchmark data of PNPM and YARN PnP:

Overall, PNPM installs packages noticeably faster than Yarn PnP.

2. Use disk space efficiently

Internally, PNPM keeps all files on disk in a content-addressable store. This brings two big benefits:

  • The same package is never installed twice. With NPM/YARN, if 100 projects depend on lodash, lodash is likely installed 100 times and written to 100 places on disk. PNPM writes it to disk only once, in one place, and every project then uses a hard link to that copy.

  • Even different versions of a package share most of their files. For example, if lodash has 100 files and an update adds one more, PNPM does not write all 101 files to disk again. It keeps hard links to the original 100 files and writes only the new one.
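Both bullets can be illustrated with a toy content-addressable store in TypeScript. This is a sketch under stated assumptions, not PNPM's implementation. Files are keyed by a hash of their contents. The 32-bit FNV-1a function stands in for the cryptographic hash a real store would use. A "hard link" is modelled as a project file path that simply records the hash of a stored blob. `ContentStore`, `add` and `install` are hypothetical names.

```typescript
// Stand-in hash (32-bit FNV-1a). A real store would use a cryptographic hash instead.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return (hash >>> 0).toString(16).padStart(8, "0");
}

class ContentStore {
  private blobs = new Map<string, string>(); // hash -> contents, each written once
  writes = 0; // how many blobs were actually written to "disk"

  // Stores a file's contents unless identical contents are already present.
  add(contents: string): string {
    const hash = fnv1a(contents);
    if (!this.blobs.has(hash)) {
      this.blobs.set(hash, contents);
      this.writes++;
    }
    return hash;
  }

  // "Installs" a package into a project: each file path just points at a stored blob,
  // which plays the role of a hard link.
  install(files: Record<string, string>): Record<string, string> {
    const links: Record<string, string> = {};
    for (const [path, contents] of Object.entries(files)) links[path] = this.add(contents);
    return links;
  }
}

const store = new ContentStore();
const lodashV1: Record<string, string> = {};
for (let i = 0; i < 100; i++) lodashV1[`file${i}.js`] = `module.exports = ${i};`;
for (let project = 0; project < 100; project++) store.install(lodashV1); // 100 projects
const writesAfterV1 = store.writes;
store.install({ ...lodashV1, "new.js": "module.exports = 'new';" }); // the update adds one file
const writesAfterV2 = store.writes;
```

In this model, installing the same version into 100 projects writes 100 blobs in total rather than 10,000. The update writes exactly one more, barring a collision in the toy hash.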

3. Monorepo support

As front-end engineering grows more complex, more and more projects adopt a monorepo. Where a traditional setup gives each project its own git repository, a monorepo manages multiple sub-projects in a single git repository. All sub-projects are stored in the packages directory under the root, so each sub-project is one package. If the concept of a monorepo is new to you, take a closer look at reference [8], at lerna (an open-source monorepo management tool), and at the directory structure of the Babel repository.

Another big difference between PNPM and NPM/YARN is that PNPM supports monorepos directly, and this shows up in the behaviour of each subcommand. For example, running pnpm add A -r in the root directory adds dependency A to every package. The --filter flag is also supported for selecting which packages a command applies to.
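The difference between -r and --filter can be made concrete with a small TypeScript model. It is only a sketch. The workspace, the package names (@demo/app, @demo/utils, docs) and the helper functions are hypothetical. This filter understands nothing but a `*` wildcard in package names, whereas PNPM's real --filter syntax is considerably richer.

```typescript
// Hypothetical model of a workspace; not PNPM's code.
interface WorkspacePackage {
  name: string;
  dependencies: Record<string, string>;
}

// Simplified selector: "*" matches any run of characters in a package name.
function matchesFilter(name: string, filter: string): boolean {
  const pattern = filter
    .split("*")
    .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, "\\$&")) // escape regex metacharacters
    .join(".*");
  return new RegExp(`^${pattern}$`).test(name);
}

// Models "add <dep> -r" (no filter: every package) and "add <dep> -r --filter <pattern>".
function addRecursive(
  packages: WorkspacePackage[],
  dep: string,
  range: string,
  filter?: string,
): string[] {
  const changed: string[] = [];
  for (const pkg of packages) {
    if (filter !== undefined && !matchesFilter(pkg.name, filter)) continue;
    pkg.dependencies[dep] = range;
    changed.push(pkg.name);
  }
  return changed;
}

const workspace = (): WorkspacePackage[] => [
  { name: "@demo/app", dependencies: {} },
  { name: "@demo/utils", dependencies: {} },
  { name: "docs", dependencies: {} },
];

const everyPackage = addRecursive(workspace(), "lodash", "^4.17.21");
const filtered = workspace();
const onlyDemo = addRecursive(filtered, "lodash", "^4.17.21", "@demo/*");
```

Without a filter every package gets the dependency. With the `@demo/*` pattern, docs is left untouched.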

4. High security

With NPM/YARN, node_modules has a flat structure. So if A depends on B and B depends on C, A can use C directly even though A never declares C as a dependency. This is the kind of illegal access (often called a phantom dependency) that a flat layout allows. PNPM took an imaginative approach and designed its own way of managing dependencies, which solves this problem cleanly and keeps projects safe. How it does so, and how it avoids the risk of illegal dependency access, is discussed in detail later.

Dependency management

How NPM/YARN install works

There are two main parts: first, how packages get into the project's node_modules when NPM/YARN install runs; second, how dependencies are organized inside node_modules.

After the command runs, the dependency tree is built first, and then the package at each node goes through the following four steps:

    1. Resolve the version range of the dependency to a specific version number
    2. Download the tarball for that version into the local offline mirror
    3. Extract the dependency from the offline mirror into the local cache
    4. Copy the dependency from the cache into the node_modules directory of the current project

At that point the package has landed in the project's node_modules.
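Step 1 is the one with real logic of its own: picking a concrete version from a range. As a hedged sketch, the TypeScript below handles only caret ranges of the form ^MAJOR.MINOR.PATCH with MAJOR of at least 1, where ^4.17.0 means at least 4.17.0 and below 5.0.0. It ignores prerelease tags and uses a hypothetical list of published versions. npm itself applies the full semver rules.

```typescript
type Version = [number, number, number];
const parse = (v: string): Version => v.split(".").map(Number) as Version;

// Negative if a < b, zero if equal, positive if a > b.
function compare(a: Version, b: Version): number {
  for (let i = 0; i < 3; i++) if (a[i] !== b[i]) return a[i] - b[i];
  return 0;
}

// Picks the highest published version satisfying "^MAJOR.MINOR.PATCH" (MAJOR >= 1 only).
function resolveCaret(range: string, published: string[]): string | null {
  if (!range.startsWith("^")) throw new Error(`unsupported range: ${range}`);
  const min = parse(range.slice(1));
  if (min[0] === 0) throw new Error("^0.x ranges are not handled in this sketch");
  let best: Version | null = null;
  for (const v of published.map(parse)) {
    const inRange = v[0] === min[0] && compare(v, min) >= 0; // same major, not older
    if (inRange && (best === null || compare(v, best) > 0)) best = v;
  }
  return best === null ? null : best.join(".");
}

const published = ["4.16.6", "4.17.15", "4.17.21", "5.0.0"]; // hypothetical registry data
const picked = resolveCaret("^4.17.0", published);
```

Here 5.0.0 is rejected because it changes the major version, and 4.17.21 wins over 4.17.15 as the newest match.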

So, what is the directory structure of these dependencies inside node_modules, in other words, what is the dependency tree of the project?

In npm 1 and npm 2, dependencies are nested, like this:

node_modules
└─ foo
   ├─ index.js
   ├─ package.json
   └─ node_modules
      └─ bar
         ├─ index.js
         └─ package.json

If bar has dependencies of its own, the nesting continues. Consider the problems with this design:

  1. When the dependency hierarchy gets too deep, file paths become too long, which is a particular problem on Windows.
  2. Lots of duplicate packages get installed, and node_modules becomes enormous. For example, if foo and baz in the same directory both depend on the same version of lodash, lodash is installed in both of their node_modules folders, i.e. it is installed twice.
  3. Module instances cannot be shared. React, for example, keeps some internal state. React imported by two different packages is not the same module instance, so that internal state is not shared, which leads to unpredictable bugs.
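Problem 2 is easy to reproduce in a small model. The TypeScript sketch below is illustrative only: the `Dep` type, the `nestedLayout` function and the package versions are hypothetical. The layout rule is simply that every dependency goes into its parent's own node_modules folder, as in the npm 1/2 structure described above.

```typescript
// Hypothetical package graph: each Dep lists its own dependencies.
interface Dep {
  name: string;
  version: string;
  deps: Dep[];
}

interface Installed {
  path: string;
  version: string;
}

// npm 1/2-style layout: every dependency is copied into its parent's own node_modules.
function nestedLayout(deps: Dep[], base = "node_modules"): Installed[] {
  const out: Installed[] = [];
  for (const d of deps) {
    const path = `${base}/${d.name}`;
    out.push({ path, version: d.version });
    out.push(...nestedLayout(d.deps, `${path}/node_modules`)); // recurse one level deeper
  }
  return out;
}

const lodash: Dep = { name: "lodash", version: "4.17.21", deps: [] };
const layout = nestedLayout([
  { name: "foo", version: "1.0.0", deps: [lodash] },
  { name: "baz", version: "1.0.0", deps: [lodash] },
]);
const lodashCopies = layout.filter((e) => e.path.endsWith("/lodash")).map((e) => e.path);
```

The same lodash version ends up in two places, and every extra level of dependencies adds another node_modules/<name> segment to the path, which is how problem 1 arises.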

Starting with npm 3, and likewise in Yarn, this problem is addressed by flattening dependencies. You have probably had this experience: I only installed express, so why is node_modules full of so much stuff?

Yes, this is the result of flat dependency management. Instead of the previous nested structure, the directory structure now looks like this:

node_modules
├─ foo
|  ├─ index.js
|  └─ package.json
└─ bar
   ├─ index.js
   └─ package.json

All dependencies are flattened into node_modules, with no more deep nesting. When a new package is installed, the package manager searches upward through node_modules folders following Node's require mechanism. If it finds the same version of the package, it does not install it again. This solves the problem of massive duplicate installs and keeps the dependency hierarchy shallow.
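The lookup that makes the flat layout work is Node's own resolution rule. Starting from the requiring package's directory, Node checks each ancestor's node_modules folder in turn, skipping folders that are themselves named node_modules. Below is a simplified TypeScript model of that walk. It does not touch the file system: the installed package folders are passed in as a hypothetical set of paths. `nodeModulesPaths` and `resolvePackage` are illustrative names, not Node APIs. The same walk also lets a project's own code reach a package it never declared, which is the illegal-access problem discussed below.

```typescript
// Lists the node_modules folders Node would search from `fromDir`, nearest first.
function nodeModulesPaths(fromDir: string): string[] {
  const parts = fromDir.split("/").filter((p) => p !== "");
  const dirs: string[] = [];
  for (let i = parts.length; i >= 0; i--) {
    if (i > 0 && parts[i - 1] === "node_modules") continue; // no node_modules/node_modules
    const prefix = parts.slice(0, i).join("/");
    dirs.push(prefix === "" ? "/node_modules" : `/${prefix}/node_modules`);
  }
  return dirs;
}

// Returns the first matching package folder, or null if the walk reaches the root.
function resolvePackage(fromDir: string, name: string, installed: Set<string>): string | null {
  for (const dir of nodeModulesPaths(fromDir)) {
    const candidate = `${dir}/${name}`;
    if (installed.has(candidate)) return candidate;
  }
  return null;
}

// Flat (npm 3+/Yarn) layout: foo, bar and their shared lodash all sit at the top level.
const installed = new Set([
  "/app/node_modules/foo",
  "/app/node_modules/bar",
  "/app/node_modules/lodash",
]);
const fromFoo = resolvePackage("/app/node_modules/foo", "lodash", installed);
const fromBar = resolvePackage("/app/node_modules/bar", "lodash", installed);
const fromApp = resolvePackage("/app", "lodash", installed); // reachable even if /app never declared it
```

foo and bar both land on the single hoisted copy of lodash. The last call shows that the project itself can reach it too.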

The earlier problems are solved, but is this flat approach really flawless? Not at all. It still has several problems:

    1. The dependency structure is not deterministic.
    2. The flattening algorithm itself is complex and time-consuming.
    3. Packages that are not declared as dependencies can still be accessed illegally in the project.

The last two are easy to understand, but what does the "uncertainty" in the first point mean? Let's look at it in more detail.

If the project now relies on two packages foo and bar, the dependencies of these two packages look like this:

After NPM/YARN install flattens the tree, does node_modules look like this?

Or like this?

The answer: it can be either. It depends on the order of foo and bar in package.json. If foo is declared first, you get the first structure; otherwise, you get the second.
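The order dependence can be reproduced with a deliberately simplified hoisting rule. The TypeScript sketch below is hypothetical and is not npm's or Yarn's actual algorithm. In it, foo depends on base@1.0.0 and bar on base@2.0.0 (made-up packages). Direct dependencies go to the top level. Each package's own dependencies are then hoisted in declaration order: a dependency takes the top-level slot if it is free, or nests under its parent if the slot already holds a different version. Only one level of transitive dependencies is handled.

```typescript
// Hypothetical registry: "name@version" -> that package's own dependencies (name -> version).
type Registry = Record<string, Record<string, string>>;

// Simplified hoisting: returns install path -> version. Handles one level of transitive deps.
function flatten(rootDeps: Array<[string, string]>, registry: Registry): Map<string, string> {
  const layout = new Map<string, string>();
  // Direct dependencies always go to the top level.
  for (const [name, version] of rootDeps) layout.set(`node_modules/${name}`, version);
  // Then each package's own dependencies, in package.json declaration order.
  for (const [name, version] of rootDeps) {
    const deps = registry[`${name}@${version}`] ?? {};
    for (const [dep, depVersion] of Object.entries(deps)) {
      const top = layout.get(`node_modules/${dep}`);
      if (top === undefined) {
        layout.set(`node_modules/${dep}`, depVersion); // free slot: hoist it
      } else if (top !== depVersion) {
        layout.set(`node_modules/${name}/node_modules/${dep}`, depVersion); // conflict: nest it
      } // same version already on top: reuse it, nothing to install
    }
  }
  return layout;
}

const registry: Registry = {
  "foo@1.0.0": { base: "1.0.0" },
  "bar@1.0.0": { base: "2.0.0" },
};
const fooFirst = flatten([["foo", "1.0.0"], ["bar", "1.0.0"]], registry);
const barFirst = flatten([["bar", "1.0.0"], ["foo", "1.0.0"]], registry);
```

With foo first, base@1.0.0 is hoisted and base@2.0.0 nests under bar. Swapping the declaration order swaps which version sits at the top, which is exactly the kind of nondeterminism a lock file pins down.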

This is where the uncertainty in the dependency structure comes from. It is also why lock files exist, whether package-lock.json (introduced in npm 5) or yarn.lock: they make sure install produces a deterministic node_modules structure.

Even so, NPM/YARN still suffer from the complex flattening algorithm and from illegal package access, which hurt performance and security respectively.

PNPM dependency management

PNPM's author, Zoltan Kochan, found that Yarn had no plans to solve these problems. So he started from scratch, wrote a new package manager, and designed a new dependency management mechanism. Let's take a look.

As an example, let's install express. Create a new directory and run:

pnpm init -y

Then execute:

pnpm install express

Now look inside node_modules:

.pnpm
.modules.yaml
express

We can see express right there, but note that it is only a soft link (symlink). Open it and you will find there is no node_modules directory inside. If this were the package's real location, Node's module loading mechanism would not be able to find express's dependencies. So where does it really live?

Let's keep looking in .pnpm:

▾ node_modules
  ▾ .pnpm
    ▸ accepts@<version>
    ...
    ▾ express@<version>
      ▾ node_modules
        ▸ accepts
        ▸ array-flatten
        ▸ body-parser
        ▸ content-disposition
        ...
        ▸ etag
        ▾ express
            ▸ lib
            History.md
            index.js
            LICENSE
            package.json
            Readme.md

There it is: .pnpm/express@<version>/node_modules/express!

Open any other package and you will see the same pattern: .pnpm/<name>@<version>/node_modules/<name>. Express's own dependencies live under .pnpm/express@<version>/node_modules, and they are also symlinks.

So although the .pnpm directory looks flat, if you follow the symlinks and unfold them one by one, it is actually a nested structure!

▾ node_modules
  ▾ .pnpm
    ▸ accepts@<version>
    ...
    ▾ express@<version>
      ▾ node_modules
        ▸ accepts -> ../accepts@<version>/node_modules/accepts
        ▸ array-flatten -> ../array-flatten@<version>/node_modules/array-flatten
        ...
        ▾ express
            ▸ lib
            History.md
            index.js
            LICENSE
            package.json
            Readme.md

Putting a package and its dependencies in the same node_modules folder is a neat design. It is fully compatible with Node's native resolution while keeping each package neatly grouped with its own dependencies.

Looking back now, the root node_modules no longer holds a dizzying array of packages; its contents basically match the dependencies declared in package.json. PNPM does hoist some packages' dependencies into the root node_modules, but the root is still much cleaner and more disciplined than before.
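The layout above can be modelled in a few lines of TypeScript to see why it works with Node's resolution. This is a hedged sketch, not PNPM's code. The `Pkg` type, `buildLayout`, `resolve` and the 1.0.0 versions are hypothetical, and scoped packages and PNPM's partial hoisting are ignored. Symlinks are modelled as a map from each node_modules entry to the real folder it points at. Like Node's default behaviour, `resolve` first follows a symlink to the real path and then walks up the node_modules folders from there.

```typescript
// Hypothetical model of a PNPM-style node_modules; not PNPM's actual code.
interface Pkg {
  name: string;
  version: string;
  deps: Pkg[];
}

// .pnpm/<name>@<version>/node_modules holds the package itself plus links to its deps.
const storeDir = (root: string, p: Pkg): string =>
  `${root}/node_modules/.pnpm/${p.name}@${p.version}/node_modules`;
const realDir = (root: string, p: Pkg): string => `${storeDir(root, p)}/${p.name}`;

// Maps every node_modules entry (real folder or symlink) to the real folder behind it.
function buildLayout(root: string, direct: Pkg[]): Map<string, string> {
  const entries = new Map<string, string>();
  const seen = new Set<string>();
  const visit = (p: Pkg): void => {
    const id = `${p.name}@${p.version}`;
    if (seen.has(id)) return;
    seen.add(id);
    entries.set(realDir(root, p), realDir(root, p)); // the package's real files
    for (const d of p.deps) {
      entries.set(`${storeDir(root, p)}/${d.name}`, realDir(root, d)); // symlink next to it
      visit(d);
    }
  };
  for (const p of direct) {
    entries.set(`${root}/node_modules/${p.name}`, realDir(root, p)); // top-level symlink
    visit(p);
  }
  return entries;
}

// node_modules folders searched from fromDir, nearest first (Node's lookup order).
function nodeModulesPaths(fromDir: string): string[] {
  const parts = fromDir.split("/").filter((s) => s !== "");
  const dirs: string[] = [];
  for (let i = parts.length; i >= 0; i--) {
    if (i > 0 && parts[i - 1] === "node_modules") continue;
    const prefix = parts.slice(0, i).join("/");
    dirs.push(prefix === "" ? "/node_modules" : `/${prefix}/node_modules`);
  }
  return dirs;
}

// Follow a symlink to its real path first, then walk up from there.
function resolve(fromDir: string, name: string, entries: Map<string, string>): string | null {
  const start = entries.get(fromDir) ?? fromDir;
  for (const dir of nodeModulesPaths(start)) {
    const hit = entries.get(`${dir}/${name}`);
    if (hit !== undefined) return hit;
  }
  return null;
}

const accepts: Pkg = { name: "accepts", version: "1.0.0", deps: [] };
const express: Pkg = { name: "express", version: "1.0.0", deps: [accepts] };
const layout = buildLayout("/app", [express]);
const expressDir = resolve("/app", "express", layout);
const acceptsFromExpress = resolve("/app/node_modules/express", "accepts", layout);
const acceptsFromApp = resolve("/app", "accepts", layout); // not declared by the project
```

In this model, express finds accepts in its sibling folder under .pnpm. The project root cannot reach accepts at all, because nothing links it into /app/node_modules. That is the strictness discussed in the next section.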

More on security

In case you hadn't noticed, PNPM's approach to dependency management also neatly avoids illegal dependency access: a package cannot be used in a project unless it is declared in package.json.

Suppose A depends on B, and B depends on C. A does not declare C, but because of hoisting C ends up in A's node_modules, so I use C in A. It works fine locally, and it works fine in production. Isn't that safe?

Not really.

First, you should know that the version of B can change at any time. Suppose B used to depend on an older major version of C, and a newly published version of B depends on C@2.0.1. After running NPM/YARN install in project A, version 2.0.1 of C is installed. A, which still uses the old C API, may simply break.

Second, if an updated B no longer needs C, C will not be installed into node_modules at all.

Another case arises in a monorepo. Suppose A depends on X and B depends on X, while C does not declare X but uses it in its code. Because of hoisting, NPM/YARN puts X in the node_modules at the repository root. So C runs fine locally: following Node's module loading mechanism, it can load X from the root node_modules of the monorepo. But once C is published on its own and a user installs C by itself, X is nowhere to be found, and an error is thrown as soon as the code that references X runs.

These are the potential bugs of dependency hoisting. In your own business code they may be manageable, but think about the consequences for a toolkit used by many developers.

NPM has tried to fix this with the --global-style flag, which prevents hoisting, but that is a throwback to the days of nested dependencies, and the drawbacks of nesting come right back with it.

NPM/YARN themselves seem to have a hard time solving this, but the community has a dedicated solution: a dependency check tool, github.com/dependency-...
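The core idea of such a check is to compare the packages a file imports with the ones package.json declares. Below is a minimal TypeScript sketch of that idea, not the tool's actual implementation. It finds bare import specifiers with a regular expression, whereas a real tool parses the code. It allows only a small, deliberately partial list of Node built-ins, and the file contents and manifest are made up.

```typescript
interface Manifest {
  dependencies?: Record<string, string>;
  devDependencies?: Record<string, string>;
}

// Partial list of Node built-ins; a real tool would use a complete list.
const BUILTINS = new Set(["fs", "path", "os", "crypto", "http", "url", "util"]);

// Collects package names from require("x"), import ... from "x" and import "x".
function importedPackages(source: string): Set<string> {
  const re = /(?:require\(\s*|from\s+|import\s+)["']([^"']+)["']/g;
  const names = new Set<string>();
  let m: RegExpExecArray | null;
  while ((m = re.exec(source)) !== null) {
    const spec = m[1];
    if (spec.startsWith(".") || spec.startsWith("/") || spec.startsWith("node:")) continue;
    const parts = spec.split("/");
    // "lodash/fp" -> "lodash", "@scope/pkg/sub" -> "@scope/pkg"
    const name = spec.startsWith("@") ? parts.slice(0, 2).join("/") : parts[0];
    if (!BUILTINS.has(name)) names.add(name);
  }
  return names;
}

// Packages the source uses without declaring them, i.e. phantom dependencies.
function undeclaredImports(source: string, manifest: Manifest): string[] {
  const declared = new Set([
    ...Object.keys(manifest.dependencies ?? {}),
    ...Object.keys(manifest.devDependencies ?? {}),
  ]);
  return Array.from(importedPackages(source))
    .filter((name) => !declared.has(name))
    .sort();
}

// Project A declares B but also uses C, which only arrived through hoisting.
const sourceOfA = [
  'const b = require("b");',
  'const c = require("c");',
  'import fp from "lodash/fp";',
  'import helper from "./helper";',
  'import fs from "fs";',
].join("\n");
const phantom = undeclaredImports(sourceOfA, { dependencies: { b: "^1.0.0", lodash: "^4.17.21" } });
```

Only c is reported: b and lodash are declared, ./helper is a relative path, and fs is a built-in.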

Still, there is no denying that PNPM goes further. Its original approach to dependency management not only solves the security problems of hoisting but also greatly improves performance in both time and space.

Daily use

Having said all that, you may think PNPM is quite complicated. Is it costly to use?

On the contrary, PNPM is very easy to use. If you have used NPM/YARN before, you can migrate to PNPM almost seamlessly. Here are some examples of everyday use.

pnpm install

Like npm install, this installs all of the project's dependencies. In a monorepo, it installs the dependencies of every package in the workspace, but you can pass --filter so that only the packages matching the criteria have their dependencies installed.

Of course, it can also be used to install a single package:

# install axios
pnpm install axios
# install axios and add it to devDependencies
pnpm install axios -D
# install axios and add it to dependencies
pnpm install axios -P

Of course, the package can also be specified by --filter.

pnpm update

Updates packages to the latest version allowed by the specified range. In a monorepo, packages can be targeted with --filter.

pnpm uninstall

Removes the specified dependency from node_modules and package.json. The same works in a monorepo. For example:

# remove axios from package-a
pnpm uninstall axios --filter package-a

pnpm link

Links a local project into another project. Note that this creates a symbolic link, not a hard link (directories cannot be hard-linked). For example:

pnpm link ../../axios

Commands such as pnpm run, start, test and publish work much like their npm counterparts. For more information, please refer to the official documentation: pnpm.js.org/en/

As you can see, although PNPM has a lot of sophisticated design under the hood, it is essentially invisible to users and very user-friendly. In addition, the author is still actively maintaining it, and it was downloaded more than 100,000 times on npm last week. It has stood the test of a large user base, and its stability can be relied on.

So I think PNPM is a better solution than NPM/YARN, in terms of underlying security and performance as well as the mental cost of using it.

References:

[1] PNPM official documentation: pnpm.js.org/en/

[2] Benchmark repository: github.com/dependency-...

[3] Zoltan Kochan, "Why should we use pnpm?": www.kochan.io/nodejs/why-...

[4] Zoltan Kochan, "pnpm's strictness helps to avoid silly bugs": www.kochan.io/nodejs/pnpm...

[5] Conarli, "npm install principle analysis": cloud.tencent.com/developer/a...

[6] Yarn official documentation: classic.yarnpkg.com/en/docs

[7] "Yarn Plug'n'Play features": loveky.github.io/2019/02/11/...

[8] "Guide to Monorepos for Front-end Code": www.toptal.com/front-end/g...
