npm & yarn

As the official package manager for Node, npm appeared alongside Node and helped the front-end community flourish; the community in turn contributed a huge number of "wheels" (reusable packages) to npm. Broadly, npm consists of three parts: the website, the registry, and the CLI. The discussion of npm below refers to the CLI.

Early npm

npm's design is based mainly on semantic versioning (SemVer), which it uses to manage dependencies. A version number consists of three parts, major.minor.patch, and each part is incremented as follows:

  • MAJOR: when you make incompatible API changes.
  • MINOR: when you add functionality in a backward-compatible manner.
  • PATCH: when you make backward-compatible bug fixes.

For example, when we run npm i lodash --save, lodash is added to package.json as a dependency:

"dependencies": {
  "lodash": "^4.17.4"
}

The ^ tells npm that, when installing this dependency, it may install the latest release within major version 4 of lodash, that is, any 4.x.y at or above 4.17.4. npm allows this on the assumption that package authors fully comply with the semantic versioning specification, meaning that the API of the latest lodash release in major version 4 is compatible with the APIs of older 4.x releases.
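The caret rule can be sketched in a few lines. This is a simplified illustration, assuming plain major.minor.patch strings and a major version of 1 or higher; the helper names parseVersion and satisfiesCaret are made up for this example, and npm itself relies on the semver package, which also handles 0.x ranges and prerelease tags.

```javascript
// Parse "4.17.4" into [4, 17, 4]. Assumes a plain major.minor.patch string.
function parseVersion(version) {
  return version.split(".").map(Number);
}

// Return true if `version` falls inside the caret range `^base`.
// Simplification: only valid for base versions with major >= 1.
function satisfiesCaret(version, base) {
  const [major, minor, patch] = parseVersion(version);
  const [baseMajor, baseMinor, basePatch] = parseVersion(base);
  if (major !== baseMajor) return false;          // a major bump may break the API
  if (minor !== baseMinor) return minor > baseMinor;
  return patch >= basePatch;
}

console.log(satisfiesCaret("4.17.21", "4.17.4")); // true
console.log(satisfiesCaret("4.18.0", "4.17.4"));  // true
console.log(satisfiesCaret("5.0.0", "4.17.4"));   // false
console.log(satisfiesCaret("4.17.3", "4.17.4"));  // false
```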

Therefore, when we develop an npm package, we should follow the specification strictly.

Early npm installations did not guarantee a stable dependency tree, and not every package author adhered to semantic versioning. As a result, the same package.json could produce different dependency trees on different installs.

Early npm did offer shrinkwrap to keep the dependency tree stable, but it was not enabled by default.

In addition, because dependencies themselves depend on other packages, npm's early strategy was to lay out the entire dependency tree on disk exactly as it is declared, nesting each package's dependencies in its own directory. The advantage of this layout is that it is clear and easy to understand, but it has many disadvantages, one of which is the “dependency black hole”:

Because the tree is generated purely from each package's own dependencies, many duplicate dependencies are installed again and again:

In the figure, several packages depend on A. In theory the same dependency A needs to be installed only once, but under npm's strategy it is installed three or more times.

Besides the wasted disk space from the large number of duplicates, there are two other issues:

  1. If the dependency nesting is too deep, file paths become too long. This is a particular problem on Windows, where many programs cannot handle paths longer than 260 characters.
  2. Instances cannot be shared. Some libraries require a single instance at run time, but because several copies of the dependency are installed, the same code is not shared.

Dependency flattening

npm introduced dependency tree flattening in version 3 to address duplicate dependencies.

Before, the same dependency was installed repeatedly:

Now the same dependency is hoisted to the top level:

Why can we do that? It all starts with Node's require mechanism. When we call require("xxx") to load an external module, there are two cases:

  • If xxx is a Node core module, such as fs or http, the core module is returned.
  • If not, Node checks whether the current directory's node_modules folder contains the module and returns it if so. Otherwise it searches the node_modules directory one level up, and so on recursively, returning the module as soon as it is found. If the root directory is reached without finding it, an error is thrown.
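The upward search can be sketched as a pure function that lists the candidate locations in order. This is an illustration only: it assumes absolute POSIX-style paths, the name lookupPaths and the tiny core-module list are made up for the example, and Node's real resolver also handles file extensions, package.json entry points, and more.

```javascript
// Core modules short-circuit the lookup (a tiny illustrative subset).
const CORE_MODULES = new Set(["fs", "http", "path"]);

// List the locations that would be searched for `name`, starting at `fromDir`
// and walking up to the filesystem root. Assumes absolute POSIX paths.
function lookupPaths(name, fromDir) {
  if (CORE_MODULES.has(name)) return []; // core module: no disk lookup at all
  const parts = fromDir.split("/").filter(Boolean);
  const candidates = [];
  for (let i = parts.length; i >= 0; i--) {
    // Directories that are themselves named node_modules are skipped.
    if (i > 0 && parts[i - 1] === "node_modules") continue;
    const dir = "/" + parts.slice(0, i).join("/");
    candidates.push((dir === "/" ? "" : dir) + "/node_modules/" + name);
  }
  return candidates;
}

console.log(lookupPaths("xxx", "/app/node_modules/b/lib"));
// [
//   '/app/node_modules/b/lib/node_modules/xxx',
//   '/app/node_modules/b/node_modules/xxx',
//   '/app/node_modules/xxx',
//   '/node_modules/xxx'
// ]
```

The third candidate is the interesting one: a package installed at the top level of /app/node_modules is reachable from code nested anywhere below it.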

node_modules is the directory into which npm install places external dependencies by default.

Because the lookup walks upward through every parent node_modules directory, a dependency hoisted to the top-level node_modules can still be found by any package nested below it, so it no longer has to be copied into each package's own node_modules.

This goes some way toward solving the problem, but not all the way, and it introduces bigger problems.

First, duplicates remain. If one version of A has already been hoisted and both B and C depend on a different, newer version of A, that newer version is still installed twice, once under B and once under C.

Second, it can make the dependency tree unstable. Suppose two different versions of A are required by B and C respectively.

Which version will npm hoist?

In fact, npm has no fixed preference: it depends on the order in which B and C are declared in package.json.
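A toy model makes the order dependence concrete. This is a deliberately simplified sketch, not npm's actual algorithm: the function flatten, the package objects, and the version numbers are all hypothetical, and the rule modeled is simply "the first version encountered gets the top-level slot".

```javascript
// Simplified model of flattening: the first version of a package that is
// encountered is hoisted to the top level; later conflicting versions stay
// nested under the package that needs them. Not npm's real algorithm.
function flatten(directDeps) {
  const topLevel = {}; // name -> version installed at node_modules/<name>
  const nested = {};   // parent -> { name: version } under <parent>/node_modules
  for (const { name, version, dependencies } of directDeps) {
    topLevel[name] = version;
    for (const [depName, depVersion] of Object.entries(dependencies)) {
      if (!(depName in topLevel)) {
        topLevel[depName] = depVersion; // free slot: hoist
      } else if (topLevel[depName] !== depVersion) {
        nested[name] = { ...nested[name], [depName]: depVersion }; // conflict: nest
      }
    }
  }
  return { topLevel, nested };
}

// Hypothetical packages: B needs A@1.0.0, C needs A@2.0.0.
const B = { name: "B", version: "1.0.0", dependencies: { A: "1.0.0" } };
const C = { name: "C", version: "1.0.0", dependencies: { A: "2.0.0" } };

console.log(flatten([B, C]).topLevel.A); // "1.0.0"
console.log(flatten([C, B]).topLevel.A); // "2.0.0"
```

The same two dependencies yield different top-level contents depending only on their order, which is the instability described above.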

Finally, this can lead to the “ghost dependency” problem. Suppose our package.json declares only dependency B, but B depends on A.

"dependencies": {
  "B": "^1.0.0"
}

Because A is hoisted to the top level, we can still reference A in our own code. Most of the time this causes no trouble, but if B upgrades the version of A it depends on and the new A contains a breaking change, our code that uses A may start to fail.

As of npm v3, then, the known issues are:

  1. Duplicate installation of dependencies: improved, but not completely solved.
  2. An unstable dependency tree.
  3. Instances that cannot be shared.
  4. Ghost dependencies.

What Yarn brings

When Yarn was released, it was warmly welcomed by a community that had been suffering with npm for a long time. It also pushed npm to improve and to adopt some of Yarn's features.

These are some of the features advertised when Yarn was first released:

  • Flat mode.
  • Determinism.
  • Offline mode.
  • Better network performance.
  • A download retry mechanism.

Flat mode and determinism

Flat mode manages dependencies in a flat structure, just like npm v3. But how does Yarn guarantee that the structure of the dependency tree is deterministic?

Yarn generates a yarn.lock file by default during installation. npm later followed Yarn's lead, and package-lock.json is now generated by default on every install.

yarn.lock and package-lock.json are similar in that both are used to pin down the structure of the dependency tree. The difference is that yarn.lock uses a YAML-like format while package-lock.json is JSON.

In addition, the versions of sub-dependencies in yarn.lock are not pinned on their own, so yarn.lock alone cannot determine the tree; Yarn has to combine yarn.lock with package.json.

package-lock.json, by contrast, records the complete tree, which means npm can build the whole dependency tree directly from package-lock.json. In fact, npm originally intended to install dependencies from package-lock.json alone. That turned out to be problematic, so npm now also consults package.json when determining the dependency tree.

Offline mode

Yarn also supported an offline mode that npm lacked at the time; npm added it later. Offline mode means using the dependencies in the global cache directly instead of making network requests. npm keeps a copy of each dependency in a local cache directory; run npm config get cache to see where it is.

These are the download strategies npm currently supports:

  • Default: always make a network request. If it returns 304, read from the local cache; if it returns 200, download from the network and update the local cache.
  • --prefer-offline: use the local cache first, and go to the network only if the package is not found locally.
  • --prefer-online: always request from the network first, and fall back to the local cache only if the network request fails.
  • --offline: force the use of the local cache. If the package is not in the cache, an error is reported and the install exits.

Security check

As shown in the lock file above, each dependency tarball has a hash that is used to verify the integrity of the file.

PnP

Many of Yarn's features pushed npm forward. For example, it downloads and resolves multiple packages in parallel to speed up installation. Yarn 1.0 installs dependencies faster than npm, but it still does not fully solve the problems described above: duplicate installations, and the unshared instances and ghost dependencies that follow from them. It improves installation speed and experience, nothing more.

This is where Yarn 2.0 and PnP come in. PnP (Plug'n'Play) is a Yarn feature, not to be confused with pnpm. The idea is simple: it is Node's require mechanism that forces us into a directory structure of nested dependencies, so PnP rewrites the require mechanism at its root.

Because Yarn already knows everything about the dependency tree at install time, PnP hands the right module to Node directly when Node resolves a request. Instead of generating a node_modules structure for Node's require to search on its own, Yarn tells Node exactly where each dependency is. The basic principle is to replace Node's file lookup mechanism.

This has a number of benefits:

  • It saves the time spent generating the node_modules directory tree, which involves a great deal of file IO.
  • It prevents ghost dependencies, because a package that is not declared as a dependency can never be resolved. This includes globally installed packages that we might otherwise reference by mistake.
  • At run time, Node no longer walks recursively up through node_modules folders to find dependencies.
  • It prevents duplicate installations: only one copy of each dependency is installed, and PnP resolves references to it at run time. Ghost dependencies and unshared instances are prevented at the root.
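The PnP idea described above can be sketched as a static lookup table. This is a conceptual illustration only: the table shape, the locator strings, the cache paths, and the resolve function are all hypothetical and do not reflect the actual format of the files Yarn generates.

```javascript
// Hypothetical resolution table, as an installer could write it at install time.
// Keys are package locators; each entry records where the package lives on disk
// and which dependencies it is allowed to require.
const resolutionTable = {
  "app@workspace": {
    location: "/project/",
    dependencies: { B: "B@1.0.0" },
  },
  "B@1.0.0": {
    location: "/cache/B-1.0.0/",
    dependencies: { A: "A@1.0.0" },
  },
  "A@1.0.0": {
    location: "/cache/A-1.0.0/",
    dependencies: {},
  },
};

// Resolve `request` on behalf of `issuer` with a single table lookup,
// instead of probing node_modules directories on disk.
function resolve(issuer, request) {
  const locator = resolutionTable[issuer].dependencies[request];
  if (locator === undefined) {
    throw new Error(`${issuer} does not declare a dependency on ${request}`);
  }
  return resolutionTable[locator].location;
}

console.log(resolve("app@workspace", "B")); // "/cache/B-1.0.0/"
console.log(resolve("B@1.0.0", "A"));       // "/cache/A-1.0.0/"
// resolve("app@workspace", "A") would throw: the app never declared A.
```

Because each package can only see what it declared, the ghost dependency on A fails immediately instead of working by accident.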

The problem with PnP

Is it really that perfect? In theory, yes.

In practice, much of the ecosystem assumes that a node_modules directory exists, and abandoning it causes compatibility problems. Tools such as webpack, TypeScript, and VS Code implement their own module resolution, so they need extra plugins, such as pnp-webpack-plugin, which changes webpack's module resolution rules. Here is the official compatibility table. The existence of a compatibility table implies incompatibilities: besides the ones officially listed, there may be compatibility problems in third-party libraries that have not been officially detected.

pnpm

pnpm neatly solves the duplicate-installation problem and implements all of Yarn's best features. Unlike PnP's radical approach, pnpm solves the problem with symbolic links and hard links, and from the user's point of view nothing changes. pnpm also reuses dependencies wherever it can.

Introduction to links

Hard links

A hard link is a link made through an index node (inode). In a Linux file system, every file stored on a disk partition is assigned a number, called its inode number, regardless of its type. Linux allows several file names to point to the same inode, and such a link is a hard link. Hard links let a file have multiple valid path names, so users can create hard links to important files to protect against accidental deletion. This works because more than one directory entry points to the same inode: deleting one link affects neither the inode itself nor the other links, and the file's data blocks are released only when the last link is deleted. In other words, a file is truly deleted only when every hard link to it has been deleted.

Soft links

The other kind of link is the symbolic link, also known as a soft link. A soft link is similar to a shortcut in Windows. It is a special file: in effect a small text file that contains the location of another file.
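The difference between the two kinds of link can be modeled with a toy in-memory file system. This is purely illustrative: the maps and the functions createFile, hardLink, symLink, unlink, and read are invented for this sketch and are not a real file system API.

```javascript
// Toy file system: directory entries map a path either to an inode number
// (a hard link) or to another path (a symbolic link).
const inodes = new Map();  // inode number -> { data, linkCount }
const entries = new Map(); // path -> { inode } or { target }
let nextInode = 1;

function createFile(path, data) {
  inodes.set(nextInode, { data, linkCount: 1 });
  entries.set(path, { inode: nextInode++ });
}

function hardLink(existing, path) {
  const { inode } = entries.get(existing);
  inodes.get(inode).linkCount += 1; // one more name for the same inode
  entries.set(path, { inode });
}

function symLink(target, path) {
  entries.set(path, { target }); // stores only the target path
}

function unlink(path) {
  const entry = entries.get(path);
  entries.delete(path);
  if (entry.inode === undefined) return; // symlink: nothing else to free
  const node = inodes.get(entry.inode);
  if (--node.linkCount === 0) inodes.delete(entry.inode); // last name gone
}

function read(path) {
  const entry = entries.get(path);
  if (entry === undefined) return undefined; // missing file or dangling link
  return entry.inode !== undefined ? inodes.get(entry.inode).data : read(entry.target);
}

createFile("/store/index.js", "module.exports = 1");
hardLink("/store/index.js", "/project/index.js");
symLink("/store/index.js", "/shortcut.js");
unlink("/store/index.js");

console.log(read("/project/index.js")); // "module.exports = 1"
console.log(read("/shortcut.js"));      // undefined
```

After the original name is removed, the hard link still reaches the data because the inode survives, while the symbolic link dangles because it only remembered a path.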

Installation speed

First, a speed comparison, using the benchmark published by pnpm:

You might say this is a seller praising their own wares. In that case, check out the benchmark published by Yarn 2.

So why install so quickly?

This is the installation process of npm and Yarn:

  • Resolving. First the dependency tree is resolved to decide which packages to fetch.
  • Fetching. The tarballs of the dependencies are downloaded. Several downloads can run at once in this stage to increase speed.
  • Writing. The tarballs are unpacked and the real dependency tree is built from the files, which requires a lot of file IO.

Above is the installation process of pnpm. You can see that the three stages run in parallel for each package, so it is much faster. pnpm does have one extra stage, which is to organize the real dependency tree directory structure through links.

Why does PNPM install faster than PnP?

pnpm directory structure analysis

To understand pnpm thoroughly, you need to understand the directory structure it builds for dependencies.

Basic structure

In the trees below, -> denotes a link: entries that point into <store> are hard links, and entries that point to a relative path are soft (symbolic) links.

Suppose our project depends on foo@1.0.0, and foo@1.0.0 depends on bar@1.0.0.

First, pnpm hard-links the packages from the global store (or a specified store) into the .pnpm directory. For example:

node_modules
└── .pnpm
    ├── bar@1.0.0
    │   └── node_modules
    │       └── bar -> <store>/bar
    │           ├── index.js
    │           └── package.json
    └── foo@1.0.0
        └── node_modules
            └── foo -> <store>/foo
                ├── index.js
                └── package.json

The files of bar and foo are hard-linked to the files in <store>/bar and <store>/foo respectively. Note that bar is actually placed under bar@1.0.0/node_modules/, which has two benefits:

  1. bar can require itself, for example require('bar/package.json').
  2. It prevents circular soft links. If a package's dependencies were placed in its own node_modules, packages that read files from their own node_modules could run into circular soft links. See the discussion for details.

pnpm then links each package's dependencies through soft links. Since foo@1.0.0 depends on bar@1.0.0, bar is linked into the foo@1.0.0/node_modules directory:

node_modules
└── .pnpm
    ├── bar@1.0.0
    │   └── node_modules
    │       └── bar -> <store>/bar
    └── foo@1.0.0
        └── node_modules
            ├── foo -> <store>/foo
            └── bar -> ../../bar@1.0.0/node_modules/bar

Finally, pnpm links the project's direct dependencies through soft links. foo is linked into the root node_modules:

node_modules
├── foo -> ./.pnpm/foo@1.0.0/node_modules/foo
└── .pnpm
    ├── bar@1.0.0
    │   └── node_modules
    │       └── bar -> <store>/bar
    └── foo@1.0.0
        └── node_modules
            ├── foo -> <store>/foo
            └── bar -> ../../bar@1.0.0/node_modules/bar

If bar and foo both also depend on qar@2.0.0, the directory structure looks like this:

node_modules
├── foo -> ./.pnpm/foo@1.0.0/node_modules/foo
└── .pnpm
    ├── bar@1.0.0
    │   └── node_modules
    │       ├── bar -> <store>/bar
    │       └── qar -> ../../qar@2.0.0/node_modules/qar
    ├── foo@1.0.0
    │   └── node_modules
    │       ├── foo -> <store>/foo
    │       ├── bar -> ../../bar@1.0.0/node_modules/bar
    │       └── qar -> ../../qar@2.0.0/node_modules/qar
    └── qar@2.0.0
        └── node_modules
            └── qar -> <store>/qar

Notice that the directory hierarchy is still flat even though the dependency tree is now deeper. pnpm turns a flat on-disk structure into a tree through soft links.

When Node runs and the directory it requires is a soft link, it resolves the link to the actual file location and continues from there.

With this structure, the problems caused by dependency flattening do not arise.
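The effect of this layout on resolution can be sketched with a small model of the links involved. This is an illustration under stated assumptions: the project root /project, the tables symlinks and realDirs, and the functions realpath and resolveFrom are hypothetical, and the sketch ignores the hidden hoisting into .pnpm/node_modules that pnpm performs by default.

```javascript
// Symbolic links in a layout like the one above (link path -> real directory).
const symlinks = {
  "/project/node_modules/foo":
    "/project/node_modules/.pnpm/foo@1.0.0/node_modules/foo",
  "/project/node_modules/.pnpm/foo@1.0.0/node_modules/bar":
    "/project/node_modules/.pnpm/bar@1.0.0/node_modules/bar",
};
// Real package directories (their files are hard-linked from the store).
const realDirs = new Set([
  "/project/node_modules/.pnpm/foo@1.0.0/node_modules/foo",
  "/project/node_modules/.pnpm/bar@1.0.0/node_modules/bar",
]);

// Follow a symlink if there is one; otherwise accept only real directories.
function realpath(path) {
  if (path in symlinks) return symlinks[path];
  return realDirs.has(path) ? path : null;
}

// Resolve `name` from the real directory `fromDir` by walking up the tree,
// which is what happens once symlinks have been resolved to real paths.
function resolveFrom(fromDir, name) {
  let dir = fromDir;
  while (true) {
    if (!dir.endsWith("/node_modules")) {
      const hit = realpath(dir + "/node_modules/" + name);
      if (hit) return hit;
    }
    if (dir === "") return null; // reached the filesystem root
    dir = dir.slice(0, dir.lastIndexOf("/"));
  }
}

const fooDir = resolveFrom("/project", "foo");
console.log(fooDir);                          // .../.pnpm/foo@1.0.0/node_modules/foo
console.log(resolveFrom(fooDir, "bar"));      // .../.pnpm/bar@1.0.0/node_modules/bar
console.log(resolveFrom("/project", "bar"));  // null
```

foo can reach bar because the two sit side by side inside foo@1.0.0/node_modules, while the project itself cannot see bar, since only declared dependencies are linked into the root node_modules.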

Handling peerDependencies

First, what is a peer dependency? When a package declares dependencies in the peerDependencies field of its package.json, it means the package expects its host to provide those dependencies at run time. This is typical of plugin packages.

Because npm and Yarn flatten dependencies, the dependency tree is unpredictable and the structure declared in peerDependencies cannot be guaranteed. For this reason npm v3 through v6 and Yarn do not install peer dependencies automatically (npm v7 does); they only print a warning when versions do not match.

peerDependenciesMeta specifies which peer dependencies are optional.

Assume the project depends on foo-parent-1 and foo-parent-2, both of which depend on foo, bar, and baz. foo has two peer dependencies, bar@^1 and baz@^1, but foo-parent-1 and foo-parent-2 depend on different versions of baz.

- foo-parent-1
  - bar@1.0.0
  - baz@1.0.0
  - foo@1.0.0
- foo-parent-2
  - bar@1.0.0
  - baz@1.1.0
  - foo@1.0.0

If peer dependencies are not taken into account, the result is similar to the basic structure above (here foo also has two regular dependencies, qux and plugh):

node_modules
└── .pnpm
    ├── foo@1.0.0
    │   └── node_modules
    │       ├── foo
    │       ├── qux   -> ../../qux@1.0.0/node_modules/qux
    │       └── plugh -> ../../plugh@1.0.0/node_modules/plugh
    ├── qux@1.0.0
    ├── plugh@1.0.0

If peer dependencies are taken into account, the directory structure is as follows:

node_modules
└── .pnpm
    ├── foo@1.0.0_bar@1.0.0+baz@1.0.0
    │   └── node_modules
    │       ├── foo   -> <store>/foo
    │       ├── bar   -> ../../bar@1.0.0/node_modules/bar
    │       ├── baz   -> ../../baz@1.0.0/node_modules/baz
    │       ├── qux   -> ../../qux@1.0.0/node_modules/qux
    │       └── plugh -> ../../plugh@1.0.0/node_modules/plugh
    ├── foo@1.0.0_bar@1.0.0+baz@1.1.0
    │   └── node_modules
    │       ├── foo   -> <store>/foo
    │       ├── bar   -> ../../bar@1.0.0/node_modules/bar
    │       ├── baz   -> ../../baz@1.1.0/node_modules/baz
    │       ├── qux   -> ../../qux@1.0.0/node_modules/qux
    │       └── plugh -> ../../plugh@1.0.0/node_modules/plugh
    ├── bar@1.0.0
    ├── baz@1.0.0
    ├── baz@1.1.0
    ├── qux@1.0.0
    ├── plugh@1.0.0

A separate directory is created for each combination of peer dependency versions, and each directory's node_modules contains the matching versions. In this example, the packages that depend on foo (foo-parent-1 and foo-parent-2) bring two different versions of baz, so two directories are formed: foo@1.0.0_bar@1.0.0+baz@1.0.0 and foo@1.0.0_bar@1.0.0+baz@1.1.0. By the same logic, if there were also two versions of bar, there would be four directories. The foo inside each of these directories (for example foo@1.0.0_bar@1.0.0+baz@1.0.0/node_modules/foo) is hard-linked to <store>/foo.

Finally, the foo in foo-parent-1's node_modules is linked to foo@1.0.0_bar@1.0.0+baz@1.0.0/node_modules/foo, because foo-parent-1's bar and baz dependencies are both 1.0.0.
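The naming scheme seen in these directory names can be sketched as follows. This only mirrors the names shown in this article: the functions peerDirName and peerDirs are hypothetical, sorting the peers alphabetically is an assumption made for the sketch, and the exact naming may differ between pnpm versions.

```javascript
// Build the directory name for a package given the peer versions it was
// resolved with: <name>@<version>_<peer>@<version>+<peer>@<version>...
function peerDirName(name, version, peers) {
  const suffix = Object.keys(peers)
    .sort() // assumption: peers are listed in alphabetical order
    .map((peer) => `${peer}@${peers[peer]}`)
    .join("+");
  return suffix ? `${name}@${version}_${suffix}` : `${name}@${version}`;
}

// One directory per distinct combination of peer versions seen in the tree.
function peerDirs(name, version, peerSets) {
  return [...new Set(peerSets.map((peers) => peerDirName(name, version, peers)))];
}

const dirs = peerDirs("foo", "1.0.0", [
  { bar: "1.0.0", baz: "1.0.0" }, // as resolved under foo-parent-1
  { bar: "1.0.0", baz: "1.1.0" }, // as resolved under foo-parent-2
]);
console.log(dirs);
// [ 'foo@1.0.0_bar@1.0.0+baz@1.0.0', 'foo@1.0.0_bar@1.0.0+baz@1.1.0' ]
```

Two parents that resolve the peers to identical versions would map to the same name and therefore share one directory.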

What if a dependency of the package has peer dependencies?

For example, A depends on B. A has no peer dependencies of its own, but B has a peer dependency on C, and the packages higher up in the tree depend on different versions of C (say a-parent-1 depends on c@1.0.0 and a-parent-2 depends on c@1.1.0). The structure is then as follows:

node_modules
└── .pnpm
    ├── a@1.0.0_c@1.0.0
    │   └── node_modules
    │       ├── a
    │       └── b -> ../../b@1.0.0_c@1.0.0/node_modules/b
    ├── a@1.0.0_c@1.1.0
    │   └── node_modules
    │       ├── a
    │       └── b -> ../../b@1.0.0_c@1.1.0/node_modules/b
    ├── b@1.0.0_c@1.0.0
    │   └── node_modules
    │       ├── b
    │       └── c -> ../../c@1.0.0/node_modules/c
    ├── b@1.0.0_c@1.1.0
    │   └── node_modules
    │       ├── b
    │       └── c -> ../../c@1.1.0/node_modules/c
    ├── c@1.0.0
    ├── c@1.1.0

In short, pnpm adjusts the structure of the dependency tree to ensure that peer dependencies resolve correctly.

monorepo

The structure of a monorepo is similar to that of a normal project, but there are a few special points to note.

Suppose a and b are projects in the monorepo. Project a depends on lodash and @types/lodash, while project b depends on project a and react. The structure is as follows:

.
├── node_modules
│   ├── @types
│   │   └── lodash -> ../.pnpm/@types+lodash@4.14.177/node_modules/@types/lodash
│   └── .pnpm
├── package.json
├── packages
│   ├── a
│   │   ├── node_modules
│   │   │   ├── @types
│   │   │   │   └── lodash -> ../../../../node_modules/.pnpm/@types+lodash@4.14.177/node_modules/@types/lodash
│   │   │   └── lodash -> ../../../node_modules/.pnpm/lodash@4.17.21/node_modules/lodash
│   │   └── package.json
│   └── b
│       ├── node_modules
│       │   ├── a -> ../../a
│       │   └── react -> ../../../node_modules/.pnpm/react@17.0.2/node_modules/react
│       └── package.json
├── pnpm-lock.yaml
└── pnpm-workspace.yaml

You can see that:

  • All dependencies are laid out flat in the outermost node_modules/.pnpm directory.
  • react, which project b depends on, and lodash and @types/lodash, which project a depends on, all point to the corresponding folders in that directory.
  • Through the workspace protocol, project b declares its dependency a to be the local project a, and node_modules/a is soft-linked directly to the path of project a.

The workspace protocol is used to declare that a dependency is a local package in the pnpm workspace. At publish time, pnpm automatically replaces a version that uses the workspace protocol with the version of the corresponding local project, which saves us the frequent link operations we would otherwise run in the project. By contrast, Yarn (v1) workspaces do not support the workspace protocol: if a dependency declared in a project can be found locally, the local one is used, and otherwise the published one is used. With the workspace protocol we can control explicitly whether a dependency is local or comes from the registry.

Dependency a is simply linked into b's node_modules. If a declares a peer dependency (for example, a declares react as a peer dependency), then running a from within project b will cause problems, because a cannot find react (if hoist is strictly set to false). You can set dependenciesMeta.*.injected to true so that pnpm treats a like a regular installed dependency and can resolve its peer dependencies.

Related configuration

The example in the previous section has a serious problem: @types/lodash is hoisted to the outermost node_modules directory. This can cause ghost dependency problems. (For example, project b does not declare @types/lodash, yet tsc may not report an error when b uses it.)

So why is this? Didn't pnpm say it would fix ghost dependencies?

To understand this fully, we need to understand why pnpm behaves the way it does. Take a look at the relevant pnpm defaults:

hoist=true
; All packages are hoisted to node_modules/.pnpm/node_modules
hoist-pattern[]=*
; All types are hoisted to the root in order to make TypeScript happy
public-hoist-pattern[]=*types*
; All ESLint-related packages are hoisted to the root as well
public-hoist-pattern[]=*eslint*
shamefully-hoist=false

  • hoist
    • Defaults to true, which is equivalent to hoist-pattern[]=*.
    • Every dependency in node_modules/.pnpm is linked into node_modules/.pnpm/node_modules. This means that any dependency in the project can reference any other dependency in the project.
  • hoist-pattern
    • Defaults to hoist-pattern[]=*.
    • Only dependencies that match the pattern are linked from node_modules/.pnpm into node_modules/.pnpm/node_modules.
  • shamefully-hoist
    • Defaults to false. Setting it to true is equivalent to public-hoist-pattern[]=*.
    • Every dependency is linked into the outermost node_modules directory. This is likely to cause ghost dependency problems and is not recommended.
  • public-hoist-pattern
    • Defaults to ['*types*', '*eslint*', '@prettier/plugin-*', '*prettier-plugin-*'].
    • Dependencies that match the pattern are hoisted into the outermost node_modules.
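How such wildcard patterns select packages can be sketched in a few lines. This is a simplified illustration, not pnpm's implementation: the helpers patternToRegExp and isPubliclyHoisted are made up for the example, only the * wildcard is handled, and features such as negated patterns are ignored.

```javascript
// Convert a simple wildcard pattern ("*" matches any run of characters)
// into an anchored regular expression.
function patternToRegExp(pattern) {
  const escaped = pattern
    .split("*")
    .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, "\\$&")) // escape regex metacharacters
    .join(".*");
  return new RegExp(`^${escaped}$`);
}

// A package is hoisted to the root node_modules if any pattern matches its name.
function isPubliclyHoisted(packageName, patterns) {
  return patterns.some((pattern) => patternToRegExp(pattern).test(packageName));
}

const publicHoistPattern = ["*types*", "*eslint*", "@prettier/plugin-*", "*prettier-plugin-*"];

console.log(isPubliclyHoisted("@types/lodash", publicHoistPattern));       // true
console.log(isPubliclyHoisted("eslint-plugin-react", publicHoistPattern)); // true
console.log(isPubliclyHoisted("lodash", publicHoistPattern));              // false
```

This is why @types/lodash ends up in the root node_modules while lodash itself does not.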

With the default public-hoist-pattern configuration, the patterns match TypeScript- and ESLint-related packages, which are hoisted to the outermost node_modules directory so that the TypeScript and ESLint engines can resolve them more easily. pnpm does this to make sure TypeScript and ESLint work after installing dependencies in an ordinary project, which is why the default configuration allows it. In a monorepo, however, this behavior is not what we want. You can set shamefully-hoist to false in .npmrc or change the public-hoist-pattern configuration. According to the issue, the pnpm author plans to turn off this default hoisting (at least for monorepos) in the next major release.

You do not need to worry about this if you use rush, because a rush project has no node_modules in its root directory. You also will not find the npm- and pnpm-related configuration files (package.json, .npmrc, pnpm-lock.yaml, pnpm-workspace.yaml) in the root of a rush project, because they are all moved under the common/temp directory. In a pnpm workspace you may be tempted to add some shared dependencies (tooling such as Jest) to the root node_modules, but in principle, as soon as anything can be added to the root node_modules, ghost dependencies become possible. rush is strict precisely because you never get the chance to add anything there. Blame Node's require mechanism!

Besides the example above, here is a real case we ran into: a project in the monorepo ran without errors locally but failed at compile time. Investigation showed that tcs-templates declares lodash-es as a peer dependency, but lodash-es was not installed in the project. It still ran locally because hoist defaults to true, which links every dependency into .pnpm/node_modules: when lodash-es is not found in tcs-templates' own node_modules, the lookup continues in .pnpm/node_modules, and since other projects had installed lodash-es, the require succeeded. At compile time the require failed because the --filter option was used, so only the dependencies actually needed by the filtered project were installed.

There are two root causes:

  1. strict-peer-dependencies was not enabled, so missing or invalid peer dependencies were not strictly detected.
  2. hoist defaults to true, so dependencies can have ghost dependencies by default.

It is hard to say whether this behavior is right or wrong, because some packages rely on it, such as ESLint and Babel; perhaps we are simply expected to trust package authors not to introduce ghost dependencies. Either way, the solution is to set hoist to false and then maintain a hoist-pattern whitelist by hand.

Combining with PnP

pnpm can be combined with PnP by configuring node-linker:

node-linker=pnp
symlink=false

Note that this mode is very strict: any code that relies on a ghost dependency will fail to run.

Conclusion

pnpm has significant advantages over npm and Yarn in speed, security, and compatibility, so if your project still uses npm or Yarn, you are strongly advised to try pnpm; you will love it. If your project is a monorepo, migrating to pnpm is even more strongly recommended, because ghost dependencies are more prominent in monorepos. Changeset or rush are recommended as package publishing and management tools; rush suits large repositories on which many people collaborate.
