Preface

Why NPM?

Answering this question reminds me of my early days learning front-end development. Before NPM, using a mature third-party library meant downloading its code from BootCDN and dropping it into your own project by hand. Once a project grows complex, managing dependencies manually becomes cumbersome and error-prone.

NPM grew out of Node.js, a project that started in 2009, the same year CommonJS emerged. CommonJS set out to define a set of APIs for common applications so that a JavaScript ecosystem could be built outside the browser. It includes specifications for modules, packages, system interfaces, and so on, filling gaps in the JavaScript standard library.

When Node.js was released in late 2009, it attracted an interesting group of people to its early development, among them Yahoo employee Isaac Schlueter, who saw a big future for a package management tool in Node.js. He resigned to concentrate on building one and became deeply involved in Node.js development. Building on the CommonJS module system that Node.js implemented, he used the CommonJS package specification to implement the package mechanism. The rest of the story is well known: NPM proved good enough to be bundled with the Node.js installer and became its official package manager.

NPM is similar to many operating system package managers: it gives developers simple operations for publishing, viewing, downloading, and managing packages. The npm CLI acts as the client, and a huge code repository, commonly known as the registry, acts as the database that stores package information. An npm command is turned into interface calls that create, delete, query, and update package information in the registry, look up the location that the package information points to, download the package, and unpack it. The package.json defined in the package specification can be understood as the field definitions of that interface.

Package mechanism

The package mechanism is the foundation of NPM. It covers the package structure specification, the package description file, and the package publishing specification. A package is simply a collection of modules, a further layer of abstraction built on top of modules.

Package specification

The package structure that fully complies with the CommonJS specification is as follows:

  • bin - stores executable binaries
  • lib - stores JavaScript code
  • doc - stores documentation
  • test - stores unit test cases
  • package.json - the package description file

In fact, Node.js is not that strict about packages: a package.json in the top-level directory that follows a few conventions is enough.

NPM provides the command npm init -y to quickly generate a default package.json file in the current directory, turning it into a valid package.

Package description file

package.json is the file that CommonJS specifies for describing a package. The entire NPM system depends on this file.

For the meaning of each field, see this article: juejin.cn/post/684490…

NPM dependency management

NPM’s job is to help developers manage dependencies. Before we look at dependency management, let’s look at what dependencies can be:

npm install can be followed by a repository address, a GitHub project, a URL pointing to a gzipped tarball, or even a local folder. The form we see most often in development, of course, is "package name + package version".

The npm ls command displays the version information of every package in the project's dependency tree.

SemVer specification

SemVer, short for Semantic Versioning, is a unified convention for version numbers that was originally drafted by GitHub and is adopted by most software libraries today, NPM among them.

SemVer specification official website: semver.org/

NPM releases follow the SemVer specification, summarized as follows:

  1. A standard version number must take the form X.Y.Z, where X is the major version, Y is the minor version, and Z is the patch version:

    • Major: incremented when you make incompatible API changes
    • Minor: incremented when you add functionality in a backward-compatible way
    • Patch: incremented when you make backward-compatible bug fixes
  2. If a change is large enough that compatibility cannot yet be guaranteed, a pre-release version can be published first, with the pre-release tag appended after the patch number:

    • X.Y.Z-alpha: internal test version
    • X.Y.Z-beta: public test version
    • X.Y.Z-rc: release candidate
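
To make the X.Y.Z layout concrete, here is a minimal sketch of a version parser. The function names parseVersion and releaseType are made up for this illustration and are not part of NPM; build metadata and other details of the full specification are deliberately ignored.

```javascript
// Minimal SemVer parser: "MAJOR.MINOR.PATCH" with an optional "-prerelease" tag.
// Build metadata ("+...") and other parts of the full spec are not handled.
function parseVersion(version) {
  const match = /^(\d+)\.(\d+)\.(\d+)(?:-([0-9A-Za-z.-]+))?$/.exec(version);
  if (!match) {
    throw new Error(`Not a valid X.Y.Z version: ${version}`);
  }
  return {
    major: Number(match[1]),
    minor: Number(match[2]),
    patch: Number(match[3]),
    prerelease: match[4] || null,
  };
}

// Classify the kind of release that takes `from` to `to`.
function releaseType(from, to) {
  const a = parseVersion(from);
  const b = parseVersion(to);
  if (a.major !== b.major) return 'major';
  if (a.minor !== b.minor) return 'minor';
  if (a.patch !== b.patch) return 'patch';
  return a.prerelease !== b.prerelease ? 'prerelease' : 'none';
}
```

For example, releaseType('3.7.2', '4.0.0') reports a major release, which is the kind of change that may break existing callers.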

Version management

Fixed version numbers are easy to understand, but the versions we see in projects are often prefixed with symbols such as ^ and ~. NPM defines these range rules so that running npm update can move a package to the newest version the range allows.

^

Allows only updates that do not change the leftmost non-zero digit. For example, with "redux": "^3.7.2" the major version can never change, so an update resolves to the latest 3.y.z. With "react-swipeable-views": "^0.13.3", an update resolves to the highest 0.13.z.

~

Allows only the patch version to change; the major and minor versions stay fixed. For example, "react-svg": "~10.0.8" can be updated at most to the latest 10.0.z.

x

An x can stand in for any of the X.Y.Z positions and means that position may be updated freely.
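
The three rules can be sketched as a small range check. This is a simplified illustration, not the matcher NPM actually uses: the helper names are made up, pre-release tags are not handled, and ^ or ~ combined with x is not supported.

```javascript
// Parse "X.Y.Z" into numbers; an "x" (or "*") position becomes null, meaning "any".
function parseParts(text) {
  return text.split('.').map((part) =>
    part === 'x' || part === 'X' || part === '*' ? null : Number(part)
  );
}

// Compare two [major, minor, patch] triples: negative, zero, or positive.
function compare(a, b) {
  for (let i = 0; i < 3; i += 1) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return 0;
}

// Does `version` fall inside `range` ("^1.2.3", "~1.2.3", "1.x.x", or "1.2.3")?
function satisfies(version, range) {
  const v = parseParts(version);
  const prefix = range[0];
  if (prefix === '^' || prefix === '~') {
    const base = parseParts(range.slice(1));
    let upper;
    if (prefix === '~') {
      // Tilde: only the patch position may grow.
      upper = [base[0], base[1] + 1, 0];
    } else {
      // Caret: bump the leftmost non-zero digit and zero everything after it.
      const firstNonZero = base.findIndex((n) => n !== 0);
      const i = firstNonZero === -1 ? 2 : firstNonZero;
      upper = base.map((n, j) => (j < i ? n : j === i ? n + 1 : 0));
    }
    return compare(v, base) >= 0 && compare(v, upper) < 0;
  }
  // Exact version or x-range: every fixed position must match.
  return parseParts(range).every((n, i) => n === null || n === v[i]);
}
```

With this sketch, satisfies('0.13.9', '^0.13.3') is true while satisfies('0.14.0', '^0.13.3') is false, matching the react-swipeable-views example.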

Dependency management

NPM provides several fields for managing dependencies:

  • dependencies
  • devDependencies
  • peerDependencies
  • optionalDependencies
  • bundledDependencies

dependencies and devDependencies

dependencies lists the packages needed in the production environment, and devDependencies lists those needed only during development. In the NPM world, everyone is both a package developer and a package user.

In the package you are developing, npm install installs both the dependencies and the devDependencies listed in package.json. The third-party packages you pull in, on the other hand, are only used at runtime, so only their production dependencies matter: for those packages NPM recursively downloads only what is listed in the dependencies field of their package.json.

dependencies and peerDependencies

peerDependencies declares which packages the current package expects its host environment to provide, and is often used in plug-in systems. In contrast to dependencies, a package declared in peerDependencies is not downloaded directly; instead, NPM checks whether the host environment already has a matching package. NPM 2 treated it like dependencies and downloaded it into node_modules, whereas NPM 3 only prints a warning to the console, without interrupting the installation or downloading the package.

peerDependencies is really a product of NPM 2. NPM 2 could only lay out dependencies in a nested structure, so the same package was installed many times and produced a lot of redundant code; the peerDependencies mechanism was a way to reuse a single copy. Since NPM 3 flattens dependencies, this field is probably needed less often.

optionalDependencies and bundledDependencies

optionalDependencies declares dependencies that the package can work without. If a dependency listed there cannot be found or fails to install, NPM does not throw an error or abort the installation.

bundledDependencies lists the dependencies that should be bundled into the final package when it is published.

npm install also offers different installation behaviors based on these fields:

  • npm install: downloads the packages in dependencies and devDependencies by default
  • npm install --production: downloads only the packages in dependencies
  • npm install --only=dev: downloads only the packages in devDependencies
  • npm install --no-optional: skips the optional packages in optionalDependencies
  • npm install package --save-dev: records the package as a development dependency of the project
  • npm install package --save: records the package as a production dependency of the project
  • npm install package --save-optional: records the package as an optional dependency of the project
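
The effect of the three save flags on package.json can be sketched as follows. recordDependency is a hypothetical helper written for this illustration; it only shows which field each flag targets and does not download anything or touch the disk.

```javascript
// Map each save flag to the package.json field it writes to.
const FIELD_FOR_FLAG = {
  '--save': 'dependencies',
  '--save-dev': 'devDependencies',
  '--save-optional': 'optionalDependencies',
};

// Return a copy of `manifest` (a parsed package.json object) with
// `name: range` recorded under the field that corresponds to `flag`.
function recordDependency(manifest, name, range, flag = '--save') {
  const field = FIELD_FOR_FLAG[flag];
  if (!field) {
    throw new Error(`Unsupported flag: ${flag}`);
  }
  return { ...manifest, [field]: { ...manifest[field], [name]: range } };
}
```

Calling recordDependency({ name: 'demo' }, 'jest', '^29.0.0', '--save-dev') returns a manifest whose devDependencies field contains jest, leaving the original object untouched.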

Principles of NPM install

We all know that dependency packages are installed into node_modules when npm install is executed, but what exactly happens along the way?

From NPM 5 onward, the complete flow of npm install can be broken down into three parts:

1. Preinstall

Preinstall refers to the preparatory work done before dependencies are installed. We can follow it through the verbose logs that Yarn and NPM provide: run yarn --verbose for Yarn, and npm install --timing=true --loglevel=verbose for NPM.

  • Execute the project's own preinstall. If the current NPM project defines a preinstall hook, it is executed at this point.

  • Read the configuration, in the following order of priority:

    1. Configuration on the command line, for example npm install --registry=http://****
    2. The .npmrc (.yarnrc) file in the project root directory
    3. The .npmrc file in the user's home directory
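
The priority order can be sketched as a simple merge in which higher-priority sources overwrite lower-priority ones. This is only an illustration: the three arguments are plain objects standing in for the already parsed sources, and the example.com registry URLs in the usage note are placeholders.

```javascript
// Merge config sources so that higher-priority sources win.
// Each argument is a plain object of already parsed key/value settings.
function resolveConfig(commandLine, projectFile, userFile) {
  // Object.assign applies later arguments last, so the lowest priority goes first.
  return Object.assign({}, userFile, projectFile, commandLine);
}
```

For instance, if the command line sets registry to https://cli.example.com and both files set a different registry, the resolved registry is the command-line value, while keys that appear only in one of the files are kept as they are.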

2. Build a dependency tree

Dependency management

The way NPM manages dependencies has changed across several versions.

Nested structure

In earlier versions, NPM handled dependencies in a very simple way: it installed them recursively into each package's own node_modules, strictly following the structure of package.json and of each sub-dependency's package.json, until a sub-dependency had no dependencies of its own. The node_modules structure therefore mirrors package.json level by level, which guarantees the same directory structure on every installation.

For example, suppose react-router, react-router-dom, and history are installed at the same time.

In the resulting tree, many identical sub-dependencies appear beneath the first-level packages, including history, which is itself installed at the first level. Because these copies sit in different directories they cannot be reused, so NPM still downloads every one of them, producing a lot of redundant code. The node_modules folder also grows huge as dependencies accumulate, and overly deep nesting can cause unpredictable problems.
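
The duplication is easy to reproduce with a toy model. The sketch below assumes a heavily simplified dependency graph for the three packages (real releases have more dependencies, and version ranges are ignored); registry, installNested, and countCopies are names made up for this illustration.

```javascript
// Hypothetical, simplified registry: package name -> names of its dependencies.
const registry = {
  'react-router-dom': ['react-router', 'history'],
  'react-router': ['history'],
  history: [],
};

// Install `name` the nested way: every package gets its own node_modules.
function installNested(name) {
  const node = { name, node_modules: {} };
  for (const dep of registry[name]) {
    node.node_modules[dep] = installNested(dep);
  }
  return node;
}

// Count how many copies of `target` exist anywhere in the tree.
function countCopies(node, target) {
  let count = node.name === target ? 1 : 0;
  for (const child of Object.values(node.node_modules)) {
    count += countCopies(child, target);
  }
  return count;
}

// The project declares all three packages as first-level dependencies.
const project = { name: 'my-app', node_modules: {} };
for (const dep of ['react-router', 'react-router-dom', 'history']) {
  project.node_modules[dep] = installNested(dep);
}

console.log(countCopies(project, 'history')); // 4
```

Under these assumptions history ends up on disk four times: once at the first level, once under react-router, and twice under react-router-dom.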

Flat structure

To address these issues, NPM made a major change in version 3.0: it added a dedupe step when building the dependency tree, turning the nested module structure into a flat one. Both first-level dependencies and sub-dependencies are now installed in the node_modules of the project root whenever possible.

In the same example, the upgraded NPM hoists the shared sub-dependencies to the top level. When a dependency that is already installed is encountered again, NPM checks whether the installed version satisfies the version range required by the new module. If it does, the module is not installed again; if it does not, the module is installed in the node_modules of the module that requires it.
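
The hoisting rule can be sketched as follows. This is a toy model rather than NPM's actual algorithm: the registry and its packages a, b, and c are hypothetical, exact version equality stands in for a real semver range check, and direct dependencies are assumed not to conflict with each other.

```javascript
// Hypothetical registry: "name@version" -> { dependencyName: exactVersion }.
const registry = {
  'a@1.0.0': { c: '1.0.0' },
  'b@1.0.0': { c: '2.0.0' },
  'c@1.0.0': {},
  'c@2.0.0': {},
};

// Place name@version, preferring the root node_modules.
// Nodes have the shape { version, modules: { name: node } }.
function place(root, parent, name, version) {
  const hoisted = root.modules[name];
  let node;
  if (!hoisted) {
    // Nothing with this name at the top level yet: hoist it there.
    node = { version, modules: {} };
    root.modules[name] = node;
  } else if (hoisted.version === version) {
    // The hoisted copy already satisfies the requirement: reuse it.
    return;
  } else {
    // Conflict with the hoisted copy: nest under the module that needs it.
    node = { version, modules: {} };
    parent.modules[name] = node;
  }
  for (const [dep, depVersion] of Object.entries(registry[`${name}@${version}`])) {
    place(root, node, dep, depVersion);
  }
}

const root = { modules: {} };
for (const [name, version] of [['a', '1.0.0'], ['b', '1.0.0']]) {
  place(root, root, name, version);
}
```

After this runs, c@1.0.0 sits at the top level next to a and b, while the conflicting c@2.0.0 is nested inside b.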

Therefore, when a module is referenced in a project, it is looked up as follows:

  • Search under the current module path
  • Search in the node_modules of the current path
  • Search in the node_modules of the parent path
  • Continue upward until the global node_modules has been searched
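
The upward walk through node_modules directories can be sketched like this. lookupPaths is a made-up helper that only computes the candidate directories for a POSIX-style path; it leaves out the global folders and does not check whether anything exists on disk.

```javascript
const path = require('path');

// List the node_modules directories that would be searched, starting at
// `fromDir` and walking up to the filesystem root.
function lookupPaths(fromDir) {
  const candidates = [];
  let dir = path.posix.resolve(fromDir);
  while (true) {
    // Never append node_modules to a directory that is itself node_modules.
    if (path.posix.basename(dir) !== 'node_modules') {
      candidates.push(path.posix.join(dir, 'node_modules'));
    }
    const parent = path.posix.dirname(dir);
    if (parent === dir) break; // reached the root
    dir = parent;
  }
  return candidates;
}
```

For a file in /home/me/app/src, the list starts with /home/me/app/src/node_modules, continues with /home/me/app/node_modules, and ends at /node_modules.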

package-lock.json

Because of how NPM version ranges work, a dependency declared in package.json may pin only the major version, so a small release of some dependency can change the whole dependency structure and cause unpredictable problems. To address this, Yarn introduced the yarn.lock file to lock versions, and NPM introduced package-lock.json in NPM 5. As long as package-lock.json is present in your directory, the generated node_modules directory structure is guaranteed to be exactly the same.

Conflicts between package.json and package-lock.json are handled slightly differently from version to version:

  • Early NPM 5 releases: install according to the lock file. Even if package.json has changed, as long as the lock file exists, installation follows the lock file.
  • A later NPM 5 release: ignore the lock file. Whenever package.json has changed, installation follows package.json.
  • Subsequent releases: take both files into account. If package.json and the lock file disagree, install according to package.json and update the lock file.

Dependency tree build details

We again rely on the verbose logs to look at how NPM builds the dependency tree.

npm install initializes an Installer object. Some of its key fields are:

  • args: the arguments
  • currentTree: the dependency tree formed from the package information in the current node_modules
  • idealTree: the dependency tree that will exist after the dependencies are installed
  • differences: the queue of differences between the two trees
  • todo: the queue of instructions to execute
  • started: records the completion time of each step, for log output

The key nodes recorded in the log are:

loadCurrentTree:

Builds the current dependency tree: reads the packages in node_modules, skipping entries whose names begin with ".", then serializes the result and stores it in currentTree.

loadIdealTree:

Building the dependency tree that will eventually exist takes three steps:

  • cloneCurrentTreeToIdealTree: copies the current dependency tree into idealTree.

  • loadShrinkwrap: reads npm-shrinkwrap.json, package-lock.json, and package.json in turn and rebuilds a dependency tree without considering the installation arguments, forming an initial idealTree.

  • loadAllDepsIntoIdealTree: completes the core part of the build. Up to this point, only the root node of the dependency tree has been serialized. Because each first-level dependency maintains its own subtree, NPM continues from each first-level dependency to look for nodes at deeper levels. This involves fetching packages and deduplicating them:

    Package fetching

    1. Check whether the package information in package.json is consistent with the package information in the lock file. If so, take the package information directly from the lock file.
    2. If the lock file has no package information but does contain enough to construct it, such as the integrity field, build the package information from that.
    3. If neither applies, fetch the package information for the matching version from the registry according to the semver rules.
    4. If the registry cannot find the package, report an error and exit the installation.

    Package deduplication

    After the steps above we have a complete dependency tree, which may contain many redundant modules. Before NPM 3, packages were installed strictly according to this structure; later versions add a dedupe pass. It iterates over all nodes, places each module under the root node to flatten the structure, and discards a module when it finds a duplicate (same module name and a semver-compatible version). NPM therefore processes the dependencies in two passes.

generateActionsToTake:

After the previous two steps, two trees, currentTree and idealTree, are attached to the Installer object. This step flattens and compares the two trees, works out which packages need to be added, removed, updated, moved, and so on, and puts the results in the differences field.
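
As a rough illustration of this comparison, the sketch below diffs two already flattened trees. It is not NPM's implementation: diffTrees is a made-up name, each tree is reduced to a plain object that maps an install path to a version, and the move action is left out.

```javascript
// List the actions needed to turn `current` into `ideal`.
// Both arguments map an install path (e.g. "node_modules/react") to a version.
function diffTrees(current, ideal) {
  const actions = [];
  for (const [location, version] of Object.entries(ideal)) {
    if (!(location in current)) {
      actions.push({ action: 'add', location, version });
    } else if (current[location] !== version) {
      actions.push({ action: 'update', location, version });
    }
  }
  for (const location of Object.keys(current)) {
    if (!(location in ideal)) {
      actions.push({ action: 'remove', location });
    }
  }
  return actions;
}
```

Packages that appear only in the ideal tree become add actions, packages whose version differs become update actions, and packages that appear only in the current tree become remove actions.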

decomposeActions:

Iterates over the differences field and, depending on the stored action type (add, remove, and so on), generates a more detailed set of instructions and places them in the todo field of the Installer object. At this point the dependency tree is essentially built.

3. Install dependencies

In this step, we just need to execute the instructions stored in todo from the previous step.

NPM fetches each package, unpacks it into node_modules, executes the lifecycle scripts (preinstall, install, postinstall), and finally updates the package-lock.json file.

So how does NPM fetch a package? Caching was added in NPM 5. NPM first checks whether the package is already in the local cache and, if so, uses it directly, which reduces the number of network requests. Otherwise it downloads the package from the repository address given in the package information and caches it for later use. Here is how NPM finds a cached package:

Run the npm config get cache command to find the directory where NPM stores cached packages.

The content-v2 directory stores the contents of the tar packages, and the index-v5 directory stores their index information. Opening an index file, for example the one for the react package, shows some of its meta information.

Within an NPM project, package-lock.json is another file that stores dependency meta information, for example for react.

From the information in package-lock.json (a sha512, sha1, or sha256 digest), NPM derives a hash for the dependency. The first 4 characters of the hash serve as the directory index, which locates the package's index file under index-v5. From that file NPM extracts the location index of the package contents (also a hash string) and uses it to find the tar package under content-v2.
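
The path arithmetic behind this lookup can be sketched as below. The details here are assumptions based on the description above rather than a confirmed account of NPM's internals: the sketch assumes the index file name is the SHA-256 of a cache key, that the first 4 hex characters are split into two directory levels, and that cacheDir is whatever directory holds index-v5 and content-v2. The function names are made up.

```javascript
const crypto = require('crypto');
const path = require('path');

// Split a hex digest into a two-level directory prefix plus the remainder.
function hashToSegments(hex) {
  return [hex.slice(0, 2), hex.slice(2, 4), hex.slice(4)];
}

// Assumed location of the index entry for a cache key (SHA-256 of the key).
function indexPath(cacheDir, key) {
  const hex = crypto.createHash('sha256').update(key).digest('hex');
  return path.posix.join(cacheDir, 'index-v5', ...hashToSegments(hex));
}

// Assumed location of the tar package for an integrity string such as
// "sha512-<base64 digest>", as found in package-lock.json.
function contentPath(cacheDir, integrity) {
  const [algorithm, base64] = integrity.split('-');
  const hex = Buffer.from(base64, 'base64').toString('hex');
  return path.posix.join(cacheDir, 'content-v2', algorithm, ...hashToSegments(hex));
}
```

Both functions only build path strings; whether a file actually exists at the resulting location depends on the NPM version that wrote the cache.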

NPM's cache storage mechanism differs from Yarn's, and it has changed across version upgrades, so many of the lookup methods described online no longer work.

Summary

This article looks at NPM from several angles: its origin, the design of the package mechanism, dependency management, and cache management.

References

  • www.zhihu.com/question/30…
  • juejin.cn/post/684490…
  • neveryu.github.io/2017/03/07/…
  • zhuanlan.zhihu.com/p/91844181