When I'm asked this question out of the blue, I feel a little overwhelmed. I have read some articles about it, but none of them stuck in my head.

Today I will talk about how NPM and Yarn emerged, how they changed and were optimized across versions, how installation works, and some practical suggestions.

Early NPM

In early versions of NPM (up to NPM v2), the design was very simple: installed dependencies were placed in node_modules. If a direct dependency A depended on another package B, then B was installed as an indirect dependency inside A's own node_modules folder, and the same recursion applied to every other package. In a large project this inevitably produced a huge dependency tree full of duplicated packages, a situation known as nested hell.

So how do we understand nested hell?

  • First of all, the project's dependency tree is too deep, which makes troubleshooting and debugging difficult when something goes wrong
  • Across different branches of the tree, the same version of the same package can be installed more than once

So what are the consequences of this duplication?

  • First of all, the installed result takes up a lot of disk space, which is a waste of resources
  • At the same time, installation takes too long because the same dependencies are installed repeatedly
  • In the worst case, the directories nest so deeply that file paths become too long, and on Windows the node_modules folder cannot be deleted
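
To make the duplication concrete, here is a minimal sketch of a purely nested layout in the style described above. The dependency graph and the nestedInstallPaths helper are made up for illustration, and the sketch assumes the graph has no cycles; real NPM does considerably more work than this.

```javascript
// Hypothetical dependency graph: each package lists the packages it depends on.
const graph = {
  app: ["a", "c"],
  a: ["b"],
  c: ["b"],
  b: [],
};

// Return every install path produced by a purely nested layout,
// where each package gets its own node_modules folder.
function nestedInstallPaths(graph, name, prefix = "") {
  const paths = [];
  for (const dep of graph[name]) {
    const depPath = `${prefix}node_modules/${dep}`;
    paths.push(depPath);
    // Recurse into the dependency's own node_modules folder.
    paths.push(...nestedInstallPaths(graph, dep, `${depPath}/`));
  }
  return paths;
}

const paths = nestedInstallPaths(graph, "app");
console.log(paths);
// [
//   "node_modules/a",
//   "node_modules/a/node_modules/b",
//   "node_modules/c",
//   "node_modules/c/node_modules/b"
// ]
```

Package b ends up on disk twice, and every extra level of nesting makes the paths longer, which is exactly the pair of problems listed above.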

So, how are the later versions optimized step by step? More on that later.

Some confusion when developing with NPM or Yarn

Have you run into situations like these in real development?

  1. When something goes wrong with your project's dependencies, do you simply delete node_modules and the lockfile and then run npm install again? Does this really work? Can it cause any problems?
  2. Is there a problem with installing every package into dependencies, without distinguishing devDependencies?
  3. If you use yarn on a project and I use npm, will that cause problems?
  4. One more question: should the lockfile be committed to the repository along with the code?

To be honest, I only half understand these questions myself.

So, the world is big, and I want to go and see how these two brothers relate to each other.

NPM installation mechanism and core principles

We can start by looking at the core goal of NPM:

Bring the best of open source to you, your team and your company.

This means bringing the best open source libraries and dependencies to you, your team, and your company. From this statement, we can learn that the most important aspect of NPM is installing and maintaining dependencies. So, let’s take a look at the installation mechanism of NPM.

NPM installation mechanism

The following flow chart illustrates the installation mechanism of npm install:

After npm install is executed, NPM first looks up and reads its configuration, with the following priority:

project-level .npmrc file > user-level .npmrc file > global .npmrc file > NPM built-in .npmrc file
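
One way to picture this priority is as a merge in which higher levels overwrite lower ones. The sketch below is only an illustration: the four config objects and the resolveConfig helper are hypothetical stand-ins for already-parsed .npmrc files, not NPM's own code.

```javascript
// Hypothetical parsed contents of each .npmrc level.
const builtinConfig = { registry: "https://registry.npmjs.org/" };
const globalConfig = { "save-exact": "false" };
const userConfig = { registry: "https://registry.example.com/", "save-exact": "true" };
const projectConfig = { registry: "https://npm.internal.example/" };

// Merge from lowest to highest priority: later sources override earlier ones,
// so a key set at the project level always wins.
function resolveConfig(...levelsLowToHigh) {
  return Object.assign({}, ...levelsLowToHigh);
}

const config = resolveConfig(builtinConfig, globalConfig, userConfig, projectConfig);
console.log(config.registry);      // "https://npm.internal.example/" (project level)
console.log(config["save-exact"]); // "true" (user level overrides global)
```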

Then NPM checks whether there is a package-lock.json file in the project:

  • If so, it checks whether package-lock.json and package.json declare the same dependencies:

    • If they are consistent, package-lock.json is used directly and the dependencies are loaded from the network or the cache
    • If they are not consistent, the mismatch is handled differently depending on the NPM version
  • If not, the dependency tree is built recursively from package.json, and the complete set of dependencies is downloaded based on that tree. At download time, the cache is checked for each package:

    • If it exists, it is unpacked directly into node_modules
    • If it does not exist, the package is downloaded from the NPM remote registry, its integrity is verified, it is added to the cache, and it is unpacked into node_modules

Finally, the package-lock.json file is generated
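
The branching above can be condensed into two small decisions. This is a sketch of the flow as described in this post, with hypothetical function names and string labels; it is not how NPM is implemented internally.

```javascript
// Decide where the dependency tree comes from (hypothetical helper).
function chooseTreeSource(hasLockfile, lockMatchesManifest) {
  if (!hasLockfile) return "build-from-package.json";
  return lockMatchesManifest ? "use-package-lock.json" : "version-specific-handling";
}

// Decide how a single package is obtained, given a Set of cached package ids.
function choosePackageSource(cache, id) {
  return cache.has(id) ? "unpack-from-cache" : "download-verify-cache-unpack";
}

const cache = new Set(["lodash@4.17.13"]);
console.log(chooseTreeSource(true, true));                   // "use-package-lock.json"
console.log(chooseTreeSource(false, false));                 // "build-from-package.json"
console.log(choosePackageSource(cache, "lodash@4.17.13"));   // "unpack-from-cache"
console.log(choosePackageSource(cache, "slash@2.0.0"));      // "download-verify-cache-unpack"
```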

In practice, a good team convention is for everyone on the same project to use the same NPM version, because the handling above differs between versions.

You may have noticed one thing in the installation flow above: if every install had to download the dependencies again, then large packages or a slow network would noticeably increase installation time. Caching is a good way to solve this problem, and we'll talk about that later.

The emergence of Yarn

Yarn is a JavaScript package manager built by Facebook, Google, Exponent and Tilde. It was created to address some of the historical shortcomings of NPM, such as its weak guarantees about the integrity and consistency of dependencies and its slow installation.

Yarn came along in 2016, when NPM was at v3. At that time NPM did not yet have a package-lock.json file, so installation was slow and its results were unstable. Yarn addressed the following problems:

  • Deterministic: through mechanisms such as yarn.lock, the same dependencies are installed in the same way in any environment or container, even if the installation order differs. (Before NPM v5 there was no package-lock.json mechanism, and npm-shrinkwrap.json was not used by default.)

  • Flat installation mode: different version ranges of the same package are consolidated into a single version according to certain policies, to avoid installing redundant copies (current NPM has the same optimization)

  • Better network performance: Yarn queues its requests, similar to a connection pool, to make better use of network resources. It also introduced a retry mechanism for failed installations

  • Caching: a cache is used to make an offline mode possible (current NPM has a similar implementation)
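
The flat installation mode is easier to see with a small example. The sketch below shows only the placement half of the idea (one copy at the top level, nesting only on a version conflict), assuming versions have already been resolved to exact numbers. The requirements list and the hoist helper are hypothetical and much simpler than the real policies.

```javascript
// Hypothetical resolved dependencies: which package needs which exact version.
const requirements = [
  { parent: "a", name: "b", version: "1.0.0" },
  { parent: "c", name: "b", version: "1.0.0" },
  { parent: "d", name: "b", version: "2.0.0" },
];

// Place each package at the top level when possible; nest it under its parent
// only when a different version already occupies the top-level slot.
function hoist(requirements) {
  const layout = new Map(); // install path -> version
  for (const { parent, name, version } of requirements) {
    const topLevel = `node_modules/${name}`;
    if (!layout.has(topLevel)) {
      layout.set(topLevel, version);
    } else if (layout.get(topLevel) !== version) {
      layout.set(`node_modules/${parent}/node_modules/${name}`, version);
    }
    // Same version already at the top level: nothing to install again.
  }
  return layout;
}

const layout = hoist(requirements);
console.log([...layout.entries()]);
// [ ["node_modules/b", "1.0.0"], ["node_modules/d/node_modules/b", "2.0.0"] ]
```

Packages a and c share one copy of b, and only d gets its own nested copy because it needs a different version.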

We can look at the yarn.lock structure:

    "@babel/cli@^7.1.6", "@babel/cli@^7.5.5":
      version "7.8.4"
      resolved "http://npm.in.zhihu.com/@babel%2fcli/-/cli-7.8.4.tgz#505fb053721a98777b2b175323ea4f090b7d3c1c"
      integrity sha1-UF+wU3IamHd7KxdTI+pPCQt9PBw=
      dependencies:
        commander "^4.0.1"
        convert-source-map "^1.1.0"
        fs-readdir-recursive "^1.1.0"
        glob "^7.0.0"
        lodash "^4.17.13"
        make-dir "^2.1.0"
        slash "^2.0.0"
        source-map "^0.5.0"
      optionalDependencies:
        chokidar "^2.1.8"

If you are familiar with NPM's package-lock.json file, you will notice some differences at first glance: package-lock.json uses JSON, whereas yarn.lock uses a custom format. This custom format is still quite readable.

Another significant difference between Yarn and NPM is that the versions of sub-dependencies in yarn.lock are not fixed versions; they are recorded as ranges. This means that yarn.lock alone cannot determine the directory structure of node_modules; it has to be combined with package.json.

This raises a question for me: how do you switch between NPM and Yarn?

Here I learned that there is a specialized tool, synp, that converts yarn.lock to package-lock.json and vice versa.

Yarn uses prefer-online mode by default: network resources are used first, and if the network request fails, Yarn falls back to the cached data.

At this point we should have a basic understanding of Yarn, so let's move on to its installation mechanism.

Yarn installation mechanism

In the previous section, we had a basic understanding of the installation mechanism of NPM. Now let’s take a brief look at the installation concept of Yarn.

In brief, Yarn installation consists of five steps:

Checking -> Resolving Packages -> Fetching Packages -> Linking Packages -> Building Packages

So let’s start to analyze what’s going on in these processes:

Checking

The main purpose of this step is to check whether the project contains NPM-related files, such as package-lock.json. If it does, Yarn warns the user that these files may cause conflicts. The operating system and CPU information are also checked in this step.
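
As an illustration of what an OS and CPU check can look like, here is a small sketch based on the os and cpu fields that package.json supports, where an entry starting with "!" excludes a value. The isCompatible helper, the manifest and the env object are hypothetical, and I am assuming this is roughly the kind of comparison the checking step performs.

```javascript
// Check a platform value against a package.json-style allow/block list.
// Entries starting with "!" block a value; all other entries allow it.
function isCompatible(list, current) {
  if (!list || list.length === 0) return true; // no restriction declared
  const blocked = list.filter((e) => e.startsWith("!")).map((e) => e.slice(1));
  const allowed = list.filter((e) => !e.startsWith("!"));
  if (blocked.includes(current)) return false;
  return allowed.length === 0 || allowed.includes(current);
}

// Hypothetical manifest and environment values.
const manifest = { name: "example-pkg", os: ["darwin", "linux"], cpu: ["!arm"] };
const env = { os: "linux", cpu: "x64" };

const ok = isCompatible(manifest.os, env.os) && isCompatible(manifest.cpu, env.cpu);
console.log(ok); // true
```

In a real Node.js process the environment values would come from process.platform and process.arch; they are hard-coded here so the result is the same on every machine.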

Resolving packages

This step resolves the version information of each package in the dependency tree.

First, we get the dependencies at the first level: dependencies, devDependencies, and optionalDependencies defined in package.json in our current project.

Then Yarn traverses the first-level dependencies to obtain each package's dependency information and recursively looks up the versions of the nested dependencies under each one. The packages that have been resolved and those currently being resolved are stored in a Set, which ensures that packages within the same version range are not resolved repeatedly.

For example:
  • For an unresolved package A, Yarn first tries to get its version information from yarn.lock and marks it as resolved
  • If package A is not found in yarn.lock, Yarn sends a request to the registry for the highest known version that satisfies the version range, and marks the package as resolved once it is obtained
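
The resolution logic described above can be sketched as follows. Everything here is a simplification under stated assumptions: the lockfile and registry objects are hypothetical stubs keyed by "name@range" that already contain the answer a real semver lookup would produce, and resolveAll is not Yarn's actual code.

```javascript
// Hypothetical data: lockfile entries and registry answers keyed by "name@range".
const lockfile = { "a@^1.0.0": { version: "1.2.0", dependencies: { b: "^2.0.0" } } };
const registry = { "b@^2.0.0": { version: "2.3.1", dependencies: {} } };

function resolveAll(rootDeps, lockfile, registry) {
  const seen = new Set();     // patterns already resolved or being resolved
  const resolved = new Map(); // pattern -> exact version
  let registryRequests = 0;

  function resolve(name, range) {
    const pattern = `${name}@${range}`;
    if (seen.has(pattern)) return; // same name and range: do not resolve again
    seen.add(pattern);
    let entry = lockfile[pattern]; // try yarn.lock first
    if (!entry) {
      // Not in the lockfile: ask the (stubbed) registry instead.
      registryRequests += 1;
      entry = registry[pattern];
      if (!entry) throw new Error(`Cannot resolve ${pattern}`);
    }
    resolved.set(pattern, entry.version);
    for (const [depName, depRange] of Object.entries(entry.dependencies)) {
      resolve(depName, depRange);
    }
  }

  for (const [name, range] of Object.entries(rootDeps)) resolve(name, range);
  return { resolved, registryRequests };
}

const result = resolveAll({ a: "^1.0.0", b: "^2.0.0" }, lockfile, registry);
console.log([...result.resolved.entries()]); // [ ["a@^1.0.0", "1.2.0"], ["b@^2.0.0", "2.3.1"] ]
console.log(result.registryRequests);        // 1
```

Package b is requested both by the project and by a, but because both use the same range it is looked up only once.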

Either way, once resolution is finished we know the exact version of each package and its download address.

Fetching packages

In this step, Yarn first checks whether each dependency is already in the cache, and then downloads the packages that are not cached into the cache directory. Here is a small question to think about:

How does Yarn know whether a dependency is already in the cache?

In Yarn, a path is generated from cacheFolder + slug + node_modules + pkg.name, and Yarn checks whether that path exists on the file system. If it exists, the package is cached and does not need to be downloaded again. This path is the location of the cached package.
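
Here is a minimal sketch of that lookup. The slug format (prefix, name, version, hash) is my assumption for illustration, as are the cachePath and isCached helpers. The existence check is passed in as a function so the sketch runs without touching the real file system.

```javascript
// Build the cache path from cacheFolder + slug + node_modules + pkg.name.
// The slug format below is an assumption made for this example.
function cachePath(cacheFolder, pkg) {
  const slug = `npm-${pkg.name.replace("/", "-")}-${pkg.version}-${pkg.hash}`;
  return [cacheFolder, slug, "node_modules", pkg.name].join("/");
}

// A cache hit is simply "does this path exist?".
function isCached(cacheFolder, pkg, pathExists) {
  return pathExists(cachePath(cacheFolder, pkg));
}

// Hypothetical package and a fake file system containing one cached path.
const pkg = { name: "lodash", version: "4.17.13", hash: "abc123" };
const fakeDisk = new Set(["/cache/v6/npm-lodash-4.17.13-abc123/node_modules/lodash"]);

console.log(cachePath("/cache/v6", pkg));
// "/cache/v6/npm-lodash-4.17.13-abc123/node_modules/lodash"
console.log(isCached("/cache/v6", pkg, (p) => fakeDisk.has(p))); // true
```

In a real Node.js program the pathExists argument could be fs.existsSync, and path.join would be the more portable way to build the path.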

What about packages that miss the cache? Yarn maintains a fetch queue and issues network requests according to specific rules. If the package's download address uses the file protocol or is a relative path, it points to a local directory, and Fetch From Local is called to get the package from the offline cache. Otherwise Fetch From External is called, and the final result is written to the cache directory using fs.createWriteStream.
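
The choice between the two fetchers can be sketched as a simple check on the package reference. The chooseFetcher function and its return labels are hypothetical names for this example, and the queueing and the actual download are left out.

```javascript
// Decide which fetcher handles a package reference (labels are illustrative).
function chooseFetcher(reference) {
  const isFileProtocol = reference.startsWith("file:");
  const isRelativePath = reference.startsWith("./") || reference.startsWith("../");
  return isFileProtocol || isRelativePath ? "fetchFromLocal" : "fetchFromExternal";
}

console.log(chooseFetcher("file:../local-pkg"));   // "fetchFromLocal"
console.log(chooseFetcher("./packages/utils"));    // "fetchFromLocal"
console.log(chooseFetcher("https://registry.example.com/slash/-/slash-2.0.0.tgz"));
// "fetchFromExternal"
```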

Linking packages

In the previous step the dependencies were placed in the cache directory, so what comes next? Are they copied into the project's node_modules? Yes, but the copy follows the flat installation rule. Before copying, Yarn resolves peerDependencies. If a package that satisfies a peerDependencies requirement cannot be found, Yarn prints a warning and then copies the dependencies into the project.
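
The peerDependencies check can be pictured like this. The sketch only checks whether a peer is present at all; a real check would also compare the installed version against the requested range. The missingPeers helper, the plugin object and the warning text are hypothetical.

```javascript
// Report peer dependencies of `pkg` that are not installed in the project.
// Only presence is checked; version ranges are not compared in this sketch.
function missingPeers(pkg, installed) {
  const warnings = [];
  for (const [name, range] of Object.entries(pkg.peerDependencies || {})) {
    if (!(name in installed)) {
      warnings.push(`${pkg.name} has unmet peer dependency ${name}@${range}`);
    }
  }
  return warnings;
}

// Hypothetical plugin that expects the host project to provide react.
const plugin = { name: "example-plugin", peerDependencies: { react: "^17.0.0" } };

console.log(missingPeers(plugin, {}));
// [ "example-plugin has unmet peer dependency react@^17.0.0" ]
console.log(missingPeers(plugin, { react: "17.0.2" })); // []
```

Note that the result is only a list of warnings: as described above, linking still goes ahead.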

Building packages

If any dependency contains binary packages that need to be compiled, the compilation happens in this step.

Caching mechanism for NPM

I can't write any more for now, so I'll take a break. To be continued…
