Yarn Install Workflow analysis | seven days of clocking

By walking through the yarn install workflow, this article analyzes how the lockfile, the cache, registry requests, the flattened dependency tree, and install scripts work together. This should help you understand some common Yarn operations and give you ideas when troubleshooting.

If you have questions about the concepts of package management and the lockfile, see the short article Introduction to Front-end Package Management.

Overview

Before starting the main process, Yarn checks for npm-shrinkwrap.json and package-lock.json files in the current project directory. If npm-shrinkwrap.json is present, Yarn ignores it and prints a notice. If package-lock.json exists, Yarn advises users not to mix package managers, to avoid inconsistent version resolution.

During yarn install, the terminal shows five steps, each decorated with an emoji icon.

These five steps correspond to the five phases that yarn install performs:

Validating package.json: check the runtime environment
Resolving packages: integrate dependency information
Fetching packages: fetch dependency packages into the cache
Linking dependencies: copy dependencies to node_modules
Building fresh packages: run the install-phase scripts

Validating package.json: check the runtime environment

The os, cpu, and engines fields in package.json specify the runtime conditions a module requires. In this phase, Yarn checks whether the current environment meets those conditions.
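As a rough illustration of this check, here is a minimal sketch covering only the os and cpu fields. It is not Yarn's actual code: the Manifest shape and the function names isAllowed and checkEnvironment are assumptions made for this example, and the engines check is left out because it needs semver range matching.

```typescript
// Hypothetical shape: only the fields this sketch looks at.
interface Manifest {
  name: string;
  os?: string[];
  cpu?: string[];
}

// Plain entries form an allow-list; an entry such as "!win32" excludes a value.
function isAllowed(current: string, rules: string[] | undefined): boolean {
  if (!rules || rules.length === 0) return true;
  const denied = rules.filter((r) => r.startsWith("!")).map((r) => r.slice(1));
  if (denied.includes(current)) return false;
  const allowed = rules.filter((r) => !r.startsWith("!"));
  return allowed.length === 0 || allowed.includes(current);
}

// Returns a list of human-readable problems; an empty list means the check passed.
function checkEnvironment(manifest: Manifest, platform: string, arch: string): string[] {
  const problems: string[] = [];
  if (!isAllowed(platform, manifest.os)) {
    problems.push(`${manifest.name}: platform "${platform}" is not supported`);
  }
  if (!isAllowed(arch, manifest.cpu)) {
    problems.push(`${manifest.name}: architecture "${arch}" is not supported`);
  }
  return problems;
}
```

In a real run, the platform and arch arguments would come from process.platform and process.arch; they are passed in here so the function stays pure and easy to test.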

Resolving packages: integrate dependency information

This phase gathers the details of every dependency through recursive processing. The process has two parts:

Collect the first-layer dependencies: the dependencies, devDependencies, and optionalDependencies in package.json, plus the top-level packages list in workspaces, are collected in the format "package name@version range". The resulting first-layer dependency set can be represented as a string array.

Traverse all dependencies and collect their information: starting from the first-layer dependency set, Yarn combines yarn.lock and the registry to obtain each package's version, download URL, hash, and sub-dependencies. Each dependency is handled with yarn.lock taking priority:

If yarn.lock contains a package record that matches exactly, Yarn locates that record and takes its version number as the resolved version. It then assembles the data structure from the record and adds it to the dependency information Map.
If no package record can be located, Yarn requests the package information from the configured registry, uses the versions field of the response data to find the highest version that satisfies the version range, takes that version as the resolved version, then assembles the data structure from the package information and adds it to the dependency information Map.
If the requested version range of a dependency overlaps with a dependency already in the set, the pattern is added to the dependency information Map as a reference to the same data structure, and its sub-dependencies are not processed again. This step prepares for merging overlapping version ranges later.
After the data for the current dependency has been processed, its sub-dependencies are processed level by level.

To sum up, the policy in this phase is: if yarn.lock has a matching record, yarn.lock prevails; otherwise, Yarn requests the latest information from the registry.
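The lockfile-first policy can be sketched roughly as follows. This is a simplified model, not Yarn's implementation: the names collectPatterns, resolvePattern, and satisfies are hypothetical, the registry lookup is synchronous for brevity, workspaces are ignored, and the range matcher only understands "^x.y.z" (with major version 1 or higher) and exact versions. The versions, dist.tarball, and resolved field names follow the text above.

```typescript
interface LockEntry { version: string; resolved: string }
interface RegistryInfo { versions: Record<string, { dist: { tarball: string } }> }
interface Resolved { name: string; version: string; url: string; fromLockfile: boolean }
interface DependencyFields {
  dependencies?: Record<string, string>;
  devDependencies?: Record<string, string>;
  optionalDependencies?: Record<string, string>;
}

const parse = (v: string): number[] => v.split(".").map(Number);

// Numeric comparison of "major.minor.patch" strings (no prerelease tags in this sketch).
function compare(a: string, b: string): number {
  const pa = parse(a);
  const pb = parse(b);
  for (let i = 0; i < 3; i++) {
    if (pa[i] !== pb[i]) return pa[i] - pb[i];
  }
  return 0;
}

// Simplified assumption: "^1.2.3" means same major and >= 1.2.3; anything else is an exact version.
function satisfies(version: string, range: string): boolean {
  if (!range.startsWith("^")) return version === range;
  const min = range.slice(1);
  return parse(version)[0] === parse(min)[0] && compare(version, min) >= 0;
}

// First-layer dependency set as "name@range" strings.
function collectPatterns(manifest: DependencyFields): string[] {
  const groups = [manifest.dependencies, manifest.devDependencies, manifest.optionalDependencies];
  const patterns: string[] = [];
  for (const group of groups) {
    for (const [name, range] of Object.entries(group ?? {})) patterns.push(`${name}@${range}`);
  }
  return patterns;
}

function resolvePattern(
  pattern: string,
  lockfile: Record<string, LockEntry>,
  fetchInfo: (name: string) => RegistryInfo,
): Resolved {
  const at = pattern.lastIndexOf("@"); // lastIndexOf keeps scoped names such as "@scope/pkg" intact
  const name = pattern.slice(0, at);
  const range = pattern.slice(at + 1);

  // 1. An exact match in the lockfile wins.
  const locked = lockfile[pattern];
  if (locked) return { name, version: locked.version, url: locked.resolved, fromLockfile: true };

  // 2. Otherwise take the highest registry version that satisfies the range.
  const info = fetchInfo(name);
  const candidates = Object.keys(info.versions).filter((v) => satisfies(v, range)).sort(compare);
  if (candidates.length === 0) throw new Error(`No version of ${name} satisfies ${range}`);
  const version = candidates[candidates.length - 1];
  return { name, version, url: info.versions[version].dist.tarball, fromLockfile: false };
}
```

Note that a locked pattern keeps its locked version even when the registry has a newer matching release, which is what makes installs reproducible.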

Directions 👉 : The main code location for this section

Fetching packages: fetch dependency packages into the cache

Before entering this phase, Yarn checks whether the modules under the project's node_modules already satisfy the dependency information set. If they do, yarn install finishes and prints "success Already up-to-date.".

Here is a simple mock of the dependency information set from the previous step, using patternX to stand for a data object:

{
  'trim-newlines@^1.0.0': pattern1,
  'trim-newlines@^2.0.0': pattern2,
  'trim-newlines@^3.0.0': pattern3,
  'typescript@~4.0.3': pattern4,
  'typescript@~4.0.5': pattern4
}

Notice one feature: dependencies resolved to the same version number refer to the same object. Combined with filtering through a Set, this removes duplicate dependencies and merges overlapping version ranges.
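The effect of sharing one object between overlapping patterns can be shown with a short sketch. The PackageRecord shape, the uniquePackages helper, and the concrete version numbers are illustrative assumptions; only the idea of de-duplicating by object identity through a Set comes from the text above.

```typescript
interface PackageRecord { name: string; version: string }

const trim1: PackageRecord = { name: "trim-newlines", version: "1.0.0" };
const trim2: PackageRecord = { name: "trim-newlines", version: "2.0.0" };
const trim3: PackageRecord = { name: "trim-newlines", version: "3.0.0" };
const ts405: PackageRecord = { name: "typescript", version: "4.0.5" };

// Two patterns with overlapping ranges point at the very same object.
const patterns: Record<string, PackageRecord> = {
  "trim-newlines@^1.0.0": trim1,
  "trim-newlines@^2.0.0": trim2,
  "trim-newlines@^3.0.0": trim3,
  "typescript@~4.0.3": ts405,
  "typescript@~4.0.5": ts405,
};

// A Set compares objects by identity, so the shared record is kept only once.
function uniquePackages(map: Record<string, PackageRecord>): PackageRecord[] {
  return Array.from(new Set(Object.values(map)));
}
```

Five patterns go in, but uniquePackages(patterns) yields four records, so typescript is fetched only once.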

The goal of this phase is to collect all required dependency packages into Yarn's cache directory.

Yarn first checks whether the package already exists in the cache. If it does not, Yarn tries to read it from the local file system or download it from the registry.

The cache

The Yarn cache directory can be found by running yarn cache dir. Cache entries are stored under {slug}/node_modules/{packageName}. Note that the slug consists of the version, the hash, and a UID, so Yarn stores cache entries in a flat, single-level layout.

Taking the typescript package as an example, a cache entry contains the downloaded tarball (.yarn-tarball.tgz), a metadata file (.yarn-metadata.json), and the files extracted from the tarball. If the bin field is set in package.json, the relevant files are also placed in a .bin directory.

Local file system

For relative paths and file:// protocol paths, Yarn only tries to read from the local directory.

Network request

URLs are mostly handled in the package resolution phase: for data from yarn.lock, the URL is taken from the resolved field; for data from the registry, dist.tarball is used as the URL. The network request phase only normalizes non-HTTPS protocols. Giving priority to the download path recorded in yarn.lock (the resolved field) helps install a dependency environment that is as consistent as possible and keeps unknown differences between registries from affecting the packages.

Yarn also optimizes network requests:

Requests are cached in a urlToPromise map, which avoids the unnecessary overhead of sending several requests to the same URL.
Up to 8 requests are sent in parallel (the limit can be customized with --network-concurrency). The idea is to maintain a queue of waiting requests and use a running variable, which records the number of requests in progress, as a lock. Whenever a new request is queued or a request finishes, the next waiting request is processed, which forms an automatic loop.
Failed requests are resent automatically (at most 5 times per request). The idea is to maintain an offline queue that holds failed requests not yet resent, and to resend one request every 3 s. When the retry limit is exceeded, Yarn prints "info There appears to be trouble with your network connection. Retrying..." and abandons the request.
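The first two optimizations can be sketched together as below. This is an illustration of the idea, not Yarn's request manager: the RequestQueue class and its member names are assumptions (only urlToPromise and running echo the names mentioned above), the send callback stands in for the real HTTP call, and the retry/offline queue is omitted.

```typescript
class RequestQueue {
  running = 0; // number of requests currently in flight, used as the "lock"
  private waiting: Array<() => void> = [];
  private urlToPromise = new Map<string, Promise<unknown>>();
  private maxConcurrency: number;

  constructor(maxConcurrency = 8) {
    this.maxConcurrency = maxConcurrency;
  }

  get waitingCount(): number {
    return this.waiting.length;
  }

  // `send` is a caller-supplied function that performs the actual request.
  request<T>(url: string, send: (url: string) => Promise<T>): Promise<T> {
    // Same URL: hand back the promise that is already pending or settled.
    const cached = this.urlToPromise.get(url);
    if (cached) return cached as Promise<T>;

    const promise = new Promise<T>((resolve, reject) => {
      this.waiting.push(() => {
        this.running++;
        const done = () => {
          this.running--;
          this.drain(); // a finished request frees a slot for the next one
        };
        send(url).then(
          (value) => { done(); resolve(value); },
          (error) => { done(); reject(error); },
        );
      });
    });
    this.urlToPromise.set(url, promise);
    this.drain(); // a newly queued request may start immediately
    return promise;
  }

  // Start waiting requests while there is spare capacity.
  private drain(): void {
    while (this.running < this.maxConcurrency && this.waiting.length > 0) {
      const start = this.waiting.shift()!;
      start();
    }
  }
}
```

Because drain is called both when a request is queued and when one finishes, no timer or polling loop is needed to keep the queue moving.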

Directions 👉 : The core code entry for this section

Linking dependencies: copy dependencies to node_modules

After the previous step, all the dependencies the project needs are ready in the Yarn cache directory. The main work in this phase is to copy dependencies from the cache to the project's node_modules. This phase actually does three things:

Process peerDependencies
Flatten the dependency tree
Copy dependencies to node_modules

peerDependencies

peerDependencies are commonly used in packages meant for distribution. When one package must be used on top of another, peerDependencies declares the relationship between the two. For example, react-dom relies on react, and Koa middleware packages rely on koa.

Yarn's strategy is this: if a peer dependency is not found in the list of dependency packages collected so far, Yarn prompts you to install it manually.
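A minimal sketch of that check might look like the following. The Pkg shape, the missingPeers function, and the wording of the message are assumptions for illustration; the sketch matches peers by name only and ignores version ranges.

```typescript
interface Pkg {
  name: string;
  peerDependencies?: Record<string, string>;
}

// Report every peer dependency that is absent from the set of resolved package names.
function missingPeers(packages: Pkg[], resolvedNames: Set<string>): string[] {
  const warnings: string[] = [];
  for (const pkg of packages) {
    for (const [peer, range] of Object.entries(pkg.peerDependencies ?? {})) {
      if (!resolvedNames.has(peer)) {
        warnings.push(`${pkg.name} has unmet peer dependency ${peer}@${range}`);
      }
    }
  }
  return warnings;
}
```

A project that lists react-dom but not react would get one warning here, and the fix is to add react to the project's own dependencies.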

Flatten the dependency tree

How the dependency tree is built determines the file structure under node_modules, and the building strategy has been optimized step by step:

npm v2 maintained the dependency tree as a deeply nested tree that mirrors the actual dependency relationships. This easily leads to nesting hell, where a shared dependency has a copy under each of several modules.
npm v3 uses a flattening strategy (as Yarn does) and processes packages in installation order. The first version of a package A that is encountered, say v1.0, is placed at the top level, and later dependencies on A v1.0 need no further processing (thanks to the module lookup mechanism, which searches up the directory tree). Later dependencies on other versions of package A are placed in the sub-dependency tree of the module that depends on them. The problem is that installation order determines the final structure; the tree is not decided in a principled way based on how often each version of a package is used.
Yarn analyzes how often the different versions of a package are used during the flattening phase and places the most used version at the top level. This process is called dedupe. In npm, you need to run npm dedupe manually to do this.
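The frequency-based choice described above can be sketched as follows. This is a toy model under stated assumptions, not Yarn's hoisting code: the Requirement shape and the chooseHoisted and layout functions are hypothetical, versions are assumed to be already resolved, every parent is assumed to sit at the top level, and ties go to the version seen first.

```typescript
// "parent requires name@version", with the version already resolved.
interface Requirement { parent: string; name: string; version: string }

// For each package name, pick the version that the most parents ask for.
function chooseHoisted(requirements: Requirement[]): Map<string, string> {
  const counts = new Map<string, Map<string, number>>();
  for (const { name, version } of requirements) {
    const byVersion = counts.get(name) ?? new Map<string, number>();
    byVersion.set(version, (byVersion.get(version) ?? 0) + 1);
    counts.set(name, byVersion);
  }
  const hoisted = new Map<string, string>();
  counts.forEach((byVersion, name) => {
    let best = "";
    let bestCount = 0;
    byVersion.forEach((count, version) => {
      if (count > bestCount) { best = version; bestCount = count; }
    });
    hoisted.set(name, best);
  });
  return hoisted;
}

// The hoisted version lives at the top level; other versions stay nested under their parent.
function layout(requirements: Requirement[]): string[] {
  const hoisted = chooseHoisted(requirements);
  const paths = new Set<string>();
  for (const r of requirements) {
    paths.add(
      hoisted.get(r.name) === r.version
        ? `node_modules/${r.name}`
        : `node_modules/${r.parent}/node_modules/${r.name}`,
    );
  }
  return Array.from(paths).sort();
}
```

With two parents requiring lodash 4.0.0 and one requiring 3.0.0, version 4.0.0 is hoisted and only the 3.0.0 copy is nested, regardless of the order in which the requirements are listed.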

Copy dependencies to node_modules

Symbolic links and cache directories are handled separately in this step. The end result is that runnable modules are placed into node_modules according to the dependency tree structure.

Building fresh packages: run the install-phase scripts

The lifecycle hooks associated with install are executed in this phase, including preinstall, install, and postinstall. yarn build and yarn rebuild also use this phase.

The strategy of copying dependencies from the cache makes it possible to set up a complete dependency environment with almost no build or install work. However, some packages need to generate modules dynamically based on the current host environment, and the package manager provides a series of hooks to support this.

Take a familiar example: node-sass.

The Sass engine was originally implemented in Ruby and required Ruby as its runtime environment. Later, to support use from other languages, the Sass team provided LibSass, a C/C++ port of the engine.

Before WebAssembly, native code was run on Node through node-gyp, which builds a binary in the binding.node format that Node.js can load and execute.

For node-sass to work, it relies on node-gyp and downloads additional binaries from the Sass binary site.

There are two install-related scripts in the package.json of node-sass:

{
  "scripts": {
    "install": "node scripts/install.js",
    "postinstall": "node scripts/build.js"
  }
}

The install phase downloads the binaries, and the postinstall phase builds them through node-gyp.
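The order in which these hooks run can be made concrete with a small sketch. It only computes which commands would run and in what order; the installCommands function and INSTALL_HOOKS constant are names invented for this example, and actually spawning the commands is left out.

```typescript
// Install-related lifecycle hooks, in execution order.
const INSTALL_HOOKS = ["preinstall", "install", "postinstall"] as const;

// Return the commands a package would run during the build phase, in order.
function installCommands(scripts: Record<string, string> | undefined): string[] {
  const commands: string[] = [];
  for (const hook of INSTALL_HOOKS) {
    const command = scripts?.[hook];
    if (command) commands.push(command);
  }
  return commands;
}

// The two node-sass scripts quoted above.
const nodeSassScripts = {
  install: "node scripts/install.js",
  postinstall: "node scripts/build.js",
};
```

For nodeSassScripts the result is the install command followed by the postinstall command; a package without any of the three hooks yields an empty list and needs no build step.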

This build installation process produces a number of pain points:

The binaries are slow to download because they come from an overseas server; the usual workaround is to change the sass_binary_site configuration variable.
The install script of node-gyp also downloads the Node source code, which introduces another variable to configure: disturl.

Custom scripts, which add variables that are outside the control of the package manager, end up causing extra processing for developers in their use.

As Dart and WebAssembly mature, cross-language code has new options. Yarn also recommends that developers use WebAssembly instead of install scripts. Dart Sass is available in pure JavaScript as the sass package, and node-sass no longer keeps up with Node version updates, so sass is the solution going forward.

After the five steps are complete, Yarn updates yarn.lock.

Conclusion

Fun emoji icons, short and clear prompts, and everything Yarn does behind them: does it all make sense now?