NodeJS currently has two module systems: CommonJS (CJS) and ECMAScript Modules (ESM). This article covers three main topics:

  1. CommonJS internals
  2. The ESM module system on the NodeJS platform
  3. CommonJS vs. ESM, and how to use the two systems together

First of all, why do we need a module system?

Why a module system

A good language needs a module system, because modules address basic needs that come up in engineering:

  • Splitting functionality into modules makes code more organized and easier to understand, and lets us develop and test each sub-module independently
  • Functionality can be encapsulated in a module and then imported directly by other modules, which improves reusability
  • Encapsulating the implementation: a module only needs to document its simple inputs and outputs; the internal implementation stays hidden, which reduces the cost of understanding it
  • Managing dependencies: a good module system makes it easy for developers to build new modules on top of existing third-party modules. It also lets users import just the modules they want, and automatically pulls in the modules along the dependency chain

In its early days, JavaScript had no good module system, and pages mostly loaded different resources through multiple script tags. As systems grew more complex, the traditional script-tag approach could no longer meet business needs, so the community began defining module systems of its own, including AMD, UMD and others. NodeJS is a server-side runtime that runs in the background. Unlike a browser, it has no HTML script tags with which to load files; it depends entirely on JS files in the local file system. NodeJS therefore implemented a module system based on the CommonJS specification. The ES2015 specification, released in 2015, finally gave JavaScript a formal standard for modules, and a module system built according to this standard is called an ESM system.

CommonJS module

There are two basic ideas in the CommonJS specification:

  • The require function lets you import a module from the local file system
  • exports and module.exports are two special variables used to publish a module's public functionality

Module loader

To implement a simple module loader, we start with a function that loads the contents of a module, wraps it in a private scope to avoid polluting the global environment, and then evaluates it with eval:

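// fs is NodeJS's built-in file system module, assumed to be in scope here
function loadModule(filename, module, require) {
  // Wrap the module source in a function so its variables stay private
  const wrappedSrc = `
    (function (module, exports, require) {
      ${fs.readFileSync(filename, 'utf-8')}
    })(module, module.exports, require)
  `
  eval(wrappedSrc)
}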

In this code, the module's contents are read with the synchronous readFileSync method. Synchronous file system APIs should generally be avoided, but here their use is deliberate: CommonJS loads modules synchronously so that multiple modules are loaded in the order in which their dependencies require them. Now let's implement the require function:

function require(moduleName) {
  const id = require.resolve(moduleName);
  if (require.cache[id]) {
    return require.cache[id].exports
  }

  // Module metadata

  const module = {
    exports: {},
    id,
  }

  require.cache[id] = module;

  loadModule(id, module, require);

  // Returns the exported variable
  return module.exports
}

require.cache = {};
require.resolve = (moduleName) => {
  // Resolve the complete module id from moduleName
}

This is a simple implementation of the require function. A few points about this home-made module system are worth explaining:

  1. Given a moduleName, we first resolve the module's full path (how this is done is described later) and store the result in the variable id
  2. If the module has already been loaded, the cached result is returned immediately
  3. If the module has not been loaded yet, we set up an environment for it. Specifically, we create a module object that contains an exports property. This object will be populated by the module's code when it exports its API
  4. The module object is cached
  5. The loadModule function is executed; it loads the module's source and passes it the newly created module object
  6. The module's exported contents are returned to the caller

Module resolution algorithm

As mentioned above, when we pass in a module name, the resolution function returns the module's full path. That path is then used both to load the module's code and to identify the module uniquely. The resolve function handles three cases:

  • Is it a file module? If moduleName begins with /, it is treated as an absolute path and returned as is. If moduleName begins with ./, it is treated as a path relative to the directory of the module that issued the request
  • Is it a core module? If moduleName does not begin with / or ./, the algorithm first tries to find it among the NodeJS core modules
  • Is it a package module? If no core module matches moduleName, the algorithm starts in the directory of the requesting module and looks for a directory named node_modules containing a module that matches moduleName. If one is found, it is loaded. If not, the search continues upward through the directory tree, checking the node_modules directory at each level, until it reaches the root of the file system
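
The three-way decision above can be sketched as a small classifier. This is only an illustration: classifySpecifier is a hypothetical name, CORE_MODULES is a deliberately tiny stand-in for the real list of core modules, and ../ is treated like ./ because it is also a relative path.

```javascript
// Hypothetical, deliberately short list; NodeJS ships many more core modules.
const CORE_MODULES = new Set(['fs', 'path', 'http', 'url']);

function classifySpecifier(moduleName) {
  // Case 1: absolute or relative paths are file modules.
  if (
    moduleName.startsWith('/') ||
    moduleName.startsWith('./') ||
    moduleName.startsWith('../')
  ) {
    return 'file';
  }
  // Case 2: core modules are checked before any package lookup.
  if (CORE_MODULES.has(moduleName)) return 'core';
  // Case 3: everything else is searched for in node_modules directories.
  return 'package';
}

console.log(classifySpecifier('./utils')); // file
console.log(classifySpecifier('fs'));      // core
console.log(classifySpecifier('depA'));    // package
```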

Thanks to this algorithm, two modules can depend on different versions of the same package and still load correctly. Consider the following directory structure:

myApp
    - index.js
    - node_modules
        - depA
            - index.js
        - depB
            - index.js
            - node_modules
                - depA
        - depC
            - index.js
            - node_modules
                - depA

In the example above, myApp, depB, and depC all depend on depA, but each loads a different copy. For example:

  • In /myApp/index.js, require('depA') loads /myApp/node_modules/depA
  • In /myApp/node_modules/depB/index.js, it loads /myApp/node_modules/depB/node_modules/depA
  • In /myApp/node_modules/depC/index.js, it loads /myApp/node_modules/depC/node_modules/depA
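
The upward search that produces these results can be sketched roughly as follows. This is a simplification under stated assumptions: paths are POSIX-style, the installed set is a hypothetical stand-in for checking the real file system, and nodeModulesPaths and resolvePackage are illustrative names, not NodeJS APIs.

```javascript
// Lists the node_modules directories to search, nearest first.
// `fromDir` is the directory of the file that issued the request.
function nodeModulesPaths(fromDir) {
  const parts = fromDir.split('/').filter(Boolean);
  const dirs = [];
  for (let i = parts.length; i > 0; i--) {
    // A directory that is itself named node_modules is skipped.
    if (parts[i - 1] === 'node_modules') continue;
    dirs.push('/' + parts.slice(0, i).join('/') + '/node_modules');
  }
  dirs.push('/node_modules'); // finally, the root of the file system
  return dirs;
}

// Hypothetical stand-in for the directory structure shown above.
const installed = new Set([
  '/myApp/node_modules/depA',
  '/myApp/node_modules/depB',
  '/myApp/node_modules/depB/node_modules/depA',
  '/myApp/node_modules/depC',
  '/myApp/node_modules/depC/node_modules/depA',
]);

function resolvePackage(name, fromDir) {
  for (const dir of nodeModulesPaths(fromDir)) {
    const candidate = `${dir}/${name}`;
    if (installed.has(candidate)) return candidate;
  }
  throw new Error(`Cannot find module '${name}'`);
}

console.log(resolvePackage('depA', '/myApp'));
// /myApp/node_modules/depA
console.log(resolvePackage('depA', '/myApp/node_modules/depB'));
// /myApp/node_modules/depB/node_modules/depA
console.log(resolvePackage('depB', '/myApp/node_modules/depC'));
// /myApp/node_modules/depB
```

The last call shows the walk upward: depC has no private copy of depB, so the search falls back to the node_modules directory of myApp.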

NodeJS manages dependencies well largely because of this resolution algorithm, which lets it handle thousands of packages without conflicts or version incompatibilities.

Circular dependencies

Many people think of circular dependencies as a theoretical design problem, but they are quite likely to occur in real projects, so you should know how CommonJS handles them. Look back at the implementation of the require function above and you will see the risk involved. Here is an example: the main.js module depends on both a.js and b.js. At the same time, a.js depends on b.js, and b.js in turn depends on a.js, which creates a circular dependency. Here is the source code:

// a.js
exports.loaded = false;
const b = require('./b');
module.exports = {
  b,
  loaded: true
}
// b.js
exports.loaded = false;
const a = require('./a')
module.exports = {
  a,
  loaded: false
}
// main.js
const a = require('./a');
const b = require('./b');
console.log('A ->', JSON.stringify(a))
console.log('B ->', JSON.stringify(b))

Running main.js exposes the risk of circular dependencies in CommonJS. When b.js imported a.js, the exports of a.js were not yet complete: b.js only sees the state a.js was in at the moment b.js required it, not the state a.js is in once it has finished loading.

The process in detail:

  1. The whole process starts with main.js, which first imports a.js
  2. The first thing a.js does is export a value called loaded, set to false
  3. a.js then requires b.js
  4. Like a.js, b.js first exports loaded with the value false
  5. b.js continues executing and requires a.js
  6. Because the system has already started processing a.js, b.js immediately receives the current (incomplete) exports of a.js from the cache
  7. b.js then sets its final exports, which hold a and a loaded value of false
  8. Since b.js has finished executing, control returns to a.js, which receives the current state of the b.js exports
  9. a.js continues executing and sets loaded to true
  10. Finally, control returns to main.js, which continues executing

As shown above, because execution is synchronous, the a.js module that b.js imports is incomplete and does not reflect the final state of a.js. In a large project, the consequences of circular dependencies like this can be even more serious.

Basic CommonJS usage is straightforward, so for reasons of space it is not covered in this article.

ESM

ESM is part of the ECMAScript 2015 specification, which gives JavaScript a unified module system that works across execution environments. An important difference between ESM and CommonJS is that ES modules are static: import statements must be written at the top level of a module. In addition, the name of the imported module must be a constant string; it cannot be an expression that is evaluated dynamically at run time. For example, we cannot import ES modules in the following way:

if (condition) {
  import module1 from 'module1'
} else {
  import module2 from 'module2'
}

CommonJS, on the other hand, can import different modules depending on a condition:

// Named dep rather than module, which is already defined in every CommonJS file
let dep = null
if (condition) {
  dep = require("module1")
} else {
  dep = require("module2")
}

This may seem stricter than CommonJS, but precisely because imports are static, tools can analyze the dependency graph without running the code and remove code that will never be used. This is called tree shaking.
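
To make "analyzing without running" concrete, here is a rough sketch of how a tool could list a file's dependencies just by reading its text. It is a simplification, not how real bundlers work: findStaticImports is a hypothetical helper, the regular expression does not understand comments or string literals that happen to look like imports, and a real tool would use a full JavaScript parser.

```javascript
// Finds the specifiers of static import statements in a piece of source text.
function findStaticImports(source) {
  const pattern = /^\s*import\s+(?:[\w*\s{},$]+\s+from\s+)?['"]([^'"]+)['"]/gm;
  return [...source.matchAll(pattern)].map((match) => match[1]);
}

// Hypothetical module source; it is only read as text and never executed.
const source = `
import fs from 'node:fs';
import { join } from "node:path";
import './polyfill.js';
const lazy = () => import('./later.js');
`;

console.log(findStaticImports(source));
// [ 'node:fs', 'node:path', './polyfill.js' ]
```

The static imports are found from the text alone, while the dynamic import() call in the last line is not picked up by this scan. With require, the argument can be any expression, so no such scan can be complete.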

Module loading process

To understand how the ESM system works and how it handles circular dependencies, we need to understand how it parses and executes JavaScript code.

Phases of module loading

The interpreter's goal is to build a graph, called the dependency graph, that describes the dependencies between the modules to be loaded. The interpreter uses this graph to determine how modules depend on each other and in what order their code should be executed. For example, when asked to execute a JS file, the interpreter starts at that entry point and looks for all import statements. Whenever it encounters an import statement, it recurses depth-first into the imported module until all code has been parsed. This process can be broken down into three phases:

  1. Parsing: find all import statements and recursively load the contents of each module from the corresponding files
  2. Instantiation: for every exported entity, keep a named reference in memory, but do not assign it a value yet. References are also set up for all import and export statements to track the dependency relationships between them. No JS code is executed at this stage
  3. Execution: NodeJS finally executes the code, so that every previously instantiated entity receives its actual value

In CommonJS, dependencies are resolved while files are being executed. When execution reaches a require call, all the code before it has already run, because require calls do not have to be at the beginning of the file; they can appear anywhere. In the ESM system, unlike CommonJS, these three phases are completely separate: the dependency graph must be fully built before any code starts executing.

Circular dependencies

Let's rewrite the earlier CommonJS circular dependency example using ESM:

// a.js
import * as bModule from './b.js';
export let loaded = false;
export const b = bModule;
loaded = true;
// b.js
import * as aModule from './a.js';
export let loaded = false;
export const a = aModule;
loaded = true;
// main.js
import * as a from './a.js';
import * as b from './b.js';
console.log("A =>", a)
console.log("B =>", b)

Note that JSON.stringify cannot be used here, because the two module objects reference each other circularly. Running main.js shows that a.js and b.js each see the complete state of the other, unlike in CommonJS, where a module may observe an incomplete state.

Parsing

Here is how the parsing phase works:

  1. Parsing starts from main.js. The first import statement found leads into a.js
  2. In a.js, another import statement is found, which leads into b.js
  3. In b.js, an import statement pointing back to a.js is found. Since a.js has already been visited, this path is not explored again
  4. b.js is scanned further and has no other import statements. Back in a.js, there are no other import statements either, so we return to the entry file main.js. There, the next import statement refers to b.js, but that module has already been visited, so this path is not explored either

Explored depth-first in this way, the module dependency graph can be viewed as a tree, which the interpreter later uses to decide the order in which to execute code. In this phase, the interpreter starts from the entry point and only cares about import statements: it loads the modules they refer to and explores the dependency graph depth-first, visiting each module only once. Because cycles are not followed a second time, the result is a tree-like structure.
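
The traversal described in this section can be sketched as a depth-first walk with a visited set. The imports map below is a hypothetical stand-in for what the parser extracts from the three files of the example, and exploreDependencies is an illustrative name.

```javascript
// Hypothetical result of scanning each file for import statements.
const imports = {
  'main.js': ['a.js', 'b.js'],
  'a.js': ['b.js'],
  'b.js': ['a.js'],
};

function exploreDependencies(entry) {
  const order = [];
  const seen = new Set();

  function visit(id) {
    if (seen.has(id)) return; // already visited: do not explore this path again
    seen.add(id);
    order.push(id);
    for (const dependency of imports[id]) visit(dependency);
  }

  visit(entry);
  return order;
}

console.log(exploreDependencies('main.js'));
// [ 'main.js', 'a.js', 'b.js' ]
```

The visited set is what keeps the cycle between a.js and b.js from turning into an infinite loop.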

Instantiation

In this phase, the interpreter starts at the bottom of the tree and works its way up to the top. For each module it visits, it looks for all exported properties and builds a table in memory that maps each exported name to the value it will eventually hold.

Modules are instantiated in the following order:

  1. The interpreter starts with b.js and finds that it exports loaded and a
  2. The interpreter then analyzes a.js and finds that it exports loaded and b
  3. Finally, it analyzes main.js and finds that it exports nothing
  4. The export tables built in this phase only record the relationship between each exported name and the value it will hold; the values themselves are not initialized in this phase

After this pass is complete, the interpreter makes another pass, this time linking the exported names of each module to the modules that import them.

The steps this time are:

  1. b.js is linked to the exports of a.js. This link is called aModule
  2. a.js is linked to the exports of b.js. This link is called bModule
  3. Finally, main.js is linked to the exports of a.js and b.js, under the names a and b
  4. At this stage, none of the values are initialized. We have only set up links to the places where the values will live; the values themselves are assigned in the next phase

Execution

In this phase, the system finally executes the code in each file. It walks the dependency graph from the bottom up, in depth-first order, and executes the files one by one. In this example, main.js is executed last. This order guarantees that all the values exported by each module are initialized by the time the program runs its main logic.

The steps are as follows:

  1. Execution starts with b.js. The first line to run initializes the module's loaded export to false
  2. Next, aModule is assigned to a. Here a receives a reference to the a.js module
  3. Then loaded is set to true. At this point, all the exported values of b.js are final
  4. Now a.js is executed. Its loaded export is initialized to false
  5. The exported b property then receives its value, which is a reference to bModule
  6. Finally, loaded is set to true. At this point, all the exported values of a.js are final as well

After these steps, the system is ready to execute main.js. By this point, all the exported properties of every module have been evaluated. Because modules are imported by reference rather than by copy, each module sees the complete, final state of the other modules, even when there are circular dependencies between them.
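
Importing by reference can be imitated in plain JavaScript with getters. The sketch below is only a model of the instantiation and execution phases, not the engine's actual mechanism: createModule, slots and namespace are hypothetical names, and a real engine would throw when an uninitialized export is read, whereas this model would simply return undefined.

```javascript
// Instantiation: each module gets a namespace whose properties are getters
// over slots that have no value yet. Importers hold links, not copies.
function createModule() {
  const slots = {};
  const namespace = {};
  const declare = (name) =>
    Object.defineProperty(namespace, name, {
      get: () => slots[name],
      enumerable: true,
    });
  return { slots, namespace, declare };
}

const a = createModule();
const b = createModule();
['loaded', 'b'].forEach((name) => a.declare(name)); // exports of a.js
['loaded', 'a'].forEach((name) => b.declare(name)); // exports of b.js

// Execution, bottom-up: b.js runs first ...
b.slots.loaded = false;
b.slots.a = a.namespace; // export const a = aModule (a link, not a copy)
b.slots.loaded = true;

// ... then a.js ...
a.slots.loaded = false;
a.slots.b = b.namespace; // export const b = bModule
a.slots.loaded = true;

// ... and finally main.js, which sees the final state of both modules.
console.log(a.namespace.loaded, a.namespace.b.loaded); // true true
console.log(b.namespace.loaded, b.namespace.a.loaded); // true true
console.log(b.namespace.a === a.namespace);            // true
```

Because b holds a link to the namespace of a rather than a snapshot of it, the value that a.js assigns later is visible through b as well. This is the key difference from the CommonJS example.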

Differences between CommonJS and ESM, and interoperability

Here are some important differences between CommonJS and ESM, and how to use the two module systems together when necessary.

Some references provided by CommonJS are not available in ESM

CommonJS provides some important references that are not available in ESM, including require, exports, module.exports, __filename, and __dirname. Using any of them in an ES module causes a ReferenceError. In ESM, a reference to the current file is available through the special object import.meta. Specifically, import.meta.url holds the URL of the current module, in a form such as file:///path/to/current_module.js. From this URL, we can reconstruct the absolute paths represented by __filename and __dirname:

import { fileURLToPath } from 'url';
import { dirname } from 'path';
const __filename = fileURLToPath(import.meta.url);
const __dirname = dirname(__filename);
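
When all you need is the location of a file next to the current module, you can also stay in URL space with the global URL class. In the sketch below, moduleUrl is a fixed string standing in for import.meta.url (the example value from the text above), so the snippet runs anywhere; in a real ES module you would pass import.meta.url instead.

```javascript
// Stand-in for import.meta.url, using the example URL from the text above.
const moduleUrl = 'file:///path/to/current_module.js';

// Resolve a sibling file relative to the current module.
const dataUrl = new URL('./data.json', moduleUrl);
console.log(dataUrl.href); // file:///path/to/data.json

// The directory that contains the module, comparable to __dirname.
const dirUrl = new URL('.', moduleUrl);
console.log(dirUrl.pathname); // /path/to/
```

Several NodeJS file system functions, such as fs.readFileSync, accept such file: URL objects directly, so converting to a path string is not always necessary.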

The CommonJS require function can also be recreated inside an ES module as follows:

import { createRequire } from 'module';
const require = createRequire(import.meta.url)

You can now use the require() function to load CommonJS modules from within an ES module.

Using one module system from the other

As mentioned above, module.createRequire can be used to load CommonJS modules in ES modules. Alternatively, you can import a CommonJS module with the standard import syntax. However, only the default import is guaranteed to work this way; named imports work only when NodeJS can detect the named exports of the CommonJS module through static analysis:

import pkg from 'commonJS-module'
import { method1 } from 'commonJS-module' // may throw if method1 cannot be detected statically

In addition, ESM in NodeJS does not support importing a JSON file with a plain import statement. In CommonJS you can simply require a JSON file, whereas the following import statement results in an error:

import json from './data.json'

If you need to load JSON files in an ES module, you can again use the createRequire function:

import { createRequire } from 'module';
const require = createRequire(import.meta.url);
const data = require("./data.json");
console.log(data)

Conclusion

This article focused on how the two module systems in NodeJS work. Understanding these mechanisms can help us avoid some bugs that are otherwise very hard to track down.