This article is based on Node.js v13.1.0.

Please read these earlier articles in the series before this one:

  • V8 advanced learning
  • Nodejs in-depth learning series v8 Basics
  • How to properly embed v8 in our C++ application
  • Nodejs – Libuv Basics
  • Nodejs: Libuv Basics

By the end of this article you will know:

  • The Node.js startup process
  • How Node.js modules are classified, and how and why they are loaded the way they are
  • How JS code in Node.js calls C++ functions
  • Some bonus interview questions ~~

1. What does Nodejs depend on?

First of all, Node.js offers so many modules and runs on so many platforms not because JS itself is so powerful, but because it relies on underlying technologies you may not be aware of. The two biggest dependencies are V8 and libuv. Why? Because one compiles JS code into machine code that runs on various platforms and machines, and the other lets you call various system features on those platforms and machines, including manipulating files, listening on sockets, and so on. Leaving these two aside, what else is in the deps directory of the Node.js source code?

See the Dependencies page on the official Node.js website:

  1. http_parser: As the name suggests, this is an HTTP parser, a lightweight one written in C. Because it is designed not to make any system calls or allocations, its memory footprint per request is very small.
  2. c-ares: Node.js uses this C library for some asynchronous DNS resolution. At the JS level it is exposed as the resolve() family of functions in the dns module.
  3. OpenSSL: OpenSSL is used extensively in the tls and crypto modules. It provides well-tested implementations of many of the cryptographic functions that the modern web relies on for security.
  4. zlib: For fast compression and decompression, Node.js relies on the industry-standard zlib library, also known for its use in gzip and libpng. Node.js uses zlib to create synchronous, asynchronous, and streaming compression and decompression interfaces.
  5. npm: I won't go into that here.

Here are a few others not mentioned on the official website:

  1. acorn: A small but efficient JavaScript parser.
  2. acorn-plugins: Plugins used by acorn. Judging by their names, they add support for BigInt, private class elements and methods, and so on.
  3. brotli: A C implementation of the Brotli compression algorithm.
  4. histogram: A C implementation of a high dynamic range (HDR) histogram. Why does Node.js need this?
  5. icu: ICU (International Components for Unicode) is a mature, widely used set of C/C++ and Java libraries that provide Unicode and globalization support for software applications.
  6. llhttp: A higher-performance, more maintainable HTTP parser.
  7. nghttp2: A C implementation of the HTTP/2 protocol; header compression uses HPACK.
  8. node-inspect: This library tries to make the node debug command available under newer versions of V8.
  9. uv: The essence of Node.js. It gives Node.js access to various operating system features, including file systems, sockets, and so on.
  10. v8: Compiles JS code to machine code; it will not be covered here.
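Several of these bundled dependencies can be seen from JS through process.versions. The following is a minimal sketch; the list of names and the helper bundledVersions are chosen here for illustration, and which keys are present depends on the Node.js version and build.

```javascript
// List the versions of a few bundled dependencies of the running node binary.
// Which keys exist depends on the Node.js version and how it was built, so
// missing ones are reported instead of assumed.
const wanted = ['v8', 'uv', 'zlib', 'ares', 'openssl', 'nghttp2'];

// Hypothetical helper: pick the requested entries out of process.versions.
function bundledVersions(names) {
  const result = {};
  for (const name of names) {
    result[name] = process.versions[name] || '(not reported by this build)';
  }
  return result;
}

const versions = bundledVersions(wanted);
console.log(versions);
```

The v8 and uv entries correspond to the two biggest dependencies discussed above.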

2. With libuv and V8 in place, what does Node.js itself do?

Since the target audience is JavaScript developers, we can't ask them to write C/C++ code right away. We need something that wraps that C/C++ code and offers developers an elegant interface, and that is what Node.js does. In a word:

Node.js encapsulates all communication with the underlying layers and gives developers a consistent interface definition. While it keeps upgrading V8 and libuv, it still tries to keep that interface consistent.

So how does Node.js wrap libuv and V8 and expose interfaces? Before getting to the bottom of this, let's look at the Node.js directory structure, which will be useful later in this article:

The Node.js source code has two important directories:

  1. lib: contains the JavaScript implementations of all Node.js functions and modules, the ones you can reference directly in your JS project.

  2. src: contains the C++ implementations of all those functions; this is the code that actually references libuv and V8.

If we open any file in the lib directory, we see that, besides normal JS syntax, it uses a function never seen in ordinary applications: internalBinding. What is it? What does it do?

This is where our journey of discovery starts, uncovering the secrets of Node.js step by step. We'll begin with how Node.js is compiled.

Before moving on to the compilation process, we need to introduce two concepts: how modules are classified inside the Node.js source code, and the C++ binding loaders.

2.1 Nodejs module classification

Nodejs modules can be divided into three categories:

  • Native modules: JavaScript modules that are included in the Node.js source code and compiled into the Node.js executable, i.e. the JS files in the lib and deps directories, such as the familiar http and fs.
  • Built-in modules: we don't call these directly; native modules call them, and we then require the native modules.
  • Third-party modules: modules that do not ship with Node.js, such as Express and webpack. They come in three kinds:
    • JavaScript modules: the most common kind, and what we write in day-to-day development
    • JSON modules: very simple, just a JSON file
    • C/C++ extension modules: written in C/C++ and compiled to files with the .node suffix

For example, fs.js in lib is a native module, while node_fs.cc in src is a built-in module. Now that we know how modules are classified, how are they loaded? (This article is not about module loading in general, so third-party modules are not covered.)
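From user code, one way to tell a native module from a third-party one is the builtinModules list exported by the module module. This is a small sketch; the classify helper is made up for illustration. The C++ built-in modules such as node_fs.cc do not appear in this list, because user code cannot require them directly.

```javascript
const { builtinModules } = require('module');

// Hypothetical helper: "native" if the module ships inside the node binary,
// otherwise treat it as "third-party".
function classify(name) {
  return builtinModules.includes(name) ? 'native' : 'third-party';
}

console.log(classify('fs'));      // native
console.log(classify('http'));    // native
console.log(classify('express')); // third-party
```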

2.2 C++ binding loader classification

The rest of this article relies on these concepts:

  • process.binding(): the legacy C++ binding loader. Because it is attached to the global process object, it can be accessed from user land. These C++ bindings are created with the NODE_BUILTIN_MODULE_CONTEXT_AWARE() macro and have their nm_flags set to NM_F_BUILTIN.
  • process._linkedBinding(): for developers who want to add extra C++ bindings to their own applications. These bindings are created with the NODE_MODULE_CONTEXT_AWARE_CPP() macro, with the flag set to NM_F_LINKED.
  • internalBinding: the private, internal C++ binding loader. It is not accessible from user land because it is only available through NativeModule.require(). These bindings are created with the NODE_MODULE_CONTEXT_AWARE_INTERNAL() macro, with the flag set to NM_F_INTERNAL.
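The difference in visibility between the three loaders can be checked from an ordinary script. A minimal sketch, assuming a stock node binary started without special flags; the visibility object is only a name used here:

```javascript
// process.binding and process._linkedBinding hang off the global process
// object, so user code can see them. internalBinding is only handed to
// Node.js's own lib/ files, so it is not a global here.
const visibility = {
  processBinding: typeof process.binding,
  linkedBinding: typeof process._linkedBinding,
  internalBinding: typeof global.internalBinding,
};

console.log(visibility);
```

Only typeof is used here, so no binding is actually loaded.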

3. Nodejs compilation process

Following the recommendation on the official website, compiling from source is simple and blunt:

$ ./configure
$ make -j4

We can extract some important information from the Node.js build configuration file.

Node.js is built with GYP, and its GYP build file is node.gyp. Two places in this file give us two important pieces of information.

3.1 node.gyp

3.1.1 Entry files for executable applications

As the targets field in this file shows, the build produces multiple targets. The most important is the first one, which is configured like this:

{
  # 'node_core_target_name%' is defined as 'node'
  'target_name': '<(node_core_target_name)',
  'type': 'executable',  # the type here is executable
  'defines': [
    'NODE_WANT_INTERNALS=1',
  ],
  'includes': [
    'node.gypi'
  ],
  'include_dirs': [
    'src',
    'deps/v8/include'
  ],
  'sources': [
    'src/node_main.cc'
  ],
  ...
}

The entry file for the entire Node application is node_main.cc.

3.1.2 Compilation of all JS files in Nodejs source code

The second target in the build file is libnode, which compiles the remaining C++ files into a library. Before it is compiled, however, an action runs:

{
  # 'node_lib_target_name' is defined as 'libnode'
  'target_name': '<(node_lib_target_name)',
  'type': '<(node_intermediate_lib_type)',
  'includes': [
    'node.gypi',
  ],
  'include_dirs': [
    'src',
    '<(SHARED_INTERMEDIATE_DIR)' # for node_natives.h
  ],
  ...
  'actions': [
    {
      'action_name': 'node_js2c',
      'process_outputs_as_sources': 1,
      'inputs': [
        # Put the code first so it's a dependency and can be used for invocation.
        'tools/js2c.py',
        '<@(library_files)',
        'config.gypi',
        'tools/js2c_macros/check_macros.py'
      ],
      'outputs': [
        '<(SHARED_INTERMEDIATE_DIR)/node_javascript.cc',
      ],
      'conditions': [
        [ 'node_use_dtrace=="false" and node_use_etw=="false"', {
          'inputs': [ 'tools/js2c_macros/notrace_macros.py' ]
        }],
        [ 'node_debug_lib=="false"', {
          'inputs': [ 'tools/js2c_macros/nodcheck_macros.py' ]
        }],
        [ 'node_debug_lib=="true"', {
          'inputs': [ 'tools/js2c_macros/dcheck_macros.py' ]
        }]
      ],
      'action': [
        'python', '<@(_inputs)',
        '--target', '<@(_outputs)',
      ]
    }
  ],

js2c.py converts every JS file matching lib/**/*.js and deps/**/*.js into an array of its character codes and writes them all into node_javascript.cc.

The generated node_javascript.cc file looks like this:

namespace node {

namespace native_module {
  ...

  static const uint8_t fs_raw[] = {...}

  ...

  void NativeModuleLoader::LoadJavaScriptSource() {
    ...
    source_.emplace("fs", UnionBytes{fs_raw, 50659});
    ...
  }

  UnionBytes NativeModuleLoader::GetConfig() {
    return UnionBytes(config_raw, 3017);  // config.gypi
  }

}  // namespace native_module

}  // namespace node

In this way, the contents of all the JS files are compiled into the executable and are already in memory at startup, avoiding extra I/O and improving efficiency.
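The conversion step can be sketched in a few lines of JS. This is only an illustration of the idea, not the real tools/js2c.py: the helpers toByteArrayDefinition and fromBytes are hypothetical, and only the `<id>_raw` naming is taken from the excerpt above.

```javascript
// Turn JS source text into the text of a C++ byte-array definition, roughly
// the shape of what node_javascript.cc contains (illustrative only).
function toByteArrayDefinition(id, source) {
  const bytes = Array.from(Buffer.from(source, 'utf8'));
  return {
    length: bytes.length,
    code: `static const uint8_t ${id}_raw[] = {${bytes.join(',')}};`,
  };
}

// Reverse direction: conceptually what happens when the runtime needs the
// source again, reading it from memory rather than from disk.
function fromBytes(bytes) {
  return Buffer.from(bytes).toString('utf8');
}

const demo = toByteArrayDefinition('fs', "'use strict';");
console.log(demo.code);
console.log(demo.length); // 13
```

The length printed here plays the same role as the 50659 in the UnionBytes{fs_raw, 50659} call: the array carries no terminator, so its size is stored next to it.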

Therefore, from the above configuration information, we can summarize the compilation process as follows:

Now that we know the compilation process, let's analyze what internalBinding is by following the Node.js startup process.

4. Nodejs startup process

From the previous section we know that the entry file of the Node.js application is node_main.cc, so we trace the code starting from this file and arrive at the following flow chart:

The parts highlighted in red are the ones to focus on. Some of them connect to the basic articles listed in Two months, the most complete original NodeJS in-depth series on the web. If you have read those, I believe things will click here and the pieces of knowledge will suddenly connect; that is the charm of systematic learning ~

Back to the chart: all the clues converge on one function, NativeModuleLoader::LookupAndCompile. Before this function is called, note that NativeModuleLoader has been instantiated, so its constructor has run. The constructor does only one thing: it calls LoadJavaScriptSource(), the function we saw in node_javascript.cc in the previous section. This gives us the following conclusion:

  • internal/bootstrap/loaders.js is the first JS file to be executed

So what exactly does NativeModuleLoader::LookupAndCompile do?

4.1 NativeModuleLoader::LookupAndCompile

It uses the file id passed in (internal/bootstrap/loaders in this case) to look up the file content in the source_ map, wraps the entire content in a new function, and adds some parameters to that function (getLinkedBinding and getInternalBinding among them in this case) so that the JS file can call C++ functions. It then executes the new function. The parameters are supplied by the Environment::BootstrapInternalLoaders function from the chart above:

MaybeLocal<Value> Environment::BootstrapInternalLoaders() {
  EscapableHandleScope scope(isolate_);

  // Create binding loaders
  std::vector<Local<String>> loaders_params = {
      process_string(),
      FIXED_ONE_BYTE_STRING(isolate_, "getLinkedBinding"),
      FIXED_ONE_BYTE_STRING(isolate_, "getInternalBinding"),
      primordials_string()};
  // The getInternalBinding parameter is bound to the C++ function
  // binding::GetInternalBinding below. If you don't know why JS can call
  // C++ functions, please refer to the earlier articles in this series.
  std::vector<Local<Value>> loaders_args = {
      process_object(),
      NewFunctionTemplate(binding::GetLinkedBinding)
          ->GetFunction(context())
          .ToLocalChecked(),
      NewFunctionTemplate(binding::GetInternalBinding)
          ->GetFunction(context())
          .ToLocalChecked(),
      primordials()};
  ...
}
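The wrap-then-execute idea can be imitated in plain JS. In this sketch new Function stands in for the C++ compilation step, and fakeBindings, compileAndRun, and loaderSource are hypothetical names; only the four parameter names come from the code above.

```javascript
// Hypothetical stand-ins for the C++ functions that Node.js injects.
const fakeBindings = { fs: { name: 'fs binding' } };
function getInternalBinding(name) {
  return fakeBindings[name];
}
function getLinkedBinding(name) {
  throw new Error(`No such linked binding: ${name}`);
}

// Wrap the file content in a function whose parameters are the injected
// names, then call it with the matching values.
function compileAndRun(source, params, args) {
  const fn = new Function(...params, source);
  return fn(...args);
}

const loaderSource = `
  // No require() here: everything comes in through the parameters.
  return { internalBinding: (name) => getInternalBinding(name) };
`;

const loaderExports = compileAndRun(
  loaderSource,
  ['process', 'getLinkedBinding', 'getInternalBinding', 'primordials'],
  [process, getLinkedBinding, getInternalBinding, {}]
);

console.log(loaderExports.internalBinding('fs').name); // fs binding
```

Because the injected values arrive as function parameters, the wrapped file needs no globals and no require to reach them.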

After loaders.js is loaded, what does this file do?

4.2 internal/bootstrap/loaders.js

This file is special: it is the only JS file that does not use require, and the only external functions it uses are the getLinkedBinding and getInternalBinding mentioned above. You can verify this in the file's source.

The file builds an object called NativeModule, gives it some prototype methods, and finally returns a data structure like this:

const loaderExports = {
  internalBinding,
  NativeModule,
  require: nativeModuleRequire
};

In it we also find the original implementation of the internalBinding function:

let internalBinding;
{
  const bindingObj = Object.create(null);
  // eslint-disable-next-line no-global-assign
  internalBinding = function internalBinding(module) {
    let mod = bindingObj[module];
    if (typeof mod !== 'object') {
      // This is where our C++ function gets called
      mod = bindingObj[module] = getInternalBinding(module);
      moduleLoadList.push(`Internal Binding ${module}`);
    }
    return mod;
  };
}

Next, following the red line in the flow chart above, the value returned by loaders.js is passed on to internal/bootstrap/node.js, which makes use of it.

The code is as follows:

MaybeLocal<Value> Environment::BootstrapInternalLoaders() {
  ...
  // loader_exports is the loaderExports object returned by loaders.js
  Local<Value> loader_exports;
  if (!ExecuteBootstrapper(
           this, "internal/bootstrap/loaders", &loaders_params, &loaders_args)
           .ToLocal(&loader_exports)) {
    return MaybeLocal<Value>();
  }
  CHECK(loader_exports->IsObject());
  Local<Object> loader_exports_obj = loader_exports.As<Object>();
  // internal_binding_loader is loaderExports.internalBinding
  Local<Value> internal_binding_loader =
      loader_exports_obj->Get(context(), internal_binding_string())
          .ToLocalChecked();
  CHECK(internal_binding_loader->IsFunction());
  set_internal_binding_loader(internal_binding_loader.As<Function>());
  // Note that this require is the NativeModule require (nativeModuleRequire)
  Local<Value> require =
      loader_exports_obj->Get(context(), require_string()).ToLocalChecked();
  CHECK(require->IsFunction());
  set_native_module_require(require.As<Function>());
  ...
}

MaybeLocal<Value> Environment::BootstrapNode() {
  ...
  std::vector<Local<Value>> node_args = {
      process_object(),
      native_module_require(),
      internal_binding_loader(),  // the JS internalBinding function
      Boolean::New(isolate_, is_main_thread()),
      Boolean::New(isolate_, owns_process_state()),
      primordials()};
  ...
}

Six values are injected from C++ into this file for the JS code to use: process, require, internalBinding, isMainThread, ownsProcessState, and primordials.

This leads to another conclusion:

  • JS calls internalBinding => C++ internal_binding_loader => JS internalBinding => C++ GetInternalBinding

But at this point, we still have some questions that need to be further explored.

4.3 GetInternalBinding

Most of internal/bootstrap/node.js initializes and assigns properties on the process and global objects. According to the conclusion above, when we call internalBinding, what actually runs is the C++ function GetInternalBinding. So let's look at how this function is implemented.

The rules for calling C++ functions from JS were covered in How to properly embed v8 in our C++ application, so we won't repeat them. Let's focus on this:

void GetInternalBinding(const FunctionCallbackInfo<Value>& args) {
  ...
  // Find the module. But where does it look?
  node_module* mod = FindModule(modlist_internal, *module_v, NM_F_INTERNAL);
  if (mod != nullptr) {
    exports = InitModule(env, mod, module);
    // What is the constants module?
  } else if (!strcmp(*module_v, "constants")) {
    exports = Object::New(env->isolate());
    CHECK(
        exports->SetPrototype(env->context(), Null(env->isolate())).FromJust());
    DefineConstants(env->isolate(), exports);
  } else if (!strcmp(*module_v, "natives")) {
    exports = native_module::NativeModuleEnv::GetSourceObject(env->context());
    // Legacy feature: process.binding('natives').config contains stringified
    // config.gypi
    CHECK(exports
              ->Set(env->context(),
                    env->config_string(),
                    native_module::NativeModuleEnv::GetConfigString(
                        env->isolate()))
              .FromJust());
  } else {
    return ThrowIfNoSuchModule(env, *module_v);
  }
  // Hand exports back to the JS caller
  args.GetReturnValue().Set(exports);
}

This function leaves us with some questions:

  • Where does the modlist_internal passed to FindModule come from?
  • Why are there two special module names, constants and natives?

To answer these questions, let's dig deeper.

4.4 NODE_MODULE_CONTEXT_AWARE_INTERNAL

This is where NODE_MODULE_CONTEXT_AWARE_INTERNAL comes in. Careful readers will have noticed that files such as src/node_fs.cc end with an invocation of this macro.

Its definition can be found in the node_binding.h file:

#define NODE_MODULE_CONTEXT_AWARE_INTERNAL(modname, regfunc) \
  NODE_MODULE_CONTEXT_AWARE_CPP(modname, regfunc, nullptr, NM_F_INTERNAL)

As you can see, it simply invokes the NODE_MODULE_CONTEXT_AWARE_CPP macro with the flag set to NM_F_INTERNAL.

The NODE_MODULE_CONTEXT_AWARE_CPP macro in turn ends up calling the function node_module_register.

node_module_register links the module into one of the global static lists, modlist_internal or modlist_linked:

if (mp->nm_flags & NM_F_INTERNAL) {
  mp->nm_link = modlist_internal;
  modlist_internal = mp;
} else if (!node_is_initialized) {
  // "Linked" modules are included as part of the node project.
  // Like builtins they are registered *before* node::Init runs.
  mp->nm_flags = NM_F_LINKED;
  mp->nm_link = modlist_linked;
  modlist_linked = mp;
} else {
  thread_local_modpending = mp;
}

modlist_internal is a linked list of all the built-in modules, so the GetInternalBinding function above follows this logic:

The internalBinding calls in the figure above pass a variety of module names, including the two special names we just asked about, constants and natives.

With that, the two questions above are answered.
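The lookup logic can also be written down as a small JS model. This is a sketch under assumptions: the flag value, the register helper, and the placeholder return values for constants and natives are invented for illustration; only the order of the checks follows the C++ excerpt above.

```javascript
// The flag value is an assumption for illustration only.
const NM_F_INTERNAL = 1 << 2;

// A singly linked list, newest registration first, like modlist_internal.
let modlistInternal = null;
function register(name, init) {
  modlistInternal = { name, flags: NM_F_INTERNAL, init, link: modlistInternal };
}

// Walk the list until a module with a matching name and flag is found.
function findModule(list, name, flag) {
  for (let mp = list; mp !== null; mp = mp.link) {
    if (mp.name === name && (mp.flags & flag) !== 0) return mp;
  }
  return null;
}

function getInternalBinding(name) {
  const mod = findModule(modlistInternal, name, NM_F_INTERNAL);
  if (mod !== null) {
    const exports = {};
    mod.init(exports); // plays the role of InitModule
    return exports;
  }
  // Two names are handled specially instead of being looked up in the list.
  if (name === 'constants') return Object.create(null);
  if (name === 'natives') return { config: '(stringified config.gypi)' };
  throw new Error(`No such module: ${name}`);
}

register('fs', (target) => { target.open = () => 'open called'; });
register('os', (target) => { target.hostname = () => 'example-host'; });

console.log(getInternalBinding('fs').open()); // open called
console.log(Object.getPrototypeOf(getInternalBinding('constants'))); // null
```

constants and natives are never registered in the list, which is why they need their own branches.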

But are all the questions really answered? Merely compiling a file does not run NODE_MODULE_CONTEXT_AWARE_INTERNAL, so when does node_module_register actually get called?

🙆 I appreciate your persistence. The answer to this last question, together with a summary flow of the whole article, is presented to you as one big Easter egg ~

4.5 The ultimate big picture

Above is the complete flow chart of how Node.js works with libuv and V8. One point in it answers the question we just raised: when are all the built-in modules added to modlist_internal? The answer: when Node.js starts, it calls binding::RegisterBuiltinModules().
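The difference between defining a registration and running it can be modelled like this. It is a sketch under an assumption: that each macro invocation only produces a per-module registration function, which the startup code later calls. The names defineBuiltin, registerFs, and registerOs are hypothetical.

```javascript
const modlistInternal = [];

// Stand-in for node_module_register: put the newest module first.
function nodeModuleRegister(mod) {
  modlistInternal.unshift(mod);
}

// What a macro invocation is assumed to produce: a function that registers
// the module, but only when somebody calls it.
function defineBuiltin(name, init) {
  return function registerThisModule() {
    nodeModuleRegister({ name, init });
  };
}

const registerFs = defineBuiltin('fs', () => {});
const registerOs = defineBuiltin('os', () => {});

console.log(modlistInternal.length); // 0: defining is not registering

// Stand-in for binding::RegisterBuiltinModules(), run once at startup.
function registerBuiltinModules() {
  for (const register of [registerFs, registerOs]) register();
}

registerBuiltinModules();
console.log(modlistInternal.map((m) => m.name)); // [ 'os', 'fs' ]
```

Until registerBuiltinModules runs, the list stays empty, which mirrors the point that compiling a file alone registers nothing.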

This could be the end of the article, but to consolidate what we have learned, let's take an example and check whether the theory from How to properly embed v8 in our C++ application really holds up in the Node.js source code.

5. An example 🌰 (Easter egg ~)

Suppose we have an index.js:

const fs = require('fs')

module.exports = () => {
  fs.open('test.js', () => {
    // balabala
  })
}

What happens when you type node index.js on the command line?

This sounds an awful lot like "what happens when you type a URL into the browser and press Enter". Good thing this isn't an interview.

It's only a handful of lines of code, but those few lines are enough to raise a lot of questions:

  • Why can require be used here without being declared?
  • Can module.exports be replaced with exports?
  • Does fs.open have a synchronous version?
  • Can fs.open specify the open mode?
  • How does fs.open end up calling uv_fs_open?

There are many more questions one could ask, but I won't list them here. If you want more, please leave a comment (😏).

Today's focus is not these interview questions, though, but verifying that the C++ code really is written the way the earlier articles described. Let's go through it line by line (without going too deep).

5.1 require('fs')

When you require a file, Node.js does not directly execute the code in the JS file you wrote (the exceptions are internal/bootstrap/loaders.js and internal/bootstrap/node.js mentioned above). It puts your code into a wrapper function and then executes that wrapper function. This is why top-level variables defined in a module stay within the scope of that module.

For example:

~ $ node
> require('module').wrapper
[ '(function (exports, require, module, __filename, __dirname) { ',
  '\n});' ]
>

You can see that the wrapper function has five parameters: exports, require, module, __filename, and __dirname. So the require and module.exports you use in your JS file are these parameters, not real global variables.
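The effect of the wrapper can be reproduced by hand. This is a minimal sketch, not the real module loader: runAsModule and the file name /tmp/demo.js are made up, the two wrapper strings are copied from the REPL output above, and vm.runInThisContext is used to turn the wrapped text into a function.

```javascript
const path = require('path');
const vm = require('vm');

// The two halves of the wrapper, as printed in the REPL session above.
const wrapper = [
  '(function (exports, require, module, __filename, __dirname) { ',
  '\n});',
];

// Hypothetical helper: wrap the source, compile it into a function, then
// call that function with the five arguments.
function runAsModule(source, filename) {
  const module = { exports: {} };
  const fn = vm.runInThisContext(wrapper[0] + source + wrapper[1], { filename });
  fn.call(module.exports, module.exports, require, module,
          filename, path.dirname(filename));
  return module.exports;
}

const exported = runAsModule(
  'const secret = 42; module.exports = () => secret + 1;',
  '/tmp/demo.js'
);

console.log(exported());    // 43
console.log(typeof secret); // undefined: secret stayed inside the wrapper
```

The variable secret is local to the wrapper function, which is exactly why top-level variables of a module do not leak out.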

I won't go into more detail, or we really would never finish ~

5.2 fs.open

In the JS file, open ends up calling:

binding.open(pathModule.toNamespacedPath(path),
               flagsNumber,
               mode,
               req);

Next we jump to node_fs.cc to verify the previous theory step by step.

5.2.1 Initialize

Remember the final Easter-egg chart above? When internalBinding is called, the corresponding built-in module is initialized, that is, its initialization function is called; in this case that is the Initialize function.

This function starts by setting methods on target, for example:

env->SetMethod(target, "close", Close);
env->SetMethod(target, "open", Open);

SetMethod ultimately calls that->Set(context, name_string, function).Check(). Isn't this exactly what section 2, on calling C++ functions, of How to properly embed v8 in our C++ application described?

Next, the FSReqCallback class is exposed. It is used in fs.js like this:

const req = new FSReqCallback();
req.oncomplete = callback;

This corresponds to the part about using C++ classes in How to properly embed v8 in our C++ application:

Local<FunctionTemplate> fst = env->NewFunctionTemplate(NewFSReqCallback);
fst->InstanceTemplate()->SetInternalFieldCount(1);
fst->Inherit(AsyncWrap::GetConstructorTemplate(env));
Local<String> wrapString =
    FIXED_ONE_BYTE_STRING(isolate, "FSReqCallback");
fst->SetClassName(wrapString);
target
    ->Set(context, wrapString,
          fst->GetFunction(env->context()).ToLocalChecked())
    .Check();

It matches all the theory we covered perfectly.

Next, let's see how libuv is used.

5.2.2 Open

Asynchronous calls are all wrapped in one function called AsyncCall, which in turn calls AsyncDestCall:

AsyncCall(env, req_wrap_async, args, "open", UTF8, AfterInteger,
              uv_fs_open, *path, flags, mode);

The calls that follow are the same as in the fs.c example we gave earlier; they are just wrapped, which hides a lot of things and makes the code a bit harder to read.
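To make the shape of that wrapping easier to see, here is a toy model in JS. Everything in it is a stand-in: pending and runLoop imitate an event loop, fakeUvFsOpen imitates uv_fs_open with made-up result values, and asyncCall and afterInteger only borrow their names from the C++ call above.

```javascript
// A toy event loop: completed "system calls" wait here until the loop runs.
const pending = [];
function runLoop() {
  while (pending.length > 0) pending.shift()();
}

// Hypothetical stand-in for uv_fs_open: it never answers immediately.
function fakeUvFsOpen(path, flags, mode, done) {
  pending.push(() => done(path === 'missing.js' ? -2 : 3));
}

// Stand-in for AfterInteger: translate the raw result into (err, value).
function afterInteger(req, result) {
  if (result < 0) req.oncomplete(new Error(`open failed: ${result}`));
  else req.oncomplete(null, result);
}

// Stand-in for AsyncCall: start the operation and route its completion.
function asyncCall(req, after, fn, ...args) {
  fn(...args, (result) => after(req, result));
}

const results = [];
const req = { oncomplete: (err, fd) => results.push(err ? err.message : fd) };
asyncCall(req, afterInteger, fakeUvFsOpen, 'test.js', 0, 0o666);

console.log(results); // [] : nothing has completed yet
runLoop();
console.log(results); // [ 3 ]
```

The req object with its oncomplete property plays the role of the FSReqCallback instance created in fs.js.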

That's it: you have finished this article. Thank you for your patience, which has earned you another piece of knowledge. If you haven't finished reading, bookmark it so you can pull it out for reference later ~

Thanks ~
