A brief introduction

Node.js has four types of modules: C++ native modules, native JS modules, user JS modules, and user C++ extensions. This article starts from the process.binding API to explain how Node.js loads its C++ native modules.

Why do we need to understand module principles?

It lays the foundation for a deeper understanding of Node.js

To understand the lower layers of Node.js, you first need to understand the loading entry points of its various files, which leads to the concept of modules. Once that is clear, the structure of the Node.js source code becomes much easier to follow, which helps with any later study of the source. Generally speaking, C++ native modules are registered with the NODE_MODULE_CONTEXT_AWARE_BUILTIN macro, which adds a data node describing the module to the module list. So when reading the Node.js source code, a file that uses the NODE_MODULE_CONTEXT_AWARE_BUILTIN macro is a C++ native module.

You may be wondering why you need to know Node.js internals at all, instead of just developing with Node.js. Here are some reasons:

  1. If you are building a desktop application with Node.js right now (for example with Electron), understanding Node.js internals may not help much for simple development. But if you want to extend the capabilities of your desktop application, you can write a Node.js native extension module that calls the operating system API directly. You can even use a C++ extension module as a glue layer between Node.js and existing C++ programs, so that you can call those programs directly from Node.js.

  2. Some Node.js frameworks need to be rebuilt for a different Node.js version, or simply will not run. To understand why, some background is needed. Node.js uses V8 to run JavaScript, and the underlying V8 API changes frequently, so switching Node.js versions can break a framework built against the old one. NAN was born out of this: it wraps a set of Node.js APIs (using macros) and exposes them to the user. However, NAN works by preprocessing for a specific Node.js version, so extension modules still have to be rebuilt when the version changes. One example is the open source node-sass framework (built with NAN), which depends heavily on the Node.js version.

The ability to extend with C++

More efficient?

It is commonly said online that C++ extension modules are suited to computation-intensive tasks: delegate such work to an extension, then pass the result back to the JS layer to improve efficiency. But is this really the case? As a test, we write an iterative Fibonacci calculation in C++:

    // Iterative Fibonacci. long long holds exact results up to n = 92.
    long long int F(int n)
    {
        if (n < 2)
            return n;

        long long fibOne = 0;
        long long fibTwo = 1;
        long long fibN = 0;
        for (int i = 2; i <= n; i++)
        {
            fibN = fibOne + fibTwo;

            fibOne = fibTwo;
            fibTwo = fibN;
        }

        return fibN;
    }

Here is the same algorithm in JavaScript:

function F(n) {
    if (n < 2) return n;

    let fibOne = 0;
    let fibTwo = 1;
    let fibN = 0;
    for (let i = 2; i <= n; i++)
    {
        fibN = fibOne + fibTwo;

        fibOne = fibTwo;
        fibTwo = fibN;
    }

    return fibN;
}

// Time a single call to F(i) and print the elapsed seconds.
function test(i) {
    const now = new Date();
    F(i);
    console.log((new Date() - now) / 1000);
}
test(100)

Results (elapsed time in seconds):

n      1e8      1e9      1e10
js     0.513    4.679    45.629
c++    0.029    0.276    0.547

C++ clearly outperforms JS here. This is not to encourage writing C++ for every occasion, but it does make sense for computationally intensive work like this.
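A millisecond Date difference is coarse for short runs. As a minimal sketch, the JS side of the measurement can also be taken with the monotonic process.hrtime.bigint() clock. The helper name timeIt is made up for this illustration and is not part of the original test.

```javascript
// Iterative Fibonacci, the same loop as in the test above.
function fib(n) {
  if (n < 2) return n;
  let fibOne = 0;
  let fibTwo = 1;
  let fibN = 0;
  for (let i = 2; i <= n; i++) {
    fibN = fibOne + fibTwo;
    fibOne = fibTwo;
    fibTwo = fibN;
  }
  return fibN;
}

// Hypothetical helper: run fn(arg) once and report the elapsed milliseconds.
function timeIt(fn, arg) {
  const start = process.hrtime.bigint(); // nanoseconds, as a BigInt
  const result = fn(arg);
  const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
  return { result, elapsedMs };
}

const run = timeIt(fib, 50);
console.log(`fib(50) = ${run.result} in ${run.elapsedMs.toFixed(3)} ms`);
```

JavaScript numbers are doubles, so the returned value stops being exact beyond n = 78. For very large loop counts only the elapsed time is meaningful.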

Rapidly expanding

If you already have a finished C++ project that needs to be connected to your current JS project, how do you handle it? Common approaches are:

  1. Have the two applications communicate through an HTTP service. The communication goes through HTTP packets, so it is limited by HTTP itself, and it is generally less efficient than a direct call. The open source GitHub project mirai (a QQ robot framework) uses this mode specifically to make development easier for users of other languages: its core has a built-in HTTP communication plug-in (mirai-api-http), and other languages talk to the core through HTTP requests and then call the functions the core provides.

  2. Translate the C++ project directly into JS. Direct translation demands a lot from the programmer and carries a time-cost risk, so it is not a fast way to connect the two.

  3. Write an extension layer for the C++ project that JS can call. Direct calls are more efficient, and this takes less time to develop than the two methods above.

Start with the OS module

process.binding is an API on the Node.js global object process that loads C++ source modules. This section uses the os module as an example. Printing the os C++ source module gives the following result [Node.js v6.9.4 environment]:

process.binding('os')
// BaseObject {
//   getHostname: [Function: getHostname],
//   getLoadAvg: [Function: getLoadAvg],
//   getUptime: [Function: getUptime] {
//     [Symbol(Symbol.toPrimitive)]: [Function (anonymous)]
//   },
//   getTotalMem: [Function: getTotalMem] {
//     [Symbol(Symbol.toPrimitive)]: [Function (anonymous)]
//   },
//   getFreeMem: [Function: getFreeMem] {
//     [Symbol(Symbol.toPrimitive)]: [Function (anonymous)]
//   },
//   getCPUs: [Function: getCPUs],
//   getInterfaceAddresses: [Function: getInterfaceAddresses],
//   getHomeDirectory: [Function: getHomeDirectory],
//   getUserInfo: [Function: getUserInfo],
//   setPriority: [Function: setPriority],
//   getPriority: [Function: getPriority],
//   getOSInformation: [Function: getOSInformation],
//   isBigEndian: false
// }

You may be wondering what the os module is and where it lives. Node.js uses the V8 engine to run JavaScript. In fact, every data type and function in JS is ultimately defined in V8. You may never have seen V8 code, but whenever you work with JS you are already working with V8.
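process.binding is an internal API and is deprecated in recent Node.js releases, so application code normally reaches this functionality through the public os module, which is backed by the C++ binding. A small sketch using only documented os functions:

```javascript
const os = require('os');

// Public wrappers around the functionality exposed by the C++ os binding.
const info = {
  hostname: os.hostname(),
  totalMem: os.totalmem(),     // bytes
  freeMem: os.freemem(),       // bytes
  uptime: os.uptime(),         // seconds
  endianness: os.endianness(), // 'BE' or 'LE'
};

console.log(info);
```

The printed values depend on the machine the script runs on.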

Node.js C++ source module os [location: "src/node_os.cc"]

#include "node.h"
#include "v8.h"
// ... a large stack of header includes and definitions

namespace node {
namespace os {

...
// A bunch of function definitions: GetHostname, GetUserInfo, etc.

void Initialize(Local<Object> target,  // target is the exports object passed in from outside
                Local<Value> unused,
                Local<Context> context) {
  Environment* env = Environment::GetCurrent(context);
  env->SetMethod(target, "getHostname", GetHostname);  // target.getHostname = GetHostname; GetHostname.name = "getHostname"
  env->SetMethod(target, "getLoadAvg", GetLoadAvg);
  env->SetMethod(target, "getUptime", GetUptime);
  env->SetMethod(target, "getTotalMem", GetTotalMemory);
  env->SetMethod(target, "getFreeMem", GetFreeMemory);
  env->SetMethod(target, "getCPUs", GetCPUInfo);
  env->SetMethod(target, "getOSType", GetOSType);
  env->SetMethod(target, "getOSRelease", GetOSRelease);
  env->SetMethod(target, "getInterfaceAddresses", GetInterfaceAddresses);
  env->SetMethod(target, "getHomeDirectory", GetHomeDirectory);
  env->SetMethod(target, "getUserInfo", GetUserInfo);
  target->Set(FIXED_ONE_BYTE_STRING(env->isolate(), "isBigEndian"),
              Boolean::New(env->isolate(), IsBigEndian()));
}

}  // namespace os
}  // namespace node

NODE_MODULE_CONTEXT_AWARE_BUILTIN(os, node::os::Initialize)  // module registration macro

env->SetMethod(exports, "function name", function) attaches a C++ function to the exports object under the given name. env is the environment context of Initialize, much like a function context in JavaScript. Comparing the printed result of process.binding('os') with the module registration code shows that process.binding('os') is a gateway between JS and C++. If the C++ is hard to follow, the pseudocode below expresses the same thing in JavaScript terms:

// Pseudocode, not runnable: the C++ registration above in JavaScript terms
function GetLoadAvg() {...}
...

function initialize(target, unused, context) {
  target.getLoadAvg = GetLoadAvg
  target.getUptime = GetUptime
  target.getTotalMem = GetTotalMemory
  target.getFreeMem = GetFreeMemory
  target.getCPUs = GetCPUInfo
  target.getOSType = GetOSType
  target.getOSRelease = GetOSRelease
  target.getInterfaceAddresses = GetInterfaceAddresses
  target.getHomeDirectory = GetHomeDirectory
  target.isBigEndian = false;
}
os.initialize = initialize
NODE_MODULE_CONTEXT_AWARE_BUILTIN(os, os.initialize);

If you had to design a module yourself, how would you design it?

First, look at process.binding: it accepts a string such as 'os' as the module identifier.

Then we need to design a 'find' process. The target of 'find' should be a data structure that holds not only a name but also the object the module exports. Now we are ready to design that data structure.

node_module is the target of 'find':

  struct node_module {
    const char* name;
    Object exports;  // Object is a data structure that represents an object
  };

Now that the node_module data structure has been designed, we need a way to store the modules. Since there is more than one module, they should be managed in a unified way. A linked list works: we only need to declare a head node in advance and add a link member to node_module that points to the next node.

  struct node_module {
    const char* name;
    Object exports;
    struct node_module* link;  // next node in the list
  };

  node_module* modlist_builtin;  // head of the module list

The 'find' process

The modules form a linked list of node_module nodes, so it is natural to walk the list looking for the matching node. This is in fact what Node.js does internally.

struct node_module* find(const char* name) {
  struct node_module* mp;
  // mp is the cursor used during the search; modlist_builtin is the head
  // of the globally registered module list.
  for (mp = modlist_builtin; mp != nullptr; mp = mp->link) {
    if (strcmp(mp->name, name) == 0)
      break;
  }
  return mp;  // nullptr when no module matches
}
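The same traversal can be tried out in plain JavaScript. In this sketch the node objects and the modlistBuiltin variable are stand-ins invented for illustration, not Node.js APIs.

```javascript
// Hypothetical JS model of the module list: each node has name, exports, link.
const fsNode = { name: 'fs', exports: { kind: 'fs' }, link: null };
const osNode = { name: 'os', exports: { kind: 'os' }, link: fsNode };
const modlistBuiltin = osNode; // head of the list

// Walk the list from the head until a node with a matching name is found.
function find(head, name) {
  let mp = head;
  while (mp !== null && mp.name !== name) {
    mp = mp.link;
  }
  return mp; // the matching node, or null when the list is exhausted
}

console.log(find(modlistBuiltin, 'fs').exports); // the exports of the fs node
console.log(find(modlistBuiltin, 'http'));       // null: nothing registered under that name
```

The lookup is linear in the number of registered modules, which is fine for a short list like this.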

Looking at find, you may be wondering where the module list comes from and how a module gets added to it. Next, we design the registration process that builds the module list.

The 'register' process

Registration has two cases: the modlist_builtin head is null, or it is not null.

void node_module_register(node_module* m) {
  if (modlist_builtin != nullptr) {
    m->link = modlist_builtin;  // the new node's next pointer points to the current head
    modlist_builtin = m;        // the new node becomes the head
  } else {
    m->link = nullptr;          // first node: nothing follows it
    modlist_builtin = m;
  }
}
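Prepending to an empty list just stores a null link, so both cases can be handled by the same two assignments. A small JavaScript model of the registration step, with names that are illustrative only:

```javascript
let modlistBuiltin = null; // head of the list, empty at start

// Prepend a module node: its link takes the old head (null for the first
// node), then the node itself becomes the new head.
function nodeModuleRegister(m) {
  m.link = modlistBuiltin;
  modlistBuiltin = m;
}

// Collect names from head to tail to show the resulting order.
function listNames() {
  const names = [];
  for (let mp = modlistBuiltin; mp !== null; mp = mp.link) {
    names.push(mp.name);
  }
  return names;
}

nodeModuleRegister({ name: 'fs', exports: {}, link: null });
nodeModuleRegister({ name: 'os', exports: {}, link: null });
console.log(listNames()); // [ 'os', 'fs' ]
```

The most recently registered module ends up at the head, so registration is constant time regardless of list length.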

Registration and lookup are done. The last thing left is loading, which is the gateway between JS and C++ described above.

Loading


struct node_module* get_builtin_module(const char* name) {
  struct node_module* mp;
  for (mp = modlist_builtin; mp != nullptr; mp = mp->link) {
    // Compare mp->name with name. strcmp compares two strings and
    // returns 0 only if they are the same.
    if (strcmp(mp->name, name) == 0) break;
  }
  return mp;
}
Object Binding(const char* name) {
  node_module* mod = get_builtin_module(name);
  return mod->exports;
}

In this design, node_module holds exports that have already been built, which does not fit the idea of loading on demand, so some changes are needed. node_module should not store exports directly. Instead it stores the init function that the module registered, and Binding runs that function at load time to produce exports.

struct node_module {
  const char* name;
  Object (*init)(Object);  // fills in and returns the exports object
  struct node_module* link;
};
Object Binding(const char* name) {
  node_module* mod = get_builtin_module(name);
  Object exports;
  return mod->init(exports);
}

With that, our module design is complete: a module is registered by filling in a node_module with a name and an init function and passing it to node_module_register.

Binding source code

Binding is the implementation behind process.binding in Node.js. Next we look directly at module loading in the Node.js source code. Node.js source for Binding [location: "src/node.cc"]:


static void Binding(const FunctionCallbackInfo<Value>& args) {
  Environment* env = Environment::GetCurrent(args);  // get the current environment

  Local<String> module = args[0]->ToString(env->isolate());
  node::Utf8Value module_v(env->isolate(), module);

  Local<Object> cache = env->binding_cache_object();  // get the global cache object
  Local<Object> exports;  // declare the exports object

  // If the module is already in the cache, return it directly.
  if (cache->Has(env->context(), module).FromJust()) {
    exports = cache->Get(module)->ToObject(env->isolate());
    args.GetReturnValue().Set(exports);
    return;
  }

  ...

  // Look up the C++ core module. This is important and is discussed later.
  node_module* mod = get_builtin_module(*module_v);
  if (mod != nullptr) {
    // Core module handling
    exports = Object::New(env->isolate());
    // Internal bindings don't have a "module" object, only exports.
    Local<Value> unused = Undefined(env->isolate());

    // Call the initialization function that the module registered
    // (Initialize for the os module), passing its private data.
    mod->nm_context_register_func(exports, unused,
      env->context(), mod->nm_priv);
    cache->Set(module, exports);  // cache[module] = exports
  } else if (!strcmp(*module_v, "constants")) {
    ...
  } else if (!strcmp(*module_v, "natives")) {
    // Get native JS modules. Native modules are located under "lib/"
  } else {
    ...
  }

  args.GetReturnValue().Set(exports);  // exports becomes the value that process.binding('os') returns to JS
}

If that is hard to follow, here is the same logic as pseudocode:

function binding(args) {
    const env = Environment.getCurrent(args);
    const module = args[0];
    const module_v = module.toUtf8Value();
    const cache = env.cache;  // get the global cache object
    // Query the cache. If the module is already loaded, return it from the cache.
    if (cache[module]) {
        return cache[module];
    }

    let exports;
    // Get the C++ core module
    const mod = get_builtin_module(module_v);
    if (mod) {
        exports = {};
        mod.init(exports, undefined, env.context, mod.nm_priv);  // exports is populated by init
        cache[module] = exports;
    }
    return exports;  // returned to the JS caller of process.binding
}

Next, and more importantly, how does get_builtin_module get CPP core source modules?

get_builtin_module gets the core module

Internally, this function compares the requested name against each entry in modlist_builtin, the linked list of C++ core modules, and returns the matching module.

struct node_module* get_builtin_module(const char* name) {
  struct node_module* mp;
  // modlist_builtin is the linked list of modules that have already been
  // registered. The registration process is covered later.
  for (mp = modlist_builtin; mp != nullptr; mp = mp->nm_link) {
    // strcmp compares nm_modname with name and returns 0 only if they
    // are the same.
    if (strcmp(mp->nm_modname, name) == 0)
      break;
  }

  return (mp);
}

Finally, the only remaining question is where modlist_builtin comes from. Remember the last line of the os C++ source shown at the beginning:

NODE_MODULE_CONTEXT_AWARE_BUILTIN(os, node::os::Initialize)

The logic is not complicated. We just need to know that this line registers the module on the list.

C++ source module registration

The os source module ends by invoking the NODE_MODULE_CONTEXT_AWARE_BUILTIN macro to register itself:

NODE_MODULE_CONTEXT_AWARE_BUILTIN(os, node::os::Initialize)

The macro above is defined as:

#define NODE_MODULE_CONTEXT_AWARE_BUILTIN(modname, regfunc)           \
  NODE_MODULE_CONTEXT_AWARE_X(modname, regfunc, NULL, NM_F_BUILTIN)

NODE_MODULE_CONTEXT_AWARE_X is the key macro to register the module, which defines a module structure and registers the module structure with the node_module_register function as follows:

// Simplified excerpt: the surrounding wrapper from the original source is omitted here.
#define NODE_MODULE_CONTEXT_AWARE_X(modname, regfunc, priv, flags)    \
  static node::node_module _module =                                  \
  {                                                                   \
    NODE_MODULE_VERSION,                                              \
    flags,                                                            \
    NULL,                                                             \
    __FILE__,                                                         \
    NULL,                                                             \
    (node::addon_context_register_func) (regfunc),                    \
    NODE_STRINGIFY(modname),                                          \
    priv,                                                             \
    NULL                                                              \
  };                                                                  \
  node_module_register(&_module);

// node_module definition
struct node_module {
  int nm_version;
  unsigned int nm_flags;
  void* nm_dso_handle;
  const char* nm_filename;
  node::addon_register_func nm_register_func;
  node::addon_context_register_func nm_context_register_func;
  const char* nm_modname;
  void* nm_priv;
  struct node_module* nm_link;
};

This defines a struct _module that contains basic information about a module and is then registered with the module list via node_module_register

static node_module* modlist_builtin;  // head node

void node_module_register(void* m) {
  struct node_module* mp = reinterpret_cast<struct node_module*>(m);

  if (mp->nm_flags & NM_F_BUILTIN) {
    // Classic singly linked list insertion at the head
    mp->nm_link = modlist_builtin;  // the new node's next pointer points to the current head
    modlist_builtin = mp;           // mp becomes the new head node
  }
  ...
}
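The nm_flags & NM_F_BUILTIN test is a bit-mask check: only modules whose flags include the built-in bit are prepended to modlist_builtin. A rough JavaScript model follows. The flag value, the second list, and the my_addon name are assumptions made for this sketch, not details taken from the Node.js source.

```javascript
// Assumed flag value, for illustration only.
const NM_F_BUILTIN = 0x01;

let modlistBuiltin = null; // list of built-in modules
let modlistOther = null;   // hypothetical list for everything else

function nodeModuleRegister(mp) {
  if (mp.nm_flags & NM_F_BUILTIN) {
    // Built-in bit is set: prepend to the built-in list.
    mp.nm_link = modlistBuiltin;
    modlistBuiltin = mp;
  } else {
    // Placeholder for the branch elided in the excerpt above.
    mp.nm_link = modlistOther;
    modlistOther = mp;
  }
}

nodeModuleRegister({ nm_modname: 'os', nm_flags: NM_F_BUILTIN, nm_link: null });
nodeModuleRegister({ nm_modname: 'my_addon', nm_flags: 0, nm_link: null });
console.log(modlistBuiltin.nm_modname, modlistOther.nm_modname); // os my_addon
```

Only the module registered with the built-in flag can later be found by walking modlistBuiltin.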