This article documents a strange pitfall I ran into while using Node gRPC in static codegen mode. The problem itself is uncommon, but investigating it surfaced some interesting points. Getting to the bottom of a problem is a great way to learn, so I wrote down the investigation and the knowledge it touched on.

To make the article easier to follow, I prepared a demo that reproduces the problem. If you are interested, clone it and read it alongside the article.

1. Reproducing the problem

If you have read about gRPC or used it in Node.js, you will know that it has two usage modes: “dynamic codegen” and “static codegen”.

Here is a quick explanation (skip this paragraph if you already know gRPC). RPC frameworks typically choose an IDL, and gRPC uses Protocol Buffers by default; the definition files are usually called PB or proto files. From a PB file, tooling can generate the serialization/deserialization code (xxx_pb.js) and, for gRPC, the service code (xxx_grpc_pb.js). If the Node.js process loads the PB file at startup and generates these methods at runtime, that is “dynamic code generation”. If the JS files are generated ahead of time by a tool, that is “static code generation”. See the examples provided in the official gRPC repository.

Our project uses an internal decryption package (also maintained by us) called KeyCenter. The decryption package makes gRPC requests, and it uses the “static code generation” mode.

The project had been running normally until a Redis component was introduced to implement caching. After happily adding the code and running it, the console showed the following error:

Error: 13 INTERNAL: Request message serialization failure: Expected argument of type keycenter.SecretData
    at Object.callErrorFromStatus (/Users/xxxx/server/node_modules/@infra-node/grpc-js/build/src/call.js:31:26)
    at Object.onReceiveStatus (/Users/xxxx/server/node_modules/@infra-node/grpc-js/build/src/client.js:176:52)
    at Object.onReceiveStatus (/Users/xxxx/server/node_modules/@infra-node/grpc-js/build/src/client-interceptors.js:342:141)
    at Object.onReceiveStatus (/Users/xxxx/server/node_modules/@infra-node/grpc-js/build/src/client-interceptors.js:305:181)
    at /Users/zhouhongxuan/programming/xxxx/server/node_modules/@infra-node/grpc-js/build/src/call-stream.js:124:78
    at processTicksAndRejections (internal/process/task_queues.js:75:11)

The Redis component does indirectly depend on gRPC. The following module dependency diagram illustrates the relationships between the packages used by the project.

Each yellow component is a separate npm package. The business code uses the KeyCenter package directly to decrypt secret keys. It also uses the Redis cache component, which indirectly depends on KeyCenter. The KeyCenter component, in turn, uses gRPC through “static code generation”.

Let’s take a look at this problem.

2. Troubleshooting

❗️ The order of the following sections is not the order in which I actually investigated. When you troubleshoot for real, it is best to start from the “scene of the crime” 👀. In this case that means first finding where the Request message serialization failure error is thrown, and investigating the upper (outer) logic in parallel, closing in on the truth from both sides. However, to make the article read smoothly and approach the truth step by step from the symptom, I chose the current structure. I will try to preserve the actual investigation path as much as possible.

2.1. Is there a bug in the internal logic of the Redis component?

The immediate thought was that something was wrong with the newly introduced Redis component. As soon as the error appeared, I commented out the following line in the project:

- this.redis = new Redis(redisConfig);
+ // this.redis = new Redis(redisConfig);

With that, everything worked again. So introducing the new component did cause the problem.

Since the error was related to gRPC, and the Redis component depends on gRPC indirectly (through KeyCenter), my first thought was that the component’s internal logic might be at fault. Perhaps some operation called a KeyCenter method and that call failed.

But as quickly as the idea emerged, it was ruled out.

By adding breakpoints and logging, I quickly concluded that although the Redis component depends on KeyCenter, it never calls any KeyCenter method during instantiation. Since nothing is called, the gRPC error cannot be directly caused by it.

Still, the error is clearly related to the Redis component in some way.

2.2. Is it really the Redis instantiation that causes the error?

Above, I commented out the line that instantiates Redis, the project ran normally, and I tentatively concluded that instantiation caused the problem. However, I had missed an important point: when TypeScript compiles a module that is imported but never used, it drops the import from the emitted code.

For example, if the imported module is not actually used, the import does not appear in the compiled code:

import Redis from '@infra-node/redis';
export default 1;

But if the code looks like this:

import Redis from '@infra-node/redis';
Redis;

Or like this:

import '@infra-node/redis';

In both of these cases, the require('@infra-node/redis') call that loads the module is retained in the output. So commenting out the instantiation also removed the import, and the instantiation itself is probably not the cause of the problem.

Further testing showed that the direct cause was importing the @infra-node/redis module. Importing the module triggers the problem; without the import, everything is fine. My first instinct pointed to two suspects:

  • Side effects
  • Dependencies

With that in mind, let’s go back to the original error.

2.3. new A instanceof A === false?

Remember the original error? Error: 13 INTERNAL: Request message serialization failure: Expected argument of type keycenter.SecretData. The “Expected argument of type XXX” message is thrown from the xxx_grpc_pb.js file:

function serialize_keycenter_SecretData(arg) {
  if (!(arg instanceof keycenter_pb.SecretData)) {
    throw new Error('Expected argument of type keycenter.SecretData');
  }
  return Buffer.from(arg.serializeBinary());
}

serialize_keycenter_SecretData is the method that serializes a SecretData instance into binary data at request time. As you can see, it checks whether arg is an instance of keycenter_pb.SecretData.

In our project, we already have the Base64-encoded binary of the PB object in advance, so we use the deserialization method provided by the xxx_pb.js file to create a SecretData instance and then set other properties in code.

import { SecretData } from '../gen/keycenter_pb';
// ...

// Deserialize the Base64-encoded binary into a SecretData instance
const secretData = SecretData.deserializeBinary(Buffer.from(base64, 'base64'));
secretData.setKeyName(keyName);

keyCenter.decrypt(secretData, metadata, (err, res) => {
    // ...
});

And when I print arg here, it looks fine on the console.

The SecretData.deserializeBinary method is as follows:

proto.keycenter.SecretData.deserializeBinary = function(bytes) {
  var reader = new jspb.BinaryReader(bytes);
  var msg = new proto.keycenter.SecretData;
  return proto.keycenter.SecretData.deserializeBinaryFromReader(msg, reader);
};

proto.keycenter.SecretData.deserializeBinaryFromReader = function(msg, reader) {
  while (reader.nextField()) {
    if (reader.isEndGroup()) {
      break;
    }
    var field = reader.getFieldNumber();
    switch (field) {
    case 1:
      var value = /** @type {string} */ (reader.readString());
      msg.setKeyName(value);
      break;
    case 2:
      // ...
    }
  }
  return msg;
};

Look at the line var msg = new proto.keycenter.SecretData;. It creates an instance through the SecretData constructor, passes it to the .deserializeBinaryFromReader method to have its fields assigned, and finally returns that instance.

So it looks as if new A instanceof A === false, which is obviously impossible. My guess was that there must be a “ghost” in there: something that looks like SecretData but is not, and is impersonating it.

It sounds strange, so we have to keep digging.
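The “impostor” idea is easy to reproduce in isolation. The sketch below is only an illustration: defineSecretData is a hypothetical factory (not part of any library) that plays the role of evaluating the same source file twice. Each call yields a fresh constructor, so an instance created by one copy fails instanceof against the other, even though both are named SecretData and have identical source text.

```javascript
// Hypothetical stand-in for evaluating the same generated file twice.
// Each call creates a brand-new constructor with identical source.
function defineSecretData() {
  function SecretData() {
    this.keyName = '';
  }
  SecretData.prototype.setKeyName = function (name) {
    this.keyName = name;
  };
  return SecretData;
}

const SecretDataA = defineSecretData();
const SecretDataB = defineSecretData();

const fromA = new SecretDataA();
const fromB = new SecretDataB();

// Same name and same source text ...
const sameName = SecretDataA.name === SecretDataB.name;
const sameSource = SecretDataA.toString() === SecretDataB.toString();

// ... but different identities, so instanceof only matches the real creator.
const matchesOwn = fromA instanceof SecretDataA;
const matchesOther = fromB instanceof SecretDataA;

console.log({ sameName, sameSource, matchesOwn, matchesOther });
```

Printing either instance shows the same shape, which is why logging arg on the console gives no hint that anything is wrong.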

2.4. A “weird” dependency installation?

First, recall the package/module dependencies shown earlier.

I then looked at how the packages were actually installed. It looks like this (unrelated packages are omitted):

.
├── grpc-js
│   ...
├── keycenter
└── redis
    ├── Changelog.md
    ├── LICENSE
    ├── README.md
    ├── built
    ├── node_modules
    │   └── @infra-node
    │       ...
    │       └── keycenter
    └── ...

That is how the packages are installed in the current project. One interesting thing is that there is a keycenter package at the outer level and another keycenter package installed inside redis. Why is that?

The reason is simple: the keycenter version range declared by the project and the one declared by redis cannot be resolved to the same version, so a copy is installed in each place. This is npm’s normal behavior and is usually not a problem.

But when I manually removed the keycenter copy inside redis, the project worked again. It looks like this is the right place to dig.

2.5. Is the wrong module file being referenced?

So what we are really looking at is new A' instanceof A === false. Here is the check again:

function serialize_keycenter_SecretData(arg) {
  if (!(arg instanceof keycenter_pb.SecretData)) {
    throw new Error('Expected argument of type keycenter.SecretData');
  }
  return Buffer.from(arg.serializeBinary());
}

When this method runs, the constructor of the arg passed in is actually different from the keycenter_pb.SecretData it is checked against. This made me wonder whether the wrong _pb.js file was being referenced. For example, one side uses keycenter_pb.js from the outer keycenter, and the other uses keycenter_pb.js from the keycenter inside redis. The two files have the same content and the same function signatures, so the objects look alike, but they are different objects and the instanceof check naturally fails.

In other words, is the keycenter_pb.js used to construct arg different from the keycenter_pb.js that serialize_keycenter_SecretData checks against?

Based on what I know about the Node.js require mechanism, I basically ruled this out. Both are loaded through relative paths, and by the module resolution rules each one resolves to the file inside its own package. A file cannot accidentally be resolved from another package.
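The resolution rule can be checked with a small experiment. The sketch below builds a throwaway directory layout that only loosely mimics the two installed copies (the directory names and the one-line file contents are made up for illustration), then confirms that each _grpc_pb.js gets the _pb.js sitting next to it and that the two copies are separate module instances.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Throwaway layout imitating two installed copies of one package.
// Directory names and file contents are illustrative only.
const root = fs.mkdtempSync(path.join(os.tmpdir(), 'dup-pkg-'));
const outerDir = path.join(root, 'keycenter');
const nestedDir = path.join(root, 'redis', 'node_modules', 'keycenter');

const pbSource = 'exports.SecretData = function SecretData() {};\n';
const grpcPbSource = "exports.pb = require('./keycenter_pb');\n";

for (const dir of [outerDir, nestedDir]) {
  fs.mkdirSync(dir, { recursive: true });
  fs.writeFileSync(path.join(dir, 'keycenter_pb.js'), pbSource);
  fs.writeFileSync(path.join(dir, 'keycenter_grpc_pb.js'), grpcPbSource);
}

const outerGrpc = require(path.join(outerDir, 'keycenter_grpc_pb.js'));
const nestedGrpc = require(path.join(nestedDir, 'keycenter_grpc_pb.js'));
const outerPb = require(path.join(outerDir, 'keycenter_pb.js'));
const nestedPb = require(path.join(nestedDir, 'keycenter_pb.js'));

// Each _grpc_pb.js resolves './keycenter_pb' relative to its own directory ...
const outerUsesOwnPb = outerGrpc.pb === outerPb;
const nestedUsesOwnPb = nestedGrpc.pb === nestedPb;
// ... and the two identical files are loaded as distinct modules.
const copiesAreDistinct = outerPb.SecretData !== nestedPb.SecretData;

console.log({ outerUsesOwnPb, nestedUsesOwnPb, copiesAreDistinct });

// Clean up the temporary files.
fs.rmSync(root, { recursive: true, force: true });
```

So within one package the pairing of _grpc_pb.js and _pb.js is always consistent, which is why a wrongly resolved file can be ruled out.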

2.6. How does the module get “contaminated”?

If the referenced module is fine, could it be that variables within the module are “contaminated”?

This ties back to my initial hunch: side effects. Side effects can arise in many ways, but a typical one is the use of global variables. Looking at the code of keycenter_pb.js, I found that it does exactly that:

var jspb = require('google-protobuf');
var goog = jspb;
var global = Function('return this')();
// ...
goog.exportSymbol('proto.keycenter.SecretData', null, global);
// ...
goog.object.extend(exports, proto.keycenter);

The code obtains the global object via Function('return this')(). It then calls goog.exportSymbol, which mounts the proto.keycenter.SecretData property on the global object. Finally, it copies the properties of proto.keycenter onto exports as the module’s exports.

But if you look closely, this code alone does not cause the error, because it only changes what the global object points to; it does not modify objects that were exported earlier. For example, after the module is loaded, the reference relationships are roughly as follows:

When another file with the same content, _pb'.js, is then loaded into the runtime, the reference relationships become as follows:

You can see that the original proto object is not modified, which means the objects imported earlier are unchanged. So how exactly does the “contamination” happen?
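These reference relationships can be sketched without any library. In the illustration below, loadGeneratedFile is a hypothetical miniature of what loading a generated _pb.js does, Object.assign stands in for goog.object.extend, and a plain object (fakeGlobal) plays the role of the real global so the demo stays isolated. It is not the actual google-protobuf implementation.

```javascript
// Hypothetical miniature of loading a generated _pb.js in commonjs mode:
// publish a constructor under a global path, then copy the package
// object's properties onto the module's exports.
function loadGeneratedFile(globalObject) {
  const moduleExports = {};
  globalObject.proto = globalObject.proto || {};
  globalObject.proto.keycenter = globalObject.proto.keycenter || {};
  globalObject.proto.keycenter.SecretData = function SecretData() {};
  // Stand-in for goog.object.extend(exports, proto.keycenter): a shallow copy.
  Object.assign(moduleExports, globalObject.proto.keycenter);
  return moduleExports;
}

// A plain object plays the role of the real global.
const fakeGlobal = {};

const firstExports = loadGeneratedFile(fakeGlobal);
const firstCtor = firstExports.SecretData;

const secondExports = loadGeneratedFile(fakeGlobal);

// The first module's exports still hold the constructor they were given ...
const firstExportsUntouched = firstExports.SecretData === firstCtor;
// ... while the global path now leads to the second file's constructor.
const globalRepointed =
  fakeGlobal.proto.keycenter.SecretData === secondExports.SecretData &&
  fakeGlobal.proto.keycenter.SecretData !== firstCtor;

console.log({ firstExportsUntouched, globalRepointed });
```

Anything that holds on to the exports of the first file keeps seeing the first constructor; only code that looks the constructor up through the global path at call time sees the second one.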

The problem comes from the .deserializeBinary method used in Section 2.3. It is the static method that _pb.js exposes on the constructor to create an instance from binary data:

proto.keycenter.SecretData.deserializeBinary = function(bytes) {
    var reader = new jspb.BinaryReader(bytes);
    var msg = new proto.keycenter.SecretData;
    return proto.keycenter.SecretData.deserializeBinaryFromReader(msg, reader);
};

Note the second line, var msg = new proto.keycenter.SecretData;, which uses the proto.keycenter.SecretData constructor. From the earlier code we know that proto is global.proto. So once the reference on the global object is changed, the keycenter.SecretData used here is actually a different constructor.

The truth is out. The error unfolds as follows:

  1. First, keycenter_grpc_pb.js loads the keycenter_pb.js file in the same directory. At this point the keycenter.SecretData constructor held by that module is fixed.
  2. For some other reason, a package loads the same PB file from another location. To tell them apart, call it keycenter_pb-2.js. It is identical to keycenter_pb.js, just a separate file. Loading it changes the object that the global points to.
  3. Later, the business code imports the keycenter_pb.js module, uses SecretData.deserializeBinary to create an instance, and passes it to the method in keycenter_grpc_pb.js.
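The three steps can be replayed in a self-contained sketch. Everything here is a simplified stand-in written for illustration: loadPbFile and loadGrpcPbFile are hypothetical miniatures of the generated files, fakeGlobal replaces the real global, and the “binary format” is just the key name as UTF-8 bytes.

```javascript
// Hypothetical miniature of a generated _pb.js in commonjs mode. The real
// file is far larger; only the parts relevant to the bug are kept.
function loadPbFile(globalObject) {
  const moduleExports = {};
  globalObject.proto = globalObject.proto || {};
  globalObject.proto.keycenter = globalObject.proto.keycenter || {};
  const pkg = globalObject.proto.keycenter;

  pkg.SecretData = function SecretData() {
    this.keyName = '';
  };
  // The constructor is looked up through the global path at call time.
  pkg.SecretData.deserializeBinary = function (bytes) {
    const msg = new globalObject.proto.keycenter.SecretData();
    msg.keyName = Buffer.from(bytes).toString('utf8');
    return msg;
  };

  Object.assign(moduleExports, pkg);
  return moduleExports;
}

// Hypothetical miniature of _grpc_pb.js: it captures its own _pb exports.
function loadGrpcPbFile(pbExports) {
  return {
    serialize(arg) {
      if (!(arg instanceof pbExports.SecretData)) {
        throw new Error('Expected argument of type keycenter.SecretData');
      }
      return Buffer.from(arg.keyName, 'utf8');
    },
  };
}

const fakeGlobal = {};

// Step 1: _grpc_pb.js loads the _pb.js next to it.
const pb = loadPbFile(fakeGlobal);
const grpcPb = loadGrpcPbFile(pb);

// Step 2: an identical second copy is loaded from elsewhere.
loadPbFile(fakeGlobal);

// Step 3: the business code deserializes using the first copy's exports.
const secretData = pb.SecretData.deserializeBinary(Buffer.from('demo-key'));

let errorMessage = null;
try {
  grpcPb.serialize(secretData);
} catch (err) {
  errorMessage = err.message;
}
console.log(errorMessage);
```

Removing the step 2 call makes the serialization succeed, which matches the observation that deleting the nested keycenter copy fixed the project.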

✨ To help you understand, I extracted the core logic of this problem into a demo. You can clone it locally, then read and run it alongside the article.


☕️ That completes the troubleshooting. The rest of the article moves on to another topic: fixing the problem. The fix was supposed to go smoothly, but it also ran into some unexpected issues.


3. Solution

Once you understand the cause, you will see that the conditions for triggering this error are rather strict. It reproduces only when all of the following hold at the same time:

  1. The generated code mounts a global variable
  2. The project loads two _pb.js files with the same content
  3. The .deserializeBinary method is used to create the instance object
  4. The modules are loaded in a specific order: first _grpc_pb.js, then _pb'.js (the other PB file with the same content)

For conditions 2 to 4, breaking any one of them avoids the problem. I wrote corresponding code in the demo project (correct-2.ts, correct-3.ts and correct-4.ts); try them if you are interested.

As the package provider, we seem to have many ways to solve this problem, but in reality we have limited control over most of them:

  • For condition 2, we would need to ensure that only one KeyCenter package is installed. But which versions other packages and modules depend on is controlled externally, not by the package itself, so this is hard to guarantee;
  • For condition 3, using .deserializeBinary is a functional requirement, and working around it would make the code awkward;
  • For condition 4, the import order is obviously also controlled externally, not by the package itself.

So we looked for a “proper” way to make the _pb.js file generated by grpc-tools or protoc stop polluting the global object (breaking condition 1).

4. The road to a fix

4.1. Avoiding global contamination in protoc-generated code

Following that line of thought, we want protoc to produce a “safe” static _pb.js file.

protoc supports setting import_style in the js_out parameter to control the module style. The official documentation gives commonjs as an option.

protoc --proto_path=src --js_out=import_style=commonjs,binary:build/gen src/foo.proto src/bar/baz.proto

Unfortunately, this option does not generate the code we want; what it produces is exactly the “problem code” we saw above. So are there any other import_style values?

There is no documentation for this, so the only option is to look for the answer in the source code.

Since the following involves protoc, here is a brief introduction for readers who are not familiar with it. protoc, the Protocol Compiler, lives in the protobuf repository. The code generator for each language is in the folder named after it under src/google/protobuf/compiler/. The JavaScript generator, for example, is in the js folder.

It turns out commonjs and closure are not the only supported style values:

// ...
else if (options[i].first == "import_style") {
  if (options[i].second == "closure") {
    import_style = kImportClosure;
  } else if (options[i].second == "commonjs") {
    import_style = kImportCommonJs;
  } else if (options[i].second == "commonjs_strict") {
    import_style = kImportCommonJsStrict;
  } else if (options[i].second == "browser") {
    import_style = kImportBrowser;
  } else if (options[i].second == "es6") {
    import_style = kImportEs6;
  } else {
    *error = "Unknown import style " + options[i].second + ", expected " +
              "one of: closure, commonjs, browser, es6.";
  }
}
// ...

However, after reading through the source, I found that the browser and es6 styles do not meet our needs. That leaves commonjs_strict, and “strict” sounds very much in line with our goal.

The main relevant code is as follows:

// Generate "require" statements.
if ((options.import_style == GeneratorOptions::kImportCommonJs ||
      options.import_style == GeneratorOptions::kImportCommonJsStrict)) {
  printer->Print("var jspb = require('google-protobuf');\n");
  printer->Print("var goog = jspb;\n");

  // Do not use global scope in strict mode
  if (options.import_style == GeneratorOptions::kImportCommonJsStrict) {
    printer->Print("var proto = {};\n\n");
  } else {
    printer->Print("var global = Function('return this')();\n\n");
  }
  // ...
}

The biggest difference between commonjs_strict and commonjs is whether the global object is used. With commonjs_strict, the generated code uses var proto = {}; instead of the global object. A perfect fit for our needs!

However, after actually using it, I ran into another problem.
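To see why a local proto breaks condition 1, here is a minimal sketch. loadStrictPbFile is a hypothetical miniature of a file generated with commonjs_strict, not real generated output; it keeps only the local var proto = {} and the final copy of proto onto exports.

```javascript
// Hypothetical miniature of a file generated with commonjs_strict: the
// namespace root is a local variable instead of a property of the global.
function loadStrictPbFile() {
  const moduleExports = {};
  const proto = {}; // mirrors "var proto = {};" in the generated code
  proto.keycenter = {};
  proto.keycenter.SecretData = function SecretData() {};
  proto.keycenter.SecretData.deserializeBinary = function () {
    // Resolved through the local proto, which no other file can reach.
    return new proto.keycenter.SecretData();
  };
  // Stand-in for goog.object.extend(exports, proto).
  Object.assign(moduleExports, proto);
  return moduleExports;
}

const first = loadStrictPbFile();
const second = loadStrictPbFile(); // a duplicate copy loaded later

const msg = first.keycenter.SecretData.deserializeBinary();

// The instance still belongs to the copy that created it ...
const stillOwnInstance = msg instanceof first.keycenter.SecretData;
// ... the later copy did not hijack the constructor ...
const notOtherCopy = !(msg instanceof second.keycenter.SecretData);
// ... and nothing was written to the real global object.
const globalUntouched = typeof globalThis.proto === 'undefined';

console.log({ stillOwnInstance, notOtherCopy, globalUntouched });
```

Loading a duplicate copy no longer affects the first one, because each file closes over its own proto.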

4.2. grpc-tools does not support commonjs_strict

Another big difference with import_style=commonjs_strict is the export code it generates:

// if provided is empty, do not export anything
if (options.import_style == GeneratorOptions::kImportCommonJs &&
    !provided.empty()) {
  printer->Print("goog.object.extend(exports, $package$);\n", "package",
                 GetNamespace(options, file));
} else if (options.import_style == GeneratorOptions::kImportCommonJsStrict) {
  printer->Print("goog.object.extend(exports, proto);\n", "package",
                 GetNamespace(options, file));
}

This may not be intuitive from the generator source, so here is the code generated by each style.

The following is generated with commonjs_strict:

goog.object.extend(exports, proto);

The following is generated with commonjs:

goog.object.extend(exports, proto.keycenter);

Now the difference is clear. With commonjs, the objects under the package are exported directly; with commonjs_strict, the whole proto object is exported. So code that uses the _pb.js file has to adjust how it imports. In addition, the _grpc_pb.js static code generated by grpc-tools also imports the _pb.js file, so it has to adapt to this export shape as well.

A quick word on what grpc-tools does. It does two things: it wraps the protoc command line so that users can use grpc-tools directly without dealing with protoc, and it implements a protoc plugin for gRPC. I may write a separate article on the protoc plugin mechanism and how to implement a plugin.

But when I eagerly opened the grpc-tools source code, I found this:
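The effect on importing code can be shown with plain objects. In this sketch, Object.assign stands in for goog.object.extend, and the proto object with a single placeholder SecretData constructor is made up for illustration.

```javascript
// Placeholder namespace with one message constructor, for illustration.
const proto = { keycenter: { SecretData: function SecretData() {} } };

// import_style=commonjs: goog.object.extend(exports, proto.keycenter)
const commonjsExports = Object.assign({}, proto.keycenter);
// import_style=commonjs_strict: goog.object.extend(exports, proto)
const strictExports = Object.assign({}, proto);

// How a consumer reaches the constructor under each style.
const viaCommonjs = commonjsExports.SecretData;
const viaStrict = strictExports.keycenter.SecretData;

// Code written for the commonjs shape finds nothing on the strict exports.
const oldAccessOnStrict = strictExports.SecretData;

console.log(typeof viaCommonjs, typeof viaStrict, typeof oldAccessOnStrict);
```

Any consumer that reads the message type straight off the module, as in keycenter_pb.SecretData, gets undefined with the strict export shape unless it goes through the package name first.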

grpc::string file_path =
    GetRelativePath(file->name(), GetJSMessageFilename(file->name()));
out->Print("var $module_alias$ = require('$file_path$');\n", "module_alias",
           ModuleAlias(file->name()), "file_path", file_path);

It does not take import_style=commonjs_strict into account; it always generates the import code that matches commonjs. There are also issues that mention this problem.

4.3. Doing it ourselves

So for now there is no good ready-made solution to this import/export mismatch.

Because of some special requirements, we had already forked grpc-tools and modified its internals to fit our RPC framework. So we added support for import_style=commonjs_strict there as well:

grpc::string pb_package = file->package();
if (params.commonjs_strict && !pb_package.empty()) {
  out->Print("var $module_alias$ = require('$file_path$').$pb_package$;\n",
             "module_alias", ModuleAlias(file->name()), "file_path", file_path,
             "pb_package", pb_package);
} else {
  out->Print("var $module_alias$ = require('$file_path$');\n", "module_alias",
             ModuleAlias(file->name()), "file_path", file_path);
}

Some other changes are needed too, such as handling the CLI parameters, which I won’t paste here.

This is not the only headache, either. If you use another protoc plugin to generate .d.ts files automatically, it also needs to be adapted to import_style=commonjs_strict.

5. Final thoughts

This article records the troubleshooting of a gRPC-related error, from finding the cause, through working out a solution, to the final fix.

Troubleshooting is something every engineer deals with regularly, and it is often challenging. The problems frequently turn out to be small, and the fix just a few lines of code. But the process of working from the symptom to the truth, drawing on all kinds of knowledge along the way, is a pleasure unique to engineers.

Compared with introducing a technical topic, writing a good troubleshooting article is often harder, so this was also a challenge I set for myself.

The article comes with companion demo code that can help you understand the problems described here.