A look at the TypeScript parser

Preface

Some time ago I read the source code of the open source tool Stryker and became interested in the TypeScript parser. Stryker checks the quality of unit tests: it parses the source code, automatically changes parts of it, and then checks whether the unit tests detect the change. The TypeScript parser handles the key step of recognizing the source code.

So I spent some time learning about Typescript parsers, and it felt like opening a new door to a lot of fun things to do.

P.S. Stryker: github.com/stryker-mut…

The basics: generating an AST

A look at the Stryker source shows the following key statements for applying the Typescript parser:

export function parseFile(file: File, target: ts.ScriptTarget | undefined) {
  return ts.createSourceFile(file.name, file.textContent, target || ts.ScriptTarget.ES5, /*setParentNodes*/ true);
}

The ts module is imported earlier in the file:

import * as ts from 'typescript';

This wrapper simply forwards its arguments to ts.createSourceFile: it passes in the file name and the TypeScript source text, and fills the last two arguments with fixed defaults. The output is an abstract syntax tree (AST).
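To make the call concrete, here is a minimal sketch that parses a string directly. The file name example.ts and the sample source are placeholders chosen for illustration; they are not taken from Stryker.

```typescript
import * as ts from "typescript";

// Hypothetical in-memory input; nothing is read from disk.
const sampleSource = "export const test = (a: number) => a + 2;\nexport const test2 = 0;\n";

const sampleFile = ts.createSourceFile(
  "example.ts",        // file name recorded on the resulting SourceFile
  sampleSource,        // the TypeScript text to parse
  ts.ScriptTarget.ES5, // language version, the same default the Stryker wrapper uses
  /* setParentNodes */ true,
);

// The root of the AST is the SourceFile node; its statements are the top-level nodes.
const topLevelCount = sampleFile.statements.length;
const allAreVariableStatements = sampleFile.statements.every(ts.isVariableStatement);
console.log(topLevelCount, allAreVariableStatements);
```

Parsing alone does no type checking, so this call succeeds even for code that would not compile.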

You can inspect each node of the tree by setting breakpoints in a Node.js debugger. But after some digging online I found a snippet that prints the tree in a more readable form:

// Prints the syntax tree for TS
const printAllChildren = (node: ts.Node, depth = 0) => {
  console.log(new Array(depth + 1).join('-'), ts.SyntaxKind[node.kind], node.pos, node.end);
  node.getChildren().forEach((c) => printAllChildren(c, depth + 1));
};

At this point I use the following source code test:

export const test = (a: number) => a + 2;
export const test2 = 0;

Output below:

Each line of the output corresponds to one node, so the tree is easy to follow node by node.
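One detail worth knowing when reading such output: getChildren also returns tokens and SyntaxList wrapper nodes, while ts.forEachChild visits only a node's AST child properties. The sketch below compares the two traversals on a one-line sample; the helper names are made up for this illustration.

```typescript
import * as ts from "typescript";

const src = ts.createSourceFile("example.ts", "export const test2 = 0;", ts.ScriptTarget.ES5, true);

// Counts every node reachable through getChildren (tokens and SyntaxList wrappers included).
const countWithGetChildren = (node: ts.Node): number =>
  1 + node.getChildren(src).reduce((sum, child) => sum + countWithGetChildren(child), 0);

// Counts only the nodes that ts.forEachChild visits.
const countWithForEachChild = (node: ts.Node): number => {
  let count = 1;
  ts.forEachChild(node, (child) => {
    count += countWithForEachChild(child);
  });
  return count;
};

console.log(countWithGetChildren(src), countWithForEachChild(src));
```

For analysis code that only cares about declarations and expressions, the forEachChild traversal is usually the less noisy of the two.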

Stryker works on this principle: generate the AST, traverse it depth-first, modify specific nodes, and then regenerate the source code (turning the tree back into source should be possible with ts.createPrinter).
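The parse, modify, and print cycle can be sketched with ts.transform and ts.createPrinter. This is only an illustration of the idea, not Stryker's actual implementation; the function name mutatePlusToMinus and the choice of mutation are assumptions made for the example.

```typescript
import * as ts from "typescript";

// Illustrative mutator: replaces every binary `+` with `-`, then prints the result.
function mutatePlusToMinus(sourceText: string): string {
  const sourceFile = ts.createSourceFile("input.ts", sourceText, ts.ScriptTarget.ES5, true);

  const transformer: ts.TransformerFactory<ts.SourceFile> = (context) => {
    const visit = (node: ts.Node): ts.Node => {
      if (ts.isBinaryExpression(node) && node.operatorToken.kind === ts.SyntaxKind.PlusToken) {
        // Rebuild the expression with a different operator, visiting both operands first.
        return context.factory.createBinaryExpression(
          ts.visitNode(node.left, visit) as ts.Expression,
          ts.SyntaxKind.MinusToken,
          ts.visitNode(node.right, visit) as ts.Expression,
        );
      }
      return ts.visitEachChild(node, visit, context);
    };
    return (file) => ts.visitNode(file, visit) as ts.SourceFile;
  };

  const result = ts.transform(sourceFile, [transformer]);
  const output = ts.createPrinter().printFile(result.transformed[0]);
  result.dispose();
  return output;
}

console.log(mutatePlusToMinus("export const test = (a: number) => a + 2;"));
```

A real mutation tool would apply one mutation at a time and run the test suite against each variant; this sketch rewrites every match in a single pass.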

Now that we know how to generate an AST and traverse it, we can start playing around with it.

Application: who wrote the most unit test cases?

Suppose the project writes its unit tests with Jest.

Walking the directories to find the test files and reading their contents needs no explanation. We take each file's contents, generate the TypeScript AST, and start traversing.

How do we determine whether a node is a describe or a test/it call? One way:

if (ts.SyntaxKind[node.kind] === 'CallExpression') {
    const funcName = node.expression.escapedText;
    if (funcName === 'describe') {
        // TODO
    } else if (funcName === 'it' || funcName === 'test') {
        // TODO
    }
}
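A self-contained variant of this check, written with the type guard functions so that it compiles under strict typing, might look like the sketch below. The TestCall shape and the function names are assumptions for illustration, and only calls whose callee is a bare identifier are recognized.

```typescript
import * as ts from "typescript";

interface TestCall {
  kind: "describe" | "it" | "test";
  title: string;
}

const isTestKeyword = (name: string): name is TestCall["kind"] =>
  name === "describe" || name === "it" || name === "test";

// Collects describe/it/test calls whose first argument is a plain string literal.
function collectTestCalls(sourceText: string): TestCall[] {
  const sourceFile = ts.createSourceFile("sample.test.ts", sourceText, ts.ScriptTarget.ES2015, true);
  const found: TestCall[] = [];

  const visit = (node: ts.Node): void => {
    if (ts.isCallExpression(node) && ts.isIdentifier(node.expression)) {
      const name = node.expression.text;
      const firstArg = node.arguments[0];
      if (isTestKeyword(name) && firstArg && ts.isStringLiteralLike(firstArg)) {
        found.push({ kind: name, title: firstArg.text });
      }
    }
    ts.forEachChild(node, visit);
  };

  visit(sourceFile);
  return found;
}
```

Forms such as it.each(...) or test.only(...) are property accesses rather than identifiers, so they would need an extra branch.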

Next question: which lines of code does a test case block span? Each AST node records its start and end positions in the source, so we can compute the line numbers. Be careful, though: when a node sits at the beginning of a line, its start position can also cover the line breaks and comments in front of it.

Method 1: avoid the node's own start position and take the line from an inner child node instead:

else if (funcName === 'it' || funcName === 'test') {
    const testKeywordNode = node.getChildren()[0];
    parseInfoList.push({
        describeName: currDescribe,
        testName: node.arguments[0].text,
        lineBegin: getLineNumByPos(fileContent, testKeywordNode.end), // Do not use pos to avoid introducing comments
        lineEnd: getLineNumByPos(fileContent, node.end),
        expectLines: getExpectLines(fileContent, node) || [],  // Get all expect rows
    });
    return;
}

To turn a position in the file contents into a line number, count the newlines that precede it:

const getLineNumByPos = (fileContent: string, pos: number): number => {
    const contentBefore = fileContent.slice(0, pos);
    // Count the newlines before pos; the result is a 1-based line number
    return (contentBefore.match(/\n/g) || []).length + 1;
};

Method 2: use the interfaces TS already provides, such as getStart and getFullStart. The difference between them is that getFullStart includes the preceding line breaks and comments (if any), while getStart does not. There is also an API that returns the line directly (why didn't I just use this one in the first place?):

const { line, character } = sourceFile.getLineAndCharacterOfPosition(node.getStart());
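The difference between the two start positions is easy to see on a statement preceded by a comment. A minimal sketch, with a made-up two-line sample:

```typescript
import * as ts from "typescript";

const text = "// leading comment\nconst x = 1;\n";
const file = ts.createSourceFile("example.ts", text, ts.ScriptTarget.ES5, true);
const statement = file.statements[0];

// getFullStart() includes leading trivia (here the comment); getStart() skips it.
const fullStart = statement.getFullStart();
const start = statement.getStart(file);

// line and character are zero-based, so add 1 for editor-style line numbers.
const fullStartLine = file.getLineAndCharacterOfPosition(fullStart).line + 1;
const startLine = file.getLineAndCharacterOfPosition(start).line + 1;
console.log({ fullStart, start, fullStartLine, startLine });
```

Note that getLineAndCharacterOfPosition counts lines from zero, whereas the hand-written getLineNumByPos above counts from one.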

With the line numbers in hand, git blame gives the author of each line of a test case.
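As a rough sketch of the tallying step, assume the blame output for a test case was captured as text from a command along the lines of git blame --line-porcelain -L <lineBegin>,<lineEnd> <file>. The function name countLinesByAuthor is hypothetical, and running git itself is left out.

```typescript
// Tallies blamed lines per author from `git blame --line-porcelain` output.
function countLinesByAuthor(porcelain: string): Map<string, number> {
  const counts = new Map<string, number>();
  for (const line of porcelain.split("\n")) {
    // In this format each blamed line carries one "author <name>" header,
    // and the source line itself is prefixed with a tab, so it cannot match here.
    if (line.startsWith("author ")) {
      const author = line.slice("author ".length).trim();
      counts.set(author, (counts.get(author) ?? 0) + 1);
    }
  }
  return counts;
}
```

Summing these maps over every test case in the project gives a per-author total, which answers the question in this section's title.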

Advanced: using TypeScript's higher-level interfaces

createSourceFile is handy, and you can do a lot by analyzing the AST, but its downsides are obvious:

As a low-level API, it focuses on the kind of each token within a single file and has no link to anything above that level. For example, TypeScript's inferred types cannot be read off a token, and some inference even spans files, which a single-file AST knows nothing about.
You can only traverse downward level by level, you have to recognize every syntactic variant of the same construct yourself, and walking back up the tree is inconvenient, all of which makes the related code awkward to write.

Here TypeScript's compiler API already provides the full stack, so we don't have to extract everything from the AST by hand. This is the architecture diagram (from the TypeScript GitHub wiki)

The wiki is at github.com/microsoft/T… It is in English but quite readable, and I recommend it.

It introduces the overall compilation process, of which generating the AST is only the first step, and explains a number of concepts. I found the following ones the most interesting:

Program

A SourceFile is the structure of a single file, and multiple related SourceFiles form a Program. A Program can be created from a set of source files or from a single one. In the latter case, much as Webpack collects files starting from the main entry, TypeScript pulls every file referenced by that source file into the Program and parses them:

this.program = ts.createProgram([this.srcFile], {
    target: ts.ScriptTarget.ES5,
    module: ts.ModuleKind.CommonJS,
});

Because the related files are pulled in, the relationships between files and pieces of code can be resolved, which is why many advanced features are built on top of Program.

TypeChecker

As the name suggests, it performs type checking and exposes the resulting type information. It is created from a Program:

this.checker = this.program.getTypeChecker();

You can then run all sorts of type queries, such as getting the parameter and return types of a function:

// Assumed shape, inferred from the object returned below
interface IFunctionTypeInfo {
  paramTypes: string[];
  returnType: string;
}

const getFunctionTypeInfoFromSignature = (signature: ts.Signature, checker: ts.TypeChecker): IFunctionTypeInfo => {
  // Get the parameter types
  const paramTypeStrList = signature.parameters.map((parameter) => {
    return checker.typeToString(checker.getTypeOfSymbolAtLocation(parameter, parameter.valueDeclaration!));
  });

  // Get the return value type
  const returnType = signature.getReturnType();
  const returnTypeStr = checker.typeToString(returnType);

  return {
    paramTypes: paramTypeStrList,
    returnType: returnTypeStr,
  };
};

export const getFunctionTypeInfoByNode = (
  node: ts.ArrowFunction | ts.FunctionDeclaration | ts.MethodDeclaration,
  checker: ts.TypeChecker,
): IFunctionTypeInfo => {
  const tsType = checker.getTypeAtLocation(node);
  // Only the first call signature is used here
  return getFunctionTypeInfoFromSignature(tsType.getCallSignatures()[0], checker);
};

Note that getCallSignatures returns an array, because TypeScript supports function overloading.
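To try this without files on disk, the sketch below builds a Program around a small in-memory source with two overloads and lists every call signature. The file name overloads.ts, the sample function, and the compiler host override are all assumptions made for the example; the default library files are still loaded from the installed typescript package.

```typescript
import * as ts from "typescript";

const fileName = "overloads.ts"; // hypothetical in-memory file name
const sourceText = `
export function parse(input: string): number;
export function parse(input: number, radix: number): string;
export function parse(input: string | number, radix?: number): number | string {
  return typeof input === "string" ? Number(input) : input.toString(radix);
}
`;

// Serve the sample from memory; everything else (lib files) falls back to the default host.
const options: ts.CompilerOptions = { target: ts.ScriptTarget.ES2015 };
const host = ts.createCompilerHost(options);
const defaultGetSourceFile = host.getSourceFile.bind(host);
host.getSourceFile = (name, languageVersion, ...rest) =>
  name === fileName
    ? ts.createSourceFile(name, sourceText, languageVersion, true)
    : defaultGetSourceFile(name, languageVersion, ...rest);

const program = ts.createProgram([fileName], options, host);
const checker = program.getTypeChecker();
const parseDecl = program.getSourceFile(fileName)!.statements.find(ts.isFunctionDeclaration)!;

// One entry per overload signature; the implementation signature is not part of the list.
const signatures = checker.getTypeAtLocation(parseDecl).getCallSignatures().map((sig) => ({
  params: sig.parameters.map((p) => checker.typeToString(checker.getTypeOfSymbolAtLocation(p, parseDecl))),
  returns: checker.typeToString(sig.getReturnType()),
}));
console.log(signatures);
```

Code that reads only getCallSignatures()[0], as the helper above does, silently ignores any further overloads.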

In using TypeChecker you will notice another important concept:

Symbol

Each node of the AST is a Node.

In short, a Node is a syntactic piece of code: it may be a variable name, a keyword such as function, or a code block. A Symbol is a named entity, similar to a variable name you might type in while debugging in the console. Two functions may each define a local variable with the same name, but those belong to different Symbols. The variable a exported from A.ts and used in B.ts corresponds to the same Symbol (once the import alias is resolved).

In my opinion, Symbol is mainly useful for two things:

A higher-level view. A Node is essentially raw data: to get a variable name from it, you have to fetch the name node, read its text, and preferably trim it.

Once you switch to the corresponding Symbol, you are working with a higher-level structure and only need to call its interface, for example:

const symbol = this.checker.getSymbolAtLocation(declaration.name)!;
const name = symbol.getName();

Likewise, once you have a class's Symbol, the constructor's data is a single interface call away, no matter how deeply it is nested:

const symbol = checker.getSymbolAtLocation(node.name);
if (!symbol) {
  return null;
}

const tsType = checker.getTypeOfSymbolAtLocation(symbol, symbol.valueDeclaration!);
const signature = tsType.getConstructSignatures()[0];

Type association. The same Symbol has the same type, even when it is used in two different files. That said, I found that a Type can be obtained either through a Symbol or directly through a Node, so Symbol is not strictly necessary for this.
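The claim that same-named locals in different functions are different Symbols can be checked with a short sketch. As before, the in-memory file symbols.ts and the host override are assumptions made so the example needs no files on disk.

```typescript
import * as ts from "typescript";

const fileName = "symbols.ts"; // hypothetical in-memory file name
const sourceText = `
function first() { const value = 1; return value; }
function second() { const value = "two"; return value; }
`;

const options: ts.CompilerOptions = { target: ts.ScriptTarget.ES2015 };
const host = ts.createCompilerHost(options);
const defaultGetSourceFile = host.getSourceFile.bind(host);
host.getSourceFile = (name, languageVersion, ...rest) =>
  name === fileName
    ? ts.createSourceFile(name, sourceText, languageVersion, true)
    : defaultGetSourceFile(name, languageVersion, ...rest);

const program = ts.createProgram([fileName], options, host);
const checker = program.getTypeChecker();

// Gather every identifier named "value", in source order.
const valueIdentifiers: ts.Identifier[] = [];
const visit = (node: ts.Node): void => {
  if (ts.isIdentifier(node) && node.text === "value") valueIdentifiers.push(node);
  ts.forEachChild(node, visit);
};
visit(program.getSourceFile(fileName)!);

const [declInFirst, useInFirst, declInSecond] = valueIdentifiers.map((id) => checker.getSymbolAtLocation(id));
const sameWithinFunction = declInFirst === useInFirst;    // declaration and use share one Symbol
const sameAcrossFunctions = declInFirst === declInSecond; // same name, different Symbols
console.log(sameWithinFunction, sameAcrossFunctions);
```

Comparing Symbol objects by identity like this is what makes "find all uses of this variable" reliable where comparing names is not.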

You can read more about the TypeScript parser in the TypeScript wiki. It has several examples, or you can look directly at the TypeScript code: github.com/microsoft/T…

Once you understand these concepts, the same functionality can be implemented more elegantly, without writing piles of code against the raw AST.

Before you start writing, get to know this site and the APIs

At first I naively printed the AST with the code from the first section, until I came across the following website: ts-ast-viewer.com/#

The functions are as follows:

It is very useful. The far right pane in particular shows which properties and methods each AST node has (they differ by node type), so you can fetch the data you need directly instead of traversing layer by layer.

The lower-left pane can be read alongside this demo: github.com/microsoft/T…

Now for the parser APIs that TypeScript provides. One complaint first: I could not find a list of these APIs on GitHub, so the only options are to study the examples and browse the TypeScript source code. Here are a few that I have used and found useful:

Node type guard APIs

For every node type shown on the AST viewer site above there is a matching check function, such as ts.isClassDeclaration, ts.isArrowFunction, and so on.

You can also compare ts.SyntaxKind[node.kind] against a string such as 'VariableStatement', as in the demo from the first application, but the standard check functions are friendlier to TS because they narrow the node's type. The reverse enum lookup can also return an alias name (for example FirstStatement for a variable statement), which makes string comparison fragile.

ModifierFlags checking API

ModifierFlags can be understood simply as modifiers such as public, private, and async. They are visible on the AST viewer site above:

To determine if a tree node has a flag, use the following notation:

export const isNodeExported = (node: ts.Node): boolean => {
  return (ts.getCombinedModifierFlags(node as ts.Declaration) & ts.ModifierFlags.Export) !== 0;
};
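The same bit test generalizes to any flag. A minimal sketch that filters the exported functions of a small sample; the helper name hasModifierFlag and the sample source are assumptions for illustration.

```typescript
import * as ts from "typescript";

// True when the declaration carries the given modifier flag.
const hasModifierFlag = (node: ts.Declaration, flag: ts.ModifierFlags): boolean =>
  (ts.getCombinedModifierFlags(node) & flag) !== 0;

const parsed = ts.createSourceFile(
  "flags.ts",
  "export function visible() {}\nfunction hidden() {}\n",
  ts.ScriptTarget.ES2015,
  true, // parent pointers allow flags to be combined from enclosing nodes
);

const exportedNames = parsed.statements
  .filter(ts.isFunctionDeclaration)
  .filter((fn) => hasModifierFlag(fn, ts.ModifierFlags.Export))
  .map((fn) => fn.name?.text);
console.log(exportedNames);
```

ModifierFlags is a bit field, so several flags can be tested at once by OR-ing them together before the mask.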

I'm still exploring more useful APIs.

Application: getting all the members of a class and their type definitions

Process:

private analyseExportNodeForClass(node: ts.ClassDeclaration) {
  const className = node.name?.getFullText().trim() || '';
  const classMemeberInfoList: IClassMemberInfo[] = [];

  node.members.forEach((member) => {
    // Only properties and methods are handled here
    if (!ts.isPropertyDeclaration(member) && !ts.isMethodDeclaration(member)) {
      return;
    }

    const { name, type, funcArgsType } = ts.isPropertyDeclaration(member)
      ? this.getBasicInfoFromVarDeclaration(member)
      : this.getBasicInfoFromFuncDeclaration(member);
    const accessibility = getClassAccessibility(member);

    console.log('name', name);
    console.log('type', type);
    console.log('funcArgsType', funcArgsType);
    console.log('');

    classMemeberInfoList.push({
      name,
      type,
      funcArgsType,
      accessibility,
    });
  });

  // The constructor is handled separately
  const constructorParamType = this.getConstructorParamType(node);

  // TODO: output the collected information
  console.log(className, classMemeberInfoList, constructorParamType);
}

Get the definition and type of the member function:

private getBasicInfoFromFuncDeclaration(declaration: ts.FunctionDeclaration | ts.MethodDeclaration) {
  const symbol = this.checker.getSymbolAtLocation(declaration.name!)!;
  const name = symbol.getName();
  const typeInfo = getFunctionTypeInfoByNode(declaration, this.checker);
  const type = typeInfo.returnType;
  const funcArgsType = typeInfo.paramTypes;

  return {
    name,
    type,
    funcArgsType,
  };
}

// utils.ts
export const getFunctionTypeInfoByNode = (
  node: ts.ArrowFunction | ts.FunctionDeclaration | ts.MethodDeclaration,
  checker: ts.TypeChecker,
): IFunctionTypeInfo => {
  const tsType = checker.getTypeAtLocation(node);
  return getFunctionTypeInfoFromSignature(tsType.getCallSignatures()[0], checker);
};

const getFunctionTypeInfoFromSignature = (signature: ts.Signature, checker: ts.TypeChecker): IFunctionTypeInfo => {
  // Get the parameter types
  const paramTypeStrList = signature.parameters.map((parameter) => {
    return checker.typeToString(checker.getTypeOfSymbolAtLocation(parameter, parameter.valueDeclaration!));
  });

  // Get the return value type
  const returnType = signature.getReturnType();
  const returnTypeStr = checker.typeToString(returnType);

  return {
    paramTypes: paramTypeStrList,
    returnType: returnTypeStr,
  };
};

Getting the member properties is a bit simpler, so I won't go into detail.

Get the constructor type:

private getConstructorParamType(node: ts.ClassDeclaration) {
  const constructorInfo = getConstructorInfo(node, this.checker);
  const constructorParamType: string[] = constructorInfo
    ? constructorInfo.paramTypes
    : [];

  return constructorParamType;
}

// utils.ts
export const getConstructorInfo = (node: ts.ClassDeclaration, checker: ts.TypeChecker): IFunctionTypeInfo | null => {
  if (!node.name) {
    return null;
  }

  const symbol = checker.getSymbolAtLocation(node.name);
  if (!symbol) {
    return null;
  }

  const tsType = checker.getTypeOfSymbolAtLocation(symbol, symbol.valueDeclaration!);
  const signature = tsType.getConstructSignatures()[0];

  if (!signature) {
    return null;
  }

  return getFunctionTypeInfoFromSignature(signature, checker);
};

If you want to know the accessibility of each member (public, private, or protected), you can handle it like this:

export const getClassAccessibility = (node: ts.PropertyDeclaration | ts.MethodDeclaration) => {
  // const hasPublic = (ts.getCombinedModifierFlags(node) & ts.ModifierFlags.Public) !== 0;
  const hasPrivate = (ts.getCombinedModifierFlags(node) & ts.ModifierFlags.Private) !== 0;
  const hasProtect = (ts.getCombinedModifierFlags(node) & ts.ModifierFlags.Protected) !== 0;

  // Members without an explicit modifier are treated as public
  return hasProtect ? ts.ModifierFlags.Protected : hasPrivate ? ts.ModifierFlags.Private : ts.ModifierFlags.Public;
};

At this point, all the information in a class is available. There are still some special cases, such as static members and getters; looking at the node structure on the AST viewer site gives a rough idea of which API to use for them.
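For those special cases, a syntax-only sketch might look like the following. It needs no TypeChecker; the MemberSummary shape and the function name summarizeMembers are assumptions for illustration, and constructors and other unnamed members are simply skipped.

```typescript
import * as ts from "typescript";

interface MemberSummary {
  name: string;
  kind: "property" | "method" | "getter" | "setter";
  isStatic: boolean;
}

// Summarizes the named members of the first class found in the given source text.
function summarizeMembers(classText: string): MemberSummary[] {
  const file = ts.createSourceFile("cls.ts", classText, ts.ScriptTarget.ES2015, true);
  const cls = file.statements.find(ts.isClassDeclaration);
  const result: MemberSummary[] = [];
  if (!cls) return result;

  cls.members.forEach((member) => {
    const kind = ts.isPropertyDeclaration(member) ? "property"
      : ts.isMethodDeclaration(member) ? "method"
      : ts.isGetAccessorDeclaration(member) ? "getter"
      : ts.isSetAccessorDeclaration(member) ? "setter"
      : undefined;
    if (!kind || !member.name) return; // skips constructors, index signatures, etc.

    result.push({
      name: member.name.getText(file),
      kind,
      isStatic: (ts.getCombinedModifierFlags(member) & ts.ModifierFlags.Static) !== 0,
    });
  });
  return result;
}
```

For example, summarizeMembers("class A { static count = 0; get label() { return 'x'; } }") reports count as a static property and label as a non-static getter.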

We can do a lot with class information. Automatically generating interface documentation for each class? Fully automatic mocking of a class instance? There is plenty of room for imagination.

Conclusion

This is probably just the tip of the TypeScript parser iceberg. As TypeScript usage grows, learning about the TypeScript parser and writing tools on top of it will become both interesting and challenging. I look forward to seeing more tools.

Note: related reference documents

Official wiki: github.com/microsoft/T…
Deep understanding of TypeScript: jkchao.github.io/typescript-…