VS Code language features such as syntax highlighting, code completion, error diagnostics, and go-to-definition are implemented jointly by three kinds of solutions:

  • Lexical analysis: tokens are recognized and highlighted based on lexical rules
  • Semantic analysis: tokens are recognized and highlighted based on semantic analysis
  • Programmable language feature interfaces: the plug-in analyzes the code content and applies highlighting styles, and can also provide error diagnostics, intelligent hints, formatting, and other features

The three solutions offer progressively more capability, and their technical complexity and implementation cost rise accordingly. This article briefly introduces how each of the three works, what each is responsible for, and how they cooperate, then uses real cases to show how VS Code implements code highlighting.

VS Code plug-in basics

Before introducing how VS Code implements highlighting, it helps to be familiar with its underlying architecture. Similar to Webpack, VS Code itself implements only a framework; the commands, styles, state, debugging, and other features inside that framework are provided as plug-ins. VS Code exposes five categories of extension capability to plug-ins.

Among them, code highlighting is implemented by language extension plug-ins, which fall into two kinds depending on how they are implemented:

  • Declarative: a set of lexical regular expressions is declared in a specific JSON structure, so language features such as block-level matching, automatic indentation, and syntax highlighting can be added without writing any logic code. VS Code's built-in extensions/css and extensions/html plug-ins are implemented with the declarative interface
  • Programmatic: VS Code monitors user behavior as it runs and triggers event callbacks when particular behaviors occur; a programmatic language extension listens for these events, analyzes the text content dynamically, and returns code information in a specific format

The declarative approach offers high performance but limited capability; the programmatic approach offers strong capability but lower performance. Language plug-in developers usually mix the two: the declarative interface recognizes lexical tokens in the shortest possible time and provides basic syntax highlighting, and the programmatic interface then analyzes the content dynamically to provide more advanced features such as error diagnostics and intelligent hints.

Declarative language extensions in VS Code are built on the TextMate lexical analysis engine. Programmatic language extensions can be implemented in three ways: the semantic analysis interface, the vscode.languages.* interfaces, and the Language Server Protocol. The basic logic of each approach is introduced below.

Lexical highlighting

Lexical analysis is the process of converting a sequence of characters into a sequence of tokens; a token is the smallest unit that makes up source code. The technique is widely used in compilers, IDEs, and related fields.

For example, VS Code's lexical engine produces a token sequence and then applies highlighting styles according to each token's type. This process can be divided into two steps: tokenization and style application.


  • Macromates.com/manual/en/l…

  • Code.visualstudio.com/api/languag…


Tokenization essentially breaks a long string of code, recursively, into fragments that each have a specific meaning and category, such as the operators + - * / %, the keywords var and const, or constant values like 1234 and "tecvan". Put simply, it identifies which words appear in a piece of text.

VS Code's lexical analysis is built on the TextMate engine. Its functionality is complex, but it can be roughly divided into three aspects: regex-based tokenization, compound tokenization rules, and nested tokenization rules.

Basic rules

The TextMate engine underlying VS Code tokenizes by regular-expression matching. At run time it scans the text line by line and uses a predefined set of rules to test whether each line contains content matching a particular regular expression, as in the following rule configuration:

    {
        "patterns": [
            {
                "name": "keyword.control",
                "match": "\\b(if|while|for|return)\\b"
            }
        ]
    }

In the example, patterns defines a set of rules, the match property specifies the regular expression used to match a token, and the name property declares the token's scope. When TextMate's tokenizer encounters content that matches the match expression, it treats that content as a separate token and classifies it under the keyword.control scope declared by name.

The example above recognizes the if, while, for, and return keywords as keyword.control, but it does not recognize any other keywords.

In TextMate, a scope is a dot-separated hierarchy: keyword and keyword.control, for instance, form a parent-child relationship. Style processing uses this hierarchy for CSS-selector-like matching, which is covered later.
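To make the matching step concrete, here is a minimal TypeScript sketch of regex-based tokenization for a single line. It is not the TextMate engine: the names MatchRule, LineToken, keywordRules, and tokenizeLine are hypothetical, and the sketch simply collects every match of every rule, whereas a real engine resolves competing rules position by position.

```typescript
// A rule pairs a scope name with a regular expression, like a TextMate "match" rule.
interface MatchRule {
  scopeName: string;
  match: RegExp; // needs the "g" flag so exec() can scan the whole line
}

interface LineToken {
  text: string;
  scopeName: string;
  startColumn: number;
}

// The same rule as in the configuration above, written as a RegExp literal.
const keywordRules: MatchRule[] = [
  { scopeName: "keyword.control", match: /\b(if|while|for|return)\b/g },
];

// Scan one line and collect every fragment matched by a rule.
function tokenizeLine(line: string, rules: MatchRule[]): LineToken[] {
  const found: LineToken[] = [];
  for (const rule of rules) {
    rule.match.lastIndex = 0; // reset state left over from a previous line
    let m: RegExpExecArray | null;
    while ((m = rule.match.exec(line)) !== null) {
      found.push({ text: m[0], scopeName: rule.scopeName, startColumn: m.index });
    }
  }
  // Report tokens in source order.
  return found.sort((a, b) => a.startColumn - b.startColumn);
}

console.log(tokenizeLine("if (ready) return value;", keywordRules));
```

For this input the sketch reports two keyword.control tokens, if and return; every other fragment of the line stays unclassified, which mirrors the behavior described for the rule above.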

Compound tokenization

In TextMate the configuration object in the example above is called a language rule. Besides match, which matches within a single line, a rule can use the begin + end property pair to match more complex, multi-line content. Everything recognized from begin to end is treated as a token of the type given by name. For example, the syntaxes/vue.tmLanguage.json file of the vuejs/vetur plug-in contains a configuration like this:

    {
        "name": "Vue",
        "scopeName": "source.vue",
        "patterns": [
            {
                "begin": "(<)(style)(?![^/>]*/>\\s*$)",
                // A fictitious field, added for ease of explanation
                "name": "tag.style.vue",
                "beginCaptures": {
                    "1": { "name": "punctuation.definition.tag.begin.html" },
                    "2": { "name": "entity.name.tag.style.html" }
                },
                "end": "(</)(style)(>)",
                "endCaptures": {
                    "1": { "name": "punctuation.definition.tag.begin.html" },
                    "2": { "name": "entity.name.tag.style.html" },
                    "3": { "name": "punctuation.definition.tag.end.html" }
                }
            }
        ]
    }

In the configuration, begin matches the opening <style statement, end matches the closing </style> statement, and the entire <style>...</style> statement is given the scope tag.style.vue. In addition, the characters inside the statement are assigned different scopes by the beginCaptures and endCaptures properties.

Here begin pairs with beginCaptures and end pairs with endCaptures; together they form a compound rule that can match content spanning multiple lines at once.

Nested rules

In addition to begin + end, TextMate supports defining nested language rules through sub-patterns, for example:

    {
        "name": "lng",
        "patterns": [{
            "begin": "^lng`",
            "end": "`",
            "name": "tecvan.lng.outline",
            "patterns": [
                { "match": "tec", "name": "tecvan.lng.prefix" },
                { "match": "van", "name": "tecvan.lng.name" }
            ]
        }],
        "scopeName": "tecvan"
    }

The configuration recognizes the string between lng` and ` and classifies it as tecvan.lng.outline. The content between the two is then processed recursively, matching more specific tokens according to the sub-patterns rules. For example, given:

    lng`awesome tecvan`

the following tokens are recognized:

  • lng`awesome tecvan`, with scope tecvan.lng.outline
  • tec, with scope tecvan.lng.prefix
  • van, with scope tecvan.lng.name
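The two-level matching can be sketched in TypeScript as follows. This is only an illustration of the begin/end plus sub-patterns idea for a single line, not how TextMate is implemented; SubRule, ScopedFragment, outlineBegin, outlineEnd, subRules, and tokenizeNested are hypothetical names, and the sketch gives up where a real engine would carry an unclosed rule over to the next line.

```typescript
interface SubRule {
  scopeName: string;
  match: RegExp; // needs the "g" flag
}

interface ScopedFragment {
  text: string;
  scopeName: string;
}

// The outer begin/end pair and the nested rules from the configuration above.
const outlineBegin = /^lng`/;
const outlineEnd = /`/;
const subRules: SubRule[] = [
  { scopeName: "tecvan.lng.prefix", match: /tec/g },
  { scopeName: "tecvan.lng.name", match: /van/g },
];

function tokenizeNested(line: string): ScopedFragment[] {
  const begin = outlineBegin.exec(line);
  if (begin === null) return [];
  const innerStart = begin[0].length;
  const end = outlineEnd.exec(line.slice(innerStart));
  if (end === null) return []; // unclosed on this line; out of scope for the sketch
  const innerEnd = innerStart + end.index;

  // The whole begin..end span receives the outer scope.
  const fragments: ScopedFragment[] = [
    { text: line.slice(0, innerEnd + end[0].length), scopeName: "tecvan.lng.outline" },
  ];

  // Only the content between begin and end is matched against the sub-patterns.
  const inner = line.slice(innerStart, innerEnd);
  for (const rule of subRules) {
    rule.match.lastIndex = 0;
    let m: RegExpExecArray | null;
    while ((m = rule.match.exec(inner)) !== null) {
      fragments.push({ text: m[0], scopeName: rule.scopeName });
    }
  }
  return fragments;
}

console.log(tokenizeNested("lng`awesome tecvan`"));
```

Run against the sample input, the sketch yields the same three fragments as the list above: the whole statement, tec, and van.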

TextMate also supports language-level nesting, for example:

    {
        "name": "lng",
        "patterns": [{
            "begin": "^lng`",
            "end": "`",
            "name": "tecvan.lng.outline",
            "contentName": "source.js"
        }],
        "scopeName": "tecvan"
    }

With this configuration, everything between lng` and ` is identified as the source.js language specified by contentName.


Lexical highlighting essentially means splitting the original text into a sequence of categorized tokens according to the rules above, and then applying different styles according to each token's type. On top of tokenization, TextMate provides a structure for configuring styles by the token's type field, scope, for example:

    {
        "tokenColors": [
            { "scope": "tecvan", "settings": { "foreground": "#eee" } },
            { "scope": "tecvan.lng.prefix", "settings": { "foreground": "#F44747" } },
            { "scope": "tecvan.lng.name", "settings": { "foreground": "#007acc" } }
        ]
    }

In the example, the scope property supports a matching pattern called scope selectors which, like CSS selectors, supports:

  • Element selection: for example, scope = tecvan.lng.prefix matches tokens of type tecvan.lng.prefix; in particular, scope = tecvan matches tokens of subtypes such as tecvan.lng and tecvan.lng.prefix
  • Descendant selection: for example, scope = text.html source.js matches JavaScript code inside HTML documents
  • Group selection: for example, scope = string, comment matches strings or comments
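The three selector forms can be sketched as a small matcher. This is a deliberately simplified model under stated assumptions: a token carries a stack of scopes from outermost to innermost, a selector part matches a scope when it equals it or is a dot-separated prefix of it, and descendant parts only need to appear in order. The function names are hypothetical, and real engines additionally rank competing rules by specificity.

```typescript
// Element selection: "tecvan" matches "tecvan" and "tecvan.lng.prefix", but not "tecvanx".
function matchesScopeName(selectorPart: string, tokenScope: string): boolean {
  return tokenScope === selectorPart || tokenScope.indexOf(selectorPart + ".") === 0;
}

// scopeStack lists the token's scopes from outermost to innermost.
function matchesSelector(selector: string, scopeStack: string[]): boolean {
  // Group selection: any comma-separated alternative may match.
  for (const alternative of selector.split(",")) {
    // Descendant selection: space-separated parts must appear in order.
    const parts = alternative.trim().split(/\s+/);
    let next = 0; // index of the selector part still to be found
    for (const scope of scopeStack) {
      if (next < parts.length && matchesScopeName(parts[next], scope)) {
        next++;
      }
    }
    if (next === parts.length) return true;
  }
  return false;
}

console.log(matchesSelector("tecvan", ["tecvan", "tecvan.lng.outline", "tecvan.lng.prefix"]));
console.log(matchesSelector("text.html source.js", ["text.html.basic", "source.js.embedded"]));
console.log(matchesSelector("string, comment", ["source.js", "comment.line"]));
```

Requiring the prefix to end at a dot is what keeps scope = tecvan from matching an unrelated scope that merely starts with the same letters.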

Plug-in developers can define custom scopes or reuse TextMate's many built-in scopes, including comment, constant, entity, invalid, keyword, and so on. For the complete list, refer to the official TextMate documentation.

The settings property sets the token's presentation style; it supports style properties such as foreground, background, bold, italic, and underline.

Case study

With the principles covered, let's take apart a real case: github.com/mrmlnc/vsco… JSON5 is an extension of JSON designed to be easier for people to write and maintain by hand. It supports features such as comments, single-quoted strings, and hexadecimal numbers, and highlighting these extended features requires the vscode-json5 plug-in.

In the comparison screenshot, the left side shows the editor without vscode-json5 enabled and the right side shows it with vscode-json5 enabled.

The source of the vscode-json5 plug-in is quite simple, with two key points:

  • In the package.json file, the plug-in declares the contributes property, which can be understood as the plug-in's entry point:

    "contributes": {
        // Language configuration
        "languages": [{
            "id": "json5",
            "aliases": ["JSON5", "json5"],
            "extensions": [".json5"],
            "configuration": "./json5.configuration.json"
        }],
        // Grammar configuration
        "grammars": [{
            "language": "json5",
            "scopeName": "source.json5",
            "path": "./syntaxes/json5.json"
        }]
    }

  • In the grammar configuration file ./syntaxes/json5.json, the language rules are defined according to TextMate's requirements:

    {
        "scopeName": "source.json5",
        "fileTypes": ["json5"],
        "name": "JSON5",
        "patterns": [
            { "include": "#array" },
            { "include": "#constant" }
            // ...
        ],
        "repository": {
            "array": {
                "begin": "\\[",
                "beginCaptures": {
                    "0": { "name": "punctuation.definition.array.begin.json5" }
                },
                "end": "\\]",
                "endCaptures": {
                    "0": { "name": "punctuation.definition.array.end.json5" }
                },
                "name": "meta.structure.array.json5"
                // ...
            },
            "constant": {
                "match": "\\b(?:true|false|null|Infinity|NaN)\\b",
                "name": "constant.language.json5"
            }
            // ...
        }
    }

That is all there is to it; it really is that simple.

Debugging tools

VS Code has a built-in scope inspector for debugging the token and scope information detected by TextMate. To use it, place the editor cursor on a specific token, then run the Developer: Inspect Editor Tokens and Scopes command and press Enter.

After the command runs, you can see the token's language, scope, style, and other information.

Programmatic language extensions

TextMate is essentially a regex-based static lexical analyzer. Its advantages are a standardized integration model, low cost, and high run-time efficiency. Its disadvantage is that static lexical analysis struggles to implement context-dependent IDE features.

For example, consider a function whose parameter languageModes is declared on the first line and used again on the second: the two occurrences are the same entity, yet they are not rendered in the same style, so there is no visual link between them.

For this reason, in addition to the TextMate engine, VS Code provides three more powerful and more complex language feature extension mechanisms:

  • Use DocumentSemanticTokensProvider to implement programmable semantic analysis
  • Use the vscode.languages.* interfaces to listen for various editing events and perform semantic analysis at specific points in time
  • Implement a complete language analysis server according to the Language Server Protocol

Compared with the declarative lexical highlighting described above, the language feature interfaces are more flexible and can implement advanced features such as error diagnostics, completion candidates, intelligent hints, and go-to-definition.


  • Code.visualstudio.com/api/languag…

  • Code.visualstudio.com/api/languag…

  • Code.visualstudio.com/api/languag…

DocumentSemanticTokensProvider tokenization

Introduction

The Semantic Tokens Provider is an object protocol built into VS Code. The provider scans the contents of the code file itself and returns the sequence of semantic tokens as an integer array, telling VS Code which line, column, and span of the file holds a token of which type.

With TextMate, scanning is driven by the engine, which performs regex matching line by line. With the Semantic Tokens Provider, both the scanning rules and the matching rules are implemented by the plug-in developer, which increases flexibility but also raises development cost.

In terms of implementation, the Semantic Tokens Provider is defined by the vscode.DocumentSemanticTokensProvider interface, and developers can implement two methods as needed:

  • provideDocumentSemanticTokens: analyzes the semantics of the whole code file
  • provideDocumentSemanticTokensEdits: incrementally analyzes the semantics of the part being edited

Let's look at a complete example:

    import * as vscode from 'vscode';

    const tokenTypes = ['class', 'interface', 'enum', 'function', 'variable'];
    const tokenModifiers = ['declaration', 'documentation'];
    const legend = new vscode.SemanticTokensLegend(tokenTypes, tokenModifiers);

    const provider: vscode.DocumentSemanticTokensProvider = {
      provideDocumentSemanticTokens(
        document: vscode.TextDocument
      ): vscode.ProviderResult<vscode.SemanticTokens> {
        const tokensBuilder = new vscode.SemanticTokensBuilder(legend);
        // Mark line 0, columns 3 to 8, as one token
        tokensBuilder.push(
          new vscode.Range(new vscode.Position(0, 3), new vscode.Position(0, 8)),
          tokenTypes[0], // token type (assumed: the first legend entry)
          [tokenModifiers[0]]
        );
        return tokensBuilder.build();
      }
    };

    const selector = { language: 'javascript', scheme: 'file' };

    vscode.languages.registerDocumentSemanticTokensProvider(selector, provider, legend);

The example is easier to understand in two parts: the structure of the array returned by tokensBuilder.build(), and the analysis that produces it.

Output structure

The provideDocumentSemanticTokens function returns an integer array whose items are read in groups of five, where group i describes one token:

  • Item 5 * i: the line offset of the token relative to the previous token
  • Item 5 * i + 1: the column offset of the token relative to the previous token (or relative to column 0 when the token starts on a different line)
  • Item 5 * i + 2: the token length
  • Item 5 * i + 3: the token's type value
  • Item 5 * i + 4: the token's modifier value

This is a position-sensitive integer array in which every five items describe the location and type of one token. A token's location is made up of three numbers: line, column, and length. To reduce the size of the data, VS Code deliberately uses relative offsets. For example, for code like this:

const name as

Splitting simply on whitespace yields three tokens here: const, name, and as. The corresponding description array is:

    // First token: const
    0, 0, 5, x, x,
    // Second token: name
    0, 6, 4, x, x,
    // Third token: as
    0, 5, 2, x, x

Note that each token is described relative to the position of the previous one. For example, the five numbers for the as token mean: offset from the previous token by 0 lines and 5 columns, length 2, with type and modifier xx.
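The relative encoding can be reproduced with a few lines of TypeScript. This sketch is independent of the vscode module; AbsoluteToken and encodeRelative are hypothetical names, and the type and modifier values are set to 0 as placeholders for the x entries above.

```typescript
interface AbsoluteToken {
  line: number;
  startColumn: number;
  tokenLength: number;
  typeIndex: number;
  modifierBits: number;
}

// Tokens must already be sorted by line, then by column.
function encodeRelative(tokens: AbsoluteToken[]): number[] {
  const data: number[] = [];
  let prevLine = 0;
  let prevColumn = 0;
  for (const t of tokens) {
    const deltaLine = t.line - prevLine;
    // The column is relative to the previous token only when both share a line.
    const deltaColumn = deltaLine === 0 ? t.startColumn - prevColumn : t.startColumn;
    data.push(deltaLine, deltaColumn, t.tokenLength, t.typeIndex, t.modifierBits);
    prevLine = t.line;
    prevColumn = t.startColumn;
  }
  return data;
}

// Absolute positions of const, name, and as in "const name as".
const sampleTokens: AbsoluteToken[] = [
  { line: 0, startColumn: 0, tokenLength: 5, typeIndex: 0, modifierBits: 0 },
  { line: 0, startColumn: 6, tokenLength: 4, typeIndex: 0, modifierBits: 0 },
  { line: 0, startColumn: 11, tokenLength: 2, typeIndex: 0, modifierBits: 0 },
];

console.log(encodeRelative(sampleTokens));
// [0, 0, 5, 0, 0,  0, 6, 4, 0, 0,  0, 5, 2, 0, 0]
```

The as token starts at absolute column 11, but because name starts at column 6 on the same line, it is stored as a column offset of 5.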

The remaining items, 5 * i + 3 and 5 * i + 4, describe the token's type and modifier respectively. The type indicates the kind of token, such as comment, class, function, or namespace. The modifier refines the type and can be roughly understood as a subtype, such as abstract for a class, or defaultLibrary for a symbol that comes from the standard library.

The concrete type and modifier values are defined by the developer, as in the preceding example:

    const tokenTypes = ['class', 'interface', 'enum', 'function', 'variable'];
    const tokenModifiers = ['declaration', 'documentation'];
    const legend = new vscode.SemanticTokensLegend(tokenTypes, tokenModifiers);

    // ...

    vscode.languages.registerDocumentSemanticTokensProvider(selector, provider, legend);

First, the SemanticTokensLegend class builds the legend object, the internal representation of the types and modifiers. The provider is then registered with VS Code, together with the legend, through the vscode.languages.registerDocumentSemanticTokensProvider interface.
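How the legend turns names into the integers stored in the array can be sketched like this: a type becomes its index in the type list, and modifiers become a bit set in which bit n stands for the n-th modifier. The helper names are hypothetical, and throwing on an unknown name is a choice made for this sketch rather than a statement about VS Code's behavior.

```typescript
const legendTokenTypes = ["class", "interface", "enum", "function", "variable"];
const legendTokenModifiers = ["declaration", "documentation"];

// A token type is stored as its index in the legend's type list.
function encodeTokenType(tokenType: string): number {
  const index = legendTokenTypes.indexOf(tokenType);
  if (index < 0) throw new Error("unknown token type: " + tokenType);
  return index;
}

// Modifiers are stored as a bit set: bit n is set when the n-th modifier applies.
function encodeTokenModifiers(modifiers: string[]): number {
  let bits = 0;
  for (const modifier of modifiers) {
    const index = legendTokenModifiers.indexOf(modifier);
    if (index < 0) throw new Error("unknown token modifier: " + modifier);
    bits |= 1 << index;
  }
  return bits;
}

console.log(encodeTokenType("function")); // 3
console.log(encodeTokenModifiers(["declaration", "documentation"])); // 3 (binary 11)
```

Because modifiers share a single integer, one token can carry several modifiers at once, while it always has exactly one type.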

Semantic analysis

In the example above, the provider's main job is to traverse and analyze the file content and return an integer array that follows the rules above. VS Code does not restrict how the analysis is done, but it does provide the SemanticTokensBuilder utility for constructing the token description array, as used in the example:

    const provider: vscode.DocumentSemanticTokensProvider = {
      provideDocumentSemanticTokens(
        document: vscode.TextDocument
      ): vscode.ProviderResult<vscode.SemanticTokens> {
        const tokensBuilder = new vscode.SemanticTokensBuilder(legend);
        tokensBuilder.push(
          new vscode.Range(new vscode.Position(0, 3), new vscode.Position(0, 8)),
          tokenTypes[0], // token type (assumed: the first legend entry)
          [tokenModifiers[0]]
        );
        return tokensBuilder.build();
      }
    };

The example uses the SemanticTokensBuilder interface to build and return the array describing a single token: line 0, starting at column 3, with a length of 5 (the range runs from column 3 to column 8).

Apart from this recognized token, all other characters are treated as unrecognized.


In essence, DocumentSemanticTokensProvider provides only a rough IoC interface, and what developers can do with it is limited, so most plug-ins today do not use this approach. A general understanding is enough; there is no need to dig deeper.

Language API

Introduction

The vscode.languages.* family of APIs provides language extension capabilities in a way that may fit the mindset of front-end developers better. vscode.languages.* takes care of handling and categorizing a range of user interactions and exposes them as event interfaces. Plug-in developers only need to listen for these events, infer language features from the parameters, and return results in the required format.

The VS Code Language API provides many event interfaces, for example:

  • registerCompletionItemProvider: provides code completion suggestions

  • registerHoverProvider: triggered when the cursor hovers over a token

  • registerSignatureHelpProvider: provides function signature hints

For the complete list, refer to the code.visualstudio.com/api/languag… article.

Hover example

Implementing hover takes two steps. First, declare the hover capability in package.json:

    {
        ...
        "main": "out/extensions.js",
        "capabilities": {
            "hoverProvider": "true",
            ...
        }
    }

Then call registerHoverProvider in the activate function to register the hover callback:

    export function activate(ctx: vscode.ExtensionContext): void {
        ...
        vscode.languages.registerHoverProvider('language name', {
            provideHover(document, position, token) {
                return { contents: ['awesome tecvan'] };
            }
        });
        ...
    }

Other features are written in a similar way. Interested readers can consult the official documentation.
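The hover callback above returns a fixed string. A provider that infers something from its parameters usually starts by finding the word under the cursor; the following is a sketch of that pure logic, kept free of the vscode module so it can run anywhere. The names hoverDocs, wordAtColumn, and hoverContentsFor are hypothetical, as is the one-entry documentation table.

```typescript
// Hypothetical documentation table: word -> hover text.
const hoverDocs: { [word: string]: string } = {
  tecvan: "awesome tecvan",
};

// Return the identifier-like word that covers the given column, if any.
function wordAtColumn(lineText: string, column: number): string | undefined {
  const wordPattern = /[A-Za-z_]\w*/g;
  let m: RegExpExecArray | null;
  while ((m = wordPattern.exec(lineText)) !== null) {
    const start = m.index;
    const end = start + m[0].length;
    if (column >= start && column < end) return m[0];
  }
  return undefined;
}

// Contents to show for a hover at (lineText, column), or undefined for no hover.
function hoverContentsFor(lineText: string, column: number): string[] | undefined {
  const word = wordAtColumn(lineText, column);
  if (word === undefined) return undefined;
  if (!Object.prototype.hasOwnProperty.call(hoverDocs, word)) return undefined;
  return [hoverDocs[word]];
}

console.log(hoverContentsFor("lng`awesome tecvan`", 13));
```

Inside provideHover, the two arguments could be taken from document.lineAt(position.line).text and position.character, and the returned array used as the contents of the hover result.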

Language Server Protocol

Introduction

The language-extension approaches to code highlighting described above share a problem: they are hard to reuse across editors. For the same language, support plug-ins with similar functionality must be written again for each editor environment and language, so for n languages and m editors the development cost is n * m.

To solve this problem, Microsoft proposed a standard protocol called the Language Server Protocol (LSP). A language feature plug-in no longer communicates with the editor directly; the two are isolated from each other by LSP.

Adding the LSP layer brings two benefits:

  • The development language and environment of the LSP layer are decoupled from the host environment provided by the IDE
  • The core functionality of a language plug-in needs to be written only once and can be reused in any IDE that supports the LSP protocol

Although LSP's capabilities are almost the same as those of the Language API above, these two advantages greatly improve plug-in development efficiency. Many VS Code language plug-ins have already migrated to an LSP implementation, including well-known plug-ins such as vetur, eslint, and Python for VSCode.

The LSP architecture in VS Code consists of two parts:

  • Language Client: a standard VS Code plug-in that bridges the VS Code environment and the Language Server; for example, a hover event is first delivered to the Client and then passed on to the Server behind it
  • Language Server: the core implementation of the language features. It communicates with the Language Client through the LSP protocol; note that the Server instance runs as a separate process

Put simply, LSP is an architecturally optimized Language API: functionality that was previously implemented by a single provider function is split into a Client + Server architecture that is independent of programming language. The Client interacts with VS Code and forwards requests; the Server performs the code analysis and provides highlighting, completion, hints, and other features.
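What travels between Client and Server is a JSON-RPC message preceded by a Content-Length header that counts the UTF-8 bytes of the body. The sketch below frames a hover request by hand to show the wire format; in practice the client and server libraries do this for you. JsonRpcRequest, utf8ByteLength, and frameLspMessage are hypothetical helper names, and the file URI is made up for illustration.

```typescript
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: unknown;
}

// Count UTF-8 bytes without relying on Node or browser globals.
// Assumes well-formed UTF-16 input (no lone surrogates).
function utf8ByteLength(text: string): number {
  let bytes = 0;
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i);
    if (code < 0x80) {
      bytes += 1;
    } else if (code < 0x800) {
      bytes += 2;
    } else if (code >= 0xd800 && code <= 0xdbff) {
      bytes += 4; // surrogate pair: one code point, four bytes
      i++;
    } else {
      bytes += 3;
    }
  }
  return bytes;
}

// Header part, blank line, then the JSON body.
function frameLspMessage(message: JsonRpcRequest): string {
  const body = JSON.stringify(message);
  return "Content-Length: " + utf8ByteLength(body) + "\r\n\r\n" + body;
}

const hoverRequest: JsonRpcRequest = {
  jsonrpc: "2.0",
  id: 1,
  method: "textDocument/hover",
  params: {
    textDocument: { uri: "file:///tmp/example.txt" }, // made-up URI
    position: { line: 0, character: 3 },
  },
};

console.log(frameLspMessage(hoverRequest));
```

Because the format is plain text over a stream, the server can be written in any language that can read a header and parse JSON, which is what makes the Client and Server independent of each other's implementation language.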

A simple example

LSP is somewhat more complex, so I recommend cloning the official VS Code sample and studying it hands-on:

git clone https://github.com/microsoft/vscode-extension-samples.git
cd vscode-extension-samples/lsp-sample
yarn compile
code .

The main code files in vscode-extension-samples/lsp-sample are as follows:

    .
    ├── client // Language Client
    │   ├── src
    │   │   └── extension.ts // Language Client entry file
    ├── package.json
    └── server // Language Server
        └── src
            └── server.ts // Language Server entry file

The sample code has a few key points:

  1. Declare the activation conditions and the plug-in entry in package.json
  2. Write the entry file client/src/extension.ts, which starts the LSP service
  3. Write the LSP service, server/src/server.ts, which implements the LSP protocol

Logically, when VS Code loads the plug-in it evaluates the activation conditions from the package.json configuration, then loads and runs the plug-in entry, which starts the LSP server. Once the plug-in is running, subsequent user interactions in VS Code reach the plug-in's client as standard events such as hover, completion, and signature help, and the client forwards them to the server layer according to the LSP protocol.

Let’s break down the three modules in detail.

Entry configuration

The package.json in vscode-extension-samples/lsp-sample has two key configurations:

    {
        "activationEvents": [
            "onLanguage:plaintext"
        ],
        "main": "./client/out/extension"
    }

Among them:

  • activationEvents: declares the plug-in's activation conditions; onLanguage:plaintext in the code means the plug-in is activated when a plain-text (txt) file is opened
  • main: the plug-in's entry file

Client sample

The Client entry code in vscode-extension-samples/lsp-sample is as follows:

    import * as path from 'path';
    import { ExtensionContext } from 'vscode';
    import {
        LanguageClient, LanguageClientOptions, ServerOptions, TransportKind
    } from 'vscode-languageclient/node';

    export function activate(context: ExtensionContext) {
        // Server module entry file
        const serverModule = context.asAbsolutePath(
            path.join('server', 'out', 'server.js')
        );

        // Server configuration
        const serverOptions: ServerOptions = {
            // Supported transports: stdio, ipc, pipe, socket
            run: { module: serverModule, transport: TransportKind.ipc },
            // The type also requires a debug entry; it reuses the run settings here
            debug: { module: serverModule, transport: TransportKind.ipc }
        };

        // Client configuration
        const clientOptions: LanguageClientOptions = {
            // Similar to activationEvents in package.json:
            // the conditions under which the plug-in is activated
            documentSelector: [{ scheme: 'file', language: 'plaintext' }],
            // ...
        };

        // Create the proxy object from the Server and Client configurations
        const client = new LanguageClient(
            'languageServerExample',
            'Language Server Example',
            serverOptions,
            clientOptions
        );

        // Start the client, which also launches the server
        client.start();
    }

The code flow is clear: it first defines the Server and Client configuration objects, then creates and starts a LanguageClient instance. As the example shows, the Client layer can be very thin. In a Node environment most of the forwarding logic is encapsulated in the LanguageClient class, so developers do not need to care about the details.

Server sample

The Server code in vscode-extension-samples/lsp-sample implements both error diagnostics and code completion, which is a little complex for a learning sample, so only the error-diagnostics code is excerpted here:

    import {
        createConnection, Diagnostic, DiagnosticSeverity, ProposedFeatures, TextDocuments
    } from 'vscode-languageserver/node';
    import { TextDocument } from 'vscode-languageserver-textdocument';

    // All Server-layer communication goes through the connection object created by createConnection
    const connection = createConnection(ProposedFeatures.all);

    // Document manager; provides document operations and listener interfaces
    // Documents that match the Client activation rules are added to documents automatically
    const documents: TextDocuments<TextDocument> = new TextDocuments(TextDocument);

    // Listen for document content change events
    documents.onDidChangeContent(change => {
        validateTextDocument(change.document);
    });

    // Validation
    async function validateTextDocument(textDocument: TextDocument): Promise<void> {
        const text = textDocument.getText();
        // Matches words written entirely in uppercase
        const pattern = /\b[A-Z]{2,}\b/g;
        let m: RegExpExecArray | null;

        // Report a diagnostic for every all-uppercase word
        const diagnostics: Diagnostic[] = [];
        while ((m = pattern.exec(text))) {
            const diagnostic: Diagnostic = {
                severity: DiagnosticSeverity.Warning,
                range: {
                    start: textDocument.positionAt(m.index),
                    end: textDocument.positionAt(m.index + m[0].length)
                },
                message: `${m[0]} is all uppercase.`,
                source: 'ex'
            };
            diagnostics.push(diagnostic);
        }

        // Send the diagnostics
        // VS Code renders the errors automatically
        connection.sendDiagnostics({ uri: textDocument.uri, diagnostics });
    }

    // Attach the document manager to the connection, then start listening
    documents.listen(connection);
    connection.listen();

The main flow of the LSP Server code is:

  • Call createConnection to establish a communication link with the VS Code main process; all subsequent information exchange goes through the connection object
  • Create the documents object and listen for document events as needed, such as onDidChangeContent in the example above
  • Analyze the code content in the event callback and return diagnostics according to the language rules; the example uses a regular expression to check whether a word is all uppercase and, if so, sends the diagnostic through the connection.sendDiagnostics interface
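The analysis step in the last bullet can be tried out without a running server. The sketch below reimplements it as a pure function, replacing textDocument.positionAt with a small hypothetical helper, offsetToPosition; TextPosition, UppercaseFinding, and findUppercaseWords are likewise names invented for this illustration.

```typescript
interface TextPosition {
  line: number;
  character: number;
}

interface UppercaseFinding {
  word: string;
  start: TextPosition;
  end: TextPosition;
}

// Convert a character offset into a zero-based line/character pair.
function offsetToPosition(text: string, offset: number): TextPosition {
  const linesBefore = text.slice(0, offset).split("\n");
  return {
    line: linesBefore.length - 1,
    character: linesBefore[linesBefore.length - 1].length,
  };
}

// Find every word of two or more letters written entirely in uppercase.
function findUppercaseWords(text: string): UppercaseFinding[] {
  const pattern = /\b[A-Z]{2,}\b/g;
  const findings: UppercaseFinding[] = [];
  let m: RegExpExecArray | null;
  while ((m = pattern.exec(text)) !== null) {
    findings.push({
      word: m[0],
      start: offsetToPosition(text, m.index),
      end: offsetToPosition(text, m.index + m[0].length),
    });
  }
  return findings;
}

console.log(findUppercaseWords("hello WORLD\nthis is OK too"));
```

Each finding carries the same start and end information that the server places in a diagnostic's range, so a function like this can be unit-tested first and wired into the onDidChangeContent callback afterwards.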


As the sample code shows, communication between the LSP client and server is already encapsulated in objects such as LanguageClient and connection, so plug-in developers do not need to care about the underlying implementation details. Simple code highlighting can be implemented on top of the interfaces and events these objects expose, without a deep understanding of the LSP protocol.


VS Code provides a variety of language extension interfaces through plug-ins, in two styles: declarative and programmatic. Real projects usually mix the two: the TextMate-based declarative interface quickly recognizes the lexical tokens in the code, and programmatic interfaces such as LSP supplement it with advanced features such as error hints, code completion, and go-to-definition.

I have recently read the source of many open-source VS Code plug-ins. Among them, the Vetur plug-in officially provided by Vue is a typical example of this approach and has high learning value; readers interested in the topic are encouraged to analyze it to learn how VS Code language extension plug-ins are written.
