Just to be clear: a long series of long posts like this can only go as deep into each line of source code as is practical, and any code I don't want to dig into is explained in a comment instead. In addition, because my local notes are better organized, the blog itself is written fairly freely.

So far I have only read the Compile process up to the asm.js part, and only skimmed it a few times; I did not step into most of the methods because they are too complicated. The last article ended at the entry point for generating the AST, a public function in the parsing namespace, shown below.

bool ParseProgram(ParseInfo* info, Isolate* isolate) {
  // ...
  /**
   * Create a Parser instance, then call its
   * internal method to start the conversion.
   */
  Parser parser(info);
  FunctionLiteral* result = nullptr;
  /** After the AST is built, store the result in ParseInfo's literal_. */
  result = parser.ParseProgram(isolate, info);
  info->set_literal(result);
  // ...
  return result != nullptr;
}
That is all the core code you need to care about, and it is very simple. The Parser object sets up a lot of properties during initialization, which I won't show here.

Let’s move on to the second core method, ParseProgram.

FunctionLiteral* Parser::ParseProgram(Isolate* isolate, ParseInfo* info) {
  // ...
  /** scanner_ is a private member of the Parser class. */
  scanner_.Initialize();
  FunctionLiteral* result = DoParseProgram(isolate, info);

  // ...
  return result;
}
Again, only two lines matter here: the first initializes the scanner, and the second starts the full parse.

The scanner code is split into two parts, scanner and scanner-character-streams. The stream is the source string after some preliminary processing, and the source must be converted into a stream before parsing can start. That conversion happens in the code I omitted earlier; here is a quick look at it.

bool ParseProgram(ParseInfo* info, Isolate* isolate) {
  // ...
  /**
   * info->script() returns the object that describes the script;
   * its source() is the source string, wrapped in a Handle<String> here.
   * ScannerStream is a class from the scanner-character-streams header.
   */
  Handle<String> source(String::cast(info->script()->source()), isolate);
  std::unique_ptr<Utf16CharacterStream> stream(ScannerStream::For(isolate, source));
  info->set_character_stream(std::move(stream));
  // ...
}

/** ScannerStream::For has four cases; the call above reaches this overload as For(isolate, data, 0, data->length()). */
Utf16CharacterStream* ScannerStream::For(Isolate* isolate, Handle<String> data, int start_pos, int end_pos) {
  size_t start_offset = 0;
  // ...
  if (data->IsSeqOneByteString()) {
    return new BufferedCharacterStream<OnHeapStream>(
        static_cast<size_t>(start_pos), Handle<SeqOneByteString>::cast(data),
        start_offset, static_cast<size_t>(end_pos));
  }
}
An ordinary source string is usually a OneByteString, so I won't go into the other branches here. In the end a specialized stream object is returned, with properties that record the length of the string, the current position, the start and end markers of the range to parse, and so on.
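To make the idea of such a stream concrete, here is a minimal sketch. It is not V8's Utf16CharacterStream: the class name ToyCharacterStream and all of its members are hypothetical, and it assumes one byte per character and that -1 marks the end of input, matching the uc32 comment in the Scanner class.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>

// Hypothetical stand-in for a character stream; NOT V8's real class.
class ToyCharacterStream {
 public:
  // Assumption: -1 marks the end of input.
  static constexpr std::int32_t kEndOfInput = -1;

  explicit ToyCharacterStream(std::string source)
      : source_(std::move(source)), pos_(0) {}

  // Returns the next character and moves forward, or kEndOfInput at the end.
  std::int32_t Advance() {
    if (pos_ >= source_.size()) return kEndOfInput;
    return static_cast<unsigned char>(source_[pos_++]);
  }

  // Returns the next character without consuming it.
  std::int32_t Peek() const {
    if (pos_ >= source_.size()) return kEndOfInput;
    return static_cast<unsigned char>(source_[pos_]);
  }

  // Steps back one character, if there is one to step back over.
  void Back() {
    if (pos_ > 0) --pos_;
  }

  std::size_t pos() const { return pos_; }              // current progress
  std::size_t length() const { return source_.size(); } // total length

 private:
  std::string source_;  // one byte per character in this sketch
  std::size_t pos_;     // index of the next character to read
};
```

A scanner built on top of this would call Advance() in a loop and stop when it sees kEndOfInput, using pos() to record where each token starts and ends.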

After converting the string, you can use Scanner for step-by-step parsing, but before you do that, you need a brief understanding of the Scanner class, as follows.

/** The Scanner class; it lives in the same file as Utf16CharacterStream. */
class V8_EXPORT_PRIVATE Scanner {
  public:
    // Returns the token type of next_
    Token::Value peek() const { return next().token; }
    // Return the position of current_
    const Location& location() const { return current().location; }
  private:
    // Unicode code point of the current character; -1 marks the end of input (typedef int32_t uc32)
    uc32 c0_;
    TokenDesc* current_;    // desc for current token (as returned by Next())
    TokenDesc* next_;       // desc for next token (one token look-ahead)
    TokenDesc* next_next_;  // desc for the token after next (after PeekAhead())
    // The stream converted from the Handle<String>; the class that actually performs the parsing
    Utf16CharacterStream* const source_;
};
Only a few of the Scanner's properties and methods are shown here. The Scanner keeps three token descriptors that act as cursors while it walks the string: current_, next_, and next_next_. source_ is the converted stream, and all of the scanning ultimately calls methods on it. The two structures TokenDesc and Location are also very important: the first describes a token, and the second records where that token sits in the source, as follows.
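The way current_, next_, and next_next_ cooperate can be sketched with a toy scanner. This is only an illustration under my own assumptions: ToyScanner, ToyToken, and the has_next_next_ flag are hypothetical names, the scanner stores token kinds by value rather than TokenDesc pointers, and it only understands digits, spaces, and '+'.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical token kinds for this sketch only.
enum class ToyToken { kUninitialized, kNumber, kAdd, kIllegal, kEos };

// Toy scanner with a current token plus up to two tokens of look-ahead.
class ToyScanner {
 public:
  explicit ToyScanner(std::string source) : source_(std::move(source)) {
    next_ = Scan();  // prime the one-token look-ahead
  }

  // Makes the look-ahead token current, then refills the look-ahead.
  ToyToken Next() {
    current_ = next_;
    if (has_next_next_) {
      next_ = next_next_;  // reuse the token PeekAhead() already scanned
      has_next_next_ = false;
    } else {
      next_ = Scan();
    }
    return current_;
  }

  ToyToken current() const { return current_; }
  ToyToken peek() const { return next_; }

  // Scans one extra token on demand and remembers it.
  ToyToken PeekAhead() {
    if (!has_next_next_) {
      next_next_ = Scan();
      has_next_next_ = true;
    }
    return next_next_;
  }

 private:
  static bool IsDigit(char c) { return c >= '0' && c <= '9'; }

  // Reads one token starting at pos_.
  ToyToken Scan() {
    while (pos_ < source_.size() && source_[pos_] == ' ') ++pos_;
    if (pos_ >= source_.size()) return ToyToken::kEos;
    char c = source_[pos_];
    if (c == '+') {
      ++pos_;
      return ToyToken::kAdd;
    }
    if (IsDigit(c)) {
      while (pos_ < source_.size() && IsDigit(source_[pos_])) ++pos_;
      return ToyToken::kNumber;
    }
    ++pos_;  // unknown character: consume it and report it as illegal
    return ToyToken::kIllegal;
  }

  std::string source_;
  std::size_t pos_ = 0;
  ToyToken current_ = ToyToken::kUninitialized;
  ToyToken next_ = ToyToken::kUninitialized;
  ToyToken next_next_ = ToyToken::kUninitialized;
  bool has_next_next_ = false;
};
```

The point of the extra slot is that a parser can look two tokens ahead without consuming anything; the token scanned by PeekAhead() is handed over to next_ on the following Next() call instead of being scanned twice.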

/** Token descriptor: each TokenDesc represents a single token. */
struct TokenDesc {
  /**
   * Position of the token in the source.
   * For example, in "'Hello' + 'World'" the token 'Hello' is scanned
   * as Token::STRING and its location is {0, 7}.
   */
  Location location = {0, 0};
  /** Buffers related to string-like tokens. */
  LiteralBuffer literal_chars;
  LiteralBuffer raw_literal_chars;
  /**
   * Token type enumeration,
   * e.g. '(' is Token::LPAREN and '===' is Token::EQ_STRICT.
   * All types are listed in token.h.
   */
  Token::Value token = Token::UNINITIALIZED;
  MessageTemplate invalid_template_escape_message = MessageTemplate::kNone;
  Location invalid_template_escape_location;
  // Small integer value
  uint32_t smi_value_ = 0;
  bool after_line_terminator = false;
};
With this structure and a few methods, the source string can be walked step by step and turned into an abstract syntax tree. The actual conversion is very complicated and has many branches, though, which will be explored later.
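As a small illustration of how a token descriptor pairs a token type with a location, here is a sketch that tokenizes the 'Hello' + 'World' example from the comment above. ToyLocation, ToyKind, ToyTokenDesc, and ToyTokenize are hypothetical names of my own, and the sketch assumes a location is a half-open range of offsets (start, one past the end), which is what makes 'Hello' come out as {0, 7}.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, trimmed-down counterparts of Location and TokenDesc.
struct ToyLocation {
  int beg_pos = 0;  // offset of the first character of the token
  int end_pos = 0;  // offset one past the last character of the token
};

enum class ToyKind { kString, kAdd, kIllegal };

struct ToyTokenDesc {
  ToyLocation location;
  std::string literal_chars;  // string contents without the quotes
  ToyKind token = ToyKind::kIllegal;
};

// Splits a source made of single-quoted strings, spaces, and '+' into
// token descriptors. Anything else is reported as an illegal token.
std::vector<ToyTokenDesc> ToyTokenize(const std::string& source) {
  std::vector<ToyTokenDesc> tokens;
  std::size_t pos = 0;
  while (pos < source.size()) {
    char c = source[pos];
    if (c == ' ') {  // whitespace separates tokens and is not recorded
      ++pos;
      continue;
    }
    ToyTokenDesc desc;
    desc.location.beg_pos = static_cast<int>(pos);
    if (c == '\'') {
      std::size_t close = source.find('\'', pos + 1);
      if (close == std::string::npos) {
        // Unterminated string: treat the rest of the input as illegal.
        pos = source.size();
      } else {
        desc.token = ToyKind::kString;
        desc.literal_chars = source.substr(pos + 1, close - pos - 1);
        pos = close + 1;
      }
    } else if (c == '+') {
      desc.token = ToyKind::kAdd;
      ++pos;
    } else {
      ++pos;  // unknown character stays kIllegal
    }
    desc.location.end_pos = static_cast<int>(pos);
    tokens.push_back(desc);
  }
  return tokens;
}
```

Calling ToyTokenize("'Hello' + 'World'") yields three descriptors: a string at {0, 7}, a plus sign at {8, 9}, and a string at {10, 17}. A parser can then build AST nodes from the token types while keeping the locations for error messages.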