Learn about scopes from a compilation perspective

Previous articles covered the basics of compilation and how to refactor the script interpreter with Antlr (check them out if you are interested). This article takes a closer look at scope and lifetime. Its main goals are:

Upgrade the variable management mechanism so that functions can be implemented.
Introduce a scope mechanism to ensure that each reference to a variable points to the correct variable definition.
Improve the variable storage mechanism: instead of simply throwing every variable and its value into a HashMap, manage each variable's lifetime and reduce memory usage.

Scope and lifetime are fundamental concepts in programming languages. Understanding them helps us understand functions, blocks, closures, object orientation, static members, local variables, and global variables. It also helps answer questions we may run into while learning, such as:

What exactly is the mechanism of closures?
Why do you need both stack and heap mechanisms to manage memory? What’s the difference?
What is the difference between a static inner class and a normal inner class?

Let's go through them one by one.

Scope

Scope is the region of a program in which a variable, function, class, or other named entity is visible. The scope of a variable is the region of the source code in which that variable can be referenced. The following JavaScript example declares its variables with var:

/* scope.js: tests scope. */
var a = 1;
function fun1() {
  a = 2;
  b = 2;
  var a = 3;
  var b = a;
  console.log("inner", a, b);
}
var b = 4;
function testVar() {
  console.log("outter1", a, b);
  fun1();
  console.log("outter2", a, b);
  // Override global variables with local variables
  var c = 5;
  var d = 5;
  console.log("outter3", c, d);

  // block-level scope
  if (c > 0) {
    var c = 3; // var has no block scope, so this is the same c declared above
    console.log("block1", c, d);
  } else {
    var c = 4;
    console.log("block2", c, d);
  }
  console.log("outter4", c, d);
}
testVar();

The results in Chrome are as follows:

outter1 1 4
inner 3 3
outter2 1 4
outter3 5 5
block1 3 5
outter4 3 5

let and var

Variables declared with var are hoisted to the top of their scope and initialized with undefined.
Variables declared with let are also hoisted to the top of their scope, but they are not initialized. Using a let variable before its declaration throws a ReferenceError.

Related reading: strict mode vs. non-strict mode, let, var, const

From the example above, we can derive the following rules:

Scopes can be large or small. Variables declared outside a function can be accessed inside it, while a function's local variables can be accessed only within that function.
In C, the scope of a variable begins after its declaration. With var in JavaScript, the declaration is hoisted, which is why a = 2 in fun1 assigns to the local a rather than to the global one.
Inside a function, a variable declared with var may have the same name as an outer variable; the local variable then shadows the outer one.

In addition, C has the concept of block scope. A block is a group of statements enclosed in curly braces, such as the bodies that follow if and else. Block scope behaves much like function scope: code in the block can access outer variables, or shadow them with local variables.

In fact, different languages have different design mechanisms for this. For example, in this Java code, we use an if block and declare a variable c in the if, else, and outer parts:

/**
 * Scope.java
 * Tests the scope of Java variables.
 */
public class ScopeTest {

    public static void main(String args[]) {
        int a = 1;
        int b = 2;

        if (a > 0) {
            // int b = 3; // Not allowed: a variable with the same name as an outer variable cannot be declared
            int c = 3;
        } else {
            int c = 4;   // Another c may be declared; each has its own scope
        }

        int c = 5;  // A new c can also be declared here
    }
}

As you can see, Java's block scope differs from C's: Java does not allow a variable in a block to shadow a local variable of the enclosing scope. What about JavaScript, whose syntax looks much like that of C and Java? Here are the results of two separate JavaScript scoping tests:

Declare variables with var

var a = 5;
var b = 5;
console.log("1: a=%d b=%d", a, b);
 
if (a > 0) {
    a = 4;
    console.log("2: a=%d b=%d", a, b);
    var b = 3; // It looks like a new variable is declared, but it is actually the external variable referenced
    console.log("3: a=%d b=%d", a, b);
}
else {
    var b = 4;
    console.log("4: a=%d b=%d", a, b);
}
 
console.log("5: a=%d b=%d", a, b);
 
for (var b = 0; b< 2; b++){  // Can I declare a new variable for the for loop?
    console.log("6-%d: a=%d b=%d",b, a, b);
}
 
console.log("7: a=%d b=%d", a, b);

The results:

1: a=5 b=5
2: a=4 b=5
3: a=4 b=3
5: a=4 b=3
6-0: a=4 b=0
6-1: a=4 b=1
7: a=4 b=2

Declare variables with let

  let a = 5;
  let b = 5;
  console.log("1: a=%d b=%d", a, b);

  if (a > 0) {
    a = 4;
    console.log("2: a=%d b=%d", a, b); // ReferenceError: the block's own b hides the outer b but is not yet initialized
    let b = 3;  // Hoisted to the top of the block, but not initialized until this line
    console.log("3: a=%d b=%d", a, b);
  } else {
    let b = 4;
    console.log("4: a=%d b=%d", a, b);
  }
  // ... the rest is the same as above

The results:

1: a=5 b=5
VM1151:7 Uncaught ReferenceError: Cannot access 'b' before initialization
    at <anonymous>:7:36

As you can see, variables declared with var in JavaScript have no block scope. Declaring them with let gives them block scope, which is the main difference between let and var declarations.

Comparing the scoping behavior of the three languages, you can see that seemingly similar syntax can hide different internal mechanisms. This is a difference in semantics. Much of what this article covers belongs to the domain of semantics, and analyzing scope is one of the tasks of semantic analysis.

Now that we know what scope is, let's look at a closely related concept: lifetime.

Lifetime (Extent)

The lifetime of a variable is the period during which it can be accessed, that is, the time between allocating its memory and reclaiming it.

In the previous examples, the lifetime of each variable coincides with its scope. When execution leaves the scope, the lifetime ends and the memory occupied by the variable is released. This is the typical behavior of local variables, which are managed on a stack.

In other cases, the lifetime of a variable does not coincide with its syntactic scope. Memory allocated on the heap, for example, persists after execution leaves the scope in which it was allocated.

In the following C example, fun returns a pointer to an integer. When the function exits, the local variable b disappears and the memory it occupies is reclaimed: b is a pointer variable allocated on the stack, and &b, the address of b, refers to that small stack slot. The slot holds an address that points to memory allocated on the heap. That heap memory, which actually stores the number 2, is not reclaimed automatically; we must release it manually with free().

// From Geek Time
#include <stdio.h>
#include <stdlib.h>
 
int * fun(){
    int * b = (int*)malloc(1*sizeof(int)); // Apply memory in the heap
    *b = 2;  // Assign 2 to this address
   
    return b;
}
 
int main(int argc, char **argv){
    int * p = fun();
    *p = 3;
 
    printf("after called fun: b=%lu *b=%d \n", (unsigned long)p, *p);
 
    free(p);
}

Let’s look at Java, where object instances are generated in the heap by default. The following example code returns a reference to an object from a method, and we can continue to modify the object’s contents based on this reference, proving that the object’s memory has not been freed:

// From Geek Time
public class Extent2{
 
    StringBuffer myMethod(){
        StringBuffer b = new StringBuffer(); // Generate object instances in the heap
        b.append("Hello ");
        System.out.println(System.identityHashCode(b)); // Prints the identity hash code, used here as a stand-in for the memory address
        return b;  // Returns an object reference, essentially a memory address
    }
 
    public static void main(String args[]){
        Extent2 extent2 = new Extent2();
        StringBuffer c = extent2.myMethod(); // Get the object reference
        System.out.println(c);
        c.append("World!");         // Modify the contents of memory
        System.out.println(c);
 
        // The same value as printed in myMethod()
        System.out.println(System.identityHashCode(c));
    }
}

Because a Java object can outlive the scope in which it was created, its memory cannot be reclaimed simply when execution leaves that scope. Java therefore uses automatic memory management, also known as garbage collection.

For how JavaScript stores variables, see my separate article on that topic.

Why are scope and lifetime such fundamental concepts in programming languages? Because they correspond to the basic mechanisms of memory management at runtime. Although languages differ in their design, their runtime mechanisms are similar, such as using a stack and a heap to manage memory.

How to implement scope and stack?

As mentioned in the previous article, when dealing with variable assignment, you can store the variable in a hash table and refer to it by its name, as follows:

public class SimpleScript {
    private HashMap<String, Integer> variables = new HashMap<String, Integer>();
    ...
}

This approach no longer works once variables can belong to different scopes. We need a data structure that distinguishes the scopes of different variables. Analyzing the earlier code shows that scopes form a tree structure.

Object-oriented languages are slightly different: their scopes form a forest rather than a single tree, with one tree per class, so there are no global variables either.

Note: JavaScript is often described as an object-based language, because it lacks many features associated with class-based object-oriented languages, such as abstract classes and method overloading, and handles inheritance through prototypes. It instead unifies everything around objects to form a fairly powerful object system.

The scope classes below are written in Java. There are three kinds of scope: block scope, function scope, and class scope.

// Variables, functions, classes, and blocks generated during compilation are called symbols
public abstract class Symbol {
    // The name of the symbol
    protected String name = null;
 
    // Scope
    protected Scope enclosingScope = null;
 
    // Visibility, such as public or private
    protected int visibility = 0;
 
    // AST node associated with the symbol
    protected ParserRuleContext ctx = null;
}

// Scope
public abstract class Scope extends Symbol {
    // Members of this scope, including variables, methods, classes, etc.
    protected List<Symbol> symbols = new LinkedList<Symbol>();
}

// Block scope
public class BlockScope extends Scope { ... }

// Function scope
public class Function extends Scope implements FunctionType { ... }

// Class scope
public class Class extends Scope implements Type { ... }

When interpreting the AST, the tree structure of the scope needs to be established, and the process of analyzing the scope is part of the semantic analysis. That is, the AST is not something we can run immediately. Before we run it, we need to do semantic analysis, such as scoping, so that each variable can be referenced correctly, so that the program can be executed correctly.
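To make the idea of resolving references against a scope tree concrete, here is a minimal sketch. It is not the interpreter's actual code: DemoScope, ScopeTreeDemo, declare, and resolve are names invented for this illustration, and a scope is reduced to a name, a link to its enclosing scope, and the set of variable names it declares. The tree mirrors the function scopes of scope.js.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified scope node (not the article's Scope class).
class DemoScope {
    final String name;
    final DemoScope enclosingScope;               // null for the global scope
    final Set<String> declared = new HashSet<>(); // variable names declared here

    DemoScope(String name, DemoScope enclosingScope) {
        this.name = name;
        this.enclosingScope = enclosingScope;
    }

    void declare(String variable) {
        declared.add(variable);
    }

    // Walk outwards through the enclosing scopes until one declares the variable.
    // Returns the name of that scope, or null if the variable is undeclared.
    String resolve(String variable) {
        for (DemoScope s = this; s != null; s = s.enclosingScope) {
            if (s.declared.contains(variable)) {
                return s.name;
            }
        }
        return null;
    }
}

class ScopeTreeDemo {
    // Builds the function-level scope tree of scope.js: global -> { fun1, testVar }.
    static DemoScope[] build() {
        DemoScope global = new DemoScope("global", null);
        global.declare("a");
        global.declare("b");

        DemoScope fun1 = new DemoScope("fun1", global);
        fun1.declare("a");
        fun1.declare("b");

        DemoScope testVar = new DemoScope("testVar", global);
        testVar.declare("c");
        testVar.declare("d");

        return new DemoScope[] { global, fun1, testVar };
    }

    public static void main(String[] args) {
        DemoScope[] scopes = build();
        System.out.println(scopes[1].resolve("a")); // fun1: the local a shadows the global a
        System.out.println(scopes[2].resolve("a")); // global: testVar has no a of its own
        System.out.println(scopes[2].resolve("c")); // testVar
        System.out.println(scopes[1].resolve("c")); // null: fun1 cannot see testVar's c
    }
}
```

Because fun1 and testVar are siblings under the global scope, neither can see the other's variables, which is exactly what a single flat HashMap cannot express.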

Let's look at how the lifetime of each variable plays out while scope.js executes:

Enter the program: the global variables take effect one by one.
Enter testVar: the variables in testVar take effect in order.
Enter fun1: the variables in fun1 take effect in order.
Exit fun1: the variables in fun1 become invalid.
Enter the if block: the variables in the if block take effect in order.
Exit the if block: the variables in the if block become invalid.
Exit testVar: the variables in testVar become invalid.
Exit the program: the global variables become invalid.

Here is how the stack changes as it runs:

Entering and exiting scopes during execution can be implemented with a stack. Each time a scope is entered, a data structure called a stack frame is pushed onto the stack. The frame holds the values of all local variables in the current scope. When the scope is exited, the frame is popped and the variables in it become invalid.
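The push and pop sequence can be traced with a small sketch. This is only an illustration under simplifying assumptions: FrameLifetimeDemo, enter, and exit are made-up names, and a frame is reduced to the name of its scope so that only the stack discipline is visible. The sequence follows the eight steps listed for scope.js.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class FrameLifetimeDemo {
    // Each entry stands for one stack frame; only the scope name is kept here.
    private final Deque<String> stack = new ArrayDeque<>();
    final List<String> trace = new ArrayList<>();

    // Entering a scope pushes a frame.
    void enter(String scopeName) {
        stack.push(scopeName);
        trace.add("enter " + scopeName + " depth=" + stack.size());
    }

    // Exiting a scope pops the top frame; its variables are gone afterwards.
    void exit() {
        String scopeName = stack.pop();
        trace.add("exit " + scopeName + " depth=" + stack.size());
    }

    static List<String> run() {
        FrameLifetimeDemo d = new FrameLifetimeDemo();
        d.enter("global");
        d.enter("testVar");
        d.enter("fun1");
        d.exit();            // leaves fun1
        d.enter("if-block");
        d.exit();            // leaves the if block
        d.exit();            // leaves testVar
        d.exit();            // leaves the program
        return d.trace;
    }

    public static void main(String[] args) {
        for (String line : run()) {
            System.out.println(line);
        }
    }
}
```

The stack never grows beyond three frames in this run, because the frame of fun1 has already been popped when the if block is entered.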

The stack mechanism makes efficient use of memory, and when variables go out of scope, they are no longer useful and can be discarded from memory. Stack and frame are represented by the following data structure, where the PlayObject holds the values of each variable through a HashMap:

// Stack
private Stack<StackFrame> stack = new Stack<StackFrame>();

// Stack frame
public class StackFrame {
    // Scope corresponding to this frame
    Scope scope = null;

    // Frame of the enclosing scope
    StackFrame parentFrame = null;

    // The actual place where variables are stored
    PlayObject object = null;
}

public class PlayObject {
    // Member variables
    protected Map<Variable, Object> fields = new HashMap<Variable, Object>();
}

For now, this is only a conceptual imitation of stack frames. Because the interpreter is implemented in Java, PlayObject objects are stored on the heap: all Java objects live on the heap, and only primitive values such as int, along with object references, are stored on the stack. JavaScript stores variables in a similar way; see the article on whether a JavaScript variable is stored on the stack or the heap.

Note that the structure of the stack does not mirror the Scope tree. In other words, the previous frame on the stack is not necessarily the frame of the parent Scope. To access variables of the enclosing Scope, follow the frame's parentFrame. I show this in the figure above: when fun1 is called, there are three frames on the stack (the global frame, the testVar frame, and the fun1 frame), and the parentFrame of both testVar and fun1 is the global frame.
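The difference between stack order and parentFrame links can be sketched as follows. This is a simplified stand-in, not the interpreter's StackFrame: DemoFrame, ParentFrameDemo, and lookup are hypothetical names, values are keyed by plain variable names, and the stored values are illustrative. The three frames correspond to the moment fun1 is running in scope.js.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified frame: values are keyed by variable name.
class DemoFrame {
    final String scopeName;
    final DemoFrame parentFrame;   // frame of the lexically enclosing scope
    final Map<String, Object> values = new HashMap<>();

    DemoFrame(String scopeName, DemoFrame parentFrame) {
        this.scopeName = scopeName;
        this.parentFrame = parentFrame;
    }

    // Follows parentFrame links, not the order of frames on the stack.
    Object lookup(String name) {
        for (DemoFrame f = this; f != null; f = f.parentFrame) {
            if (f.values.containsKey(name)) {
                return f.values.get(name);
            }
        }
        return null; // not visible from this frame
    }
}

class ParentFrameDemo {
    // Pushes global, testVar, and fun1 frames and returns the fun1 frame.
    static DemoFrame buildFun1Frame(Deque<DemoFrame> stack) {
        DemoFrame global = new DemoFrame("global", null);
        global.values.put("a", 1);
        global.values.put("b", 4);
        stack.push(global);

        DemoFrame testVar = new DemoFrame("testVar", global);
        testVar.values.put("c", 5);   // a local of testVar (illustrative value)
        stack.push(testVar);

        // fun1 is called from testVar, but it is declared at the global level,
        // so its parentFrame is the global frame, not the testVar frame.
        DemoFrame fun1 = new DemoFrame("fun1", global);
        fun1.values.put("a", 3);
        stack.push(fun1);
        return fun1;
    }

    public static void main(String[] args) {
        Deque<DemoFrame> stack = new ArrayDeque<>();
        DemoFrame fun1 = buildFun1Frame(stack);
        System.out.println(fun1.lookup("a")); // 3: the local a
        System.out.println(fun1.lookup("b")); // 4: found in the global frame
        System.out.println(fun1.lookup("c")); // null: testVar's c is skipped
    }
}
```

Although the testVar frame sits directly beneath the fun1 frame on the stack, the lookup never visits it, because the search follows the scope tree rather than the call chain.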

Implement block scope

With scopes and a stack in place, many features can be implemented, such as block scope and local variables for if statements and for loops. For a block, for example, the visit method creates a frame, pushes it onto the stack, and pops it once the block has been executed:

BlockScope scope = (BlockScope) cr.node2Scope.get(ctx);  // Get the scope
StackFrame frame = new StackFrame(scope);  // Create a stack frame
pushStack(frame);    // Push it onto the stack

...

// Pop the frame from the stack
stack.pop();

When we need to get the value of a variable in our code, we first look for it in the current frame. If you can’t find it, go to the frame corresponding to the upper scope:

StackFrame f = stack.peek();       // Get the frame at the top of the stack
PlayObject valueContainer = null;
while (f != null) {
    // Check whether the variable belongs to the current frame
    if (f.scope.containsSymbol(variable)) {
        valueContainer = f.object;
        break;
    }
    // Otherwise, continue with the frame of the enclosing scope
    f = f.parentFrame;
}

If you run the following code, you can still declare another variable i after the for loop has finished, independent of the i inside the loop, which shows that the two belong to different scopes:

  script = "int age = 44; for(int i = 0; i<10; i++) { age = age + 2; } int i = 8;";
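The script above is written in the interpreter's own Java-like language. Plain Java follows the same rule for loop variables, so the expected behavior can be checked with a minimal sketch; ForScopeDemo and run are names made up for this illustration.

```java
class ForScopeDemo {
    // Mirrors the script: the loop variable i and the later i are different variables.
    static int[] run() {
        int age = 44;
        for (int i = 0; i < 10; i++) {
            age = age + 2;
        }
        int i = 8; // legal: the loop's i went out of scope when the loop ended
        return new int[] { age, i };
    }

    public static void main(String[] args) {
        int[] result = run();
        System.out.println("age=" + result[0] + " i=" + result[1]); // age=64 i=8
    }
}
```

The loop runs ten times and adds 2 each time, so age ends at 64, while the second i is simply 8.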

Implement functions

Let’s start with the syntax for functions:

// Function declaration
functionDeclaration
    : typeTypeOrVoid? IDENTIFIER formalParameters ('[' ']')*
      functionBody
    ;
// Function body
functionBody
    : block
    | ';'
    ;
// Type or void
typeTypeOrVoid
    : typeType
    | VOID
    ;
// All parameters of the function
formalParameters
    : '(' formalParameterList? ')'
    ;
// Parameter list
formalParameterList
    : formalParameter (',' formalParameter)* (',' lastFormalParameter)?
    | lastFormalParameter
    ;
// Single parameter
formalParameter
    : variableModifier* typeType variableDeclaratorId
    ;
// Variable number of arguments, the last parameter
lastFormalParameter
    : variableModifier* typeType '...' variableDeclaratorId
    ;
// Function call
functionCall
    : IDENTIFIER '(' expressionList? ')'
    | THIS '(' expressionList? ')'
    | SUPER '(' expressionList? ')'
    ;

There is one additional factor to consider in a function: arguments. Inside functions, parameter variables are used just like normal local variables, and at run time, they are stored in the frame just like local variables.

Design an object that represents the definition of a function, including a list of arguments and the type of the return value:

public class Function extends Scope implements FunctionType {
    // Parameters
    protected List<Variable> parameters = new LinkedList<Variable>();

    // Return type
    protected Type returnType = null;

    ...
}

When we call the function, we actually do three steps:

Create a stack frame;
Calculate the values of all arguments and put them into the stack frame;
Execute the function body from the function declaration.

The relevant code is as follows:

// The AST node of the function declaration
FunctionDeclarationContext functionCode = (FunctionDeclarationContext) function.ctx;

// Create the stack frame
functionObject = new FunctionObject(function);
StackFrame functionFrame = new StackFrame(functionObject);

// Calculate the values of the arguments
List<Object> paramValues = new LinkedList<Object>();
if (ctx.expressionList() != null) {
    for (ExpressionContext exp : ctx.expressionList().expression()) {
        Object value = visitExpression(exp);
        if (value instanceof LValue) {
            value = ((LValue) value).getValue();
        }
        paramValues.add(value);
    }
}

// Add variables to the frame according to the parameter names
if (functionCode.formalParameters().formalParameterList() != null) {
    for (int i = 0; i < functionCode.formalParameters().formalParameterList().formalParameter().size(); i++) {
        FormalParameterContext param = functionCode.formalParameters().formalParameterList().formalParameter(i);
        LValue lValue = (LValue) visitVariableDeclaratorId(param.variableDeclaratorId());
        lValue.setValue(paramValues.get(i));
    }
}

// Call the function body
rtn = visitFunctionDeclaration(functionCode);

// Pop the frame from the stack
stack.pop();
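Stripped of the Antlr details, the three steps of a call can be sketched in isolation. Everything here is an assumption made for illustration: CallDemo, FunctionDef, call, and demo are invented names, a frame is a plain map from parameter name to value, the function body is a Java lambda instead of an AST node, and the arguments are assumed to be already evaluated.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class CallDemo {
    // Hypothetical function definition: parameter names plus a body that reads its frame.
    static class FunctionDef {
        final List<String> parameters;
        final Function<Map<String, Object>, Object> body;

        FunctionDef(List<String> parameters, Function<Map<String, Object>, Object> body) {
            this.parameters = parameters;
            this.body = body;
        }
    }

    // The call stack; each frame maps parameter names to values.
    static final Deque<Map<String, Object>> stack = new ArrayDeque<>();

    static Object call(FunctionDef fn, Object... argValues) {
        if (argValues.length != fn.parameters.size()) {
            throw new IllegalArgumentException("wrong number of arguments");
        }
        // Step 1: create a stack frame for the call.
        Map<String, Object> frame = new HashMap<>();
        // Step 2: store each argument value under its parameter name.
        for (int i = 0; i < argValues.length; i++) {
            frame.put(fn.parameters.get(i), argValues[i]);
        }
        stack.push(frame);
        try {
            // Step 3: execute the function body against the new frame.
            return fn.body.apply(frame);
        } finally {
            stack.pop(); // the parameters disappear together with the frame
        }
    }

    static Object demo() {
        FunctionDef add = new FunctionDef(Arrays.asList("x", "y"), frame -> {
            int x = (Integer) frame.get("x");
            int y = (Integer) frame.get("y");
            return x + y;
        });
        return call(add, 2, 3);
    }

    public static void main(String[] args) {
        System.out.println(demo());       // 5
        System.out.println(stack.size()); // 0: the frame was popped after the call
    }
}
```

Popping in a finally block is a design choice of this sketch: it keeps the stack balanced even if the body throws.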

In summary, we implemented block scope and functions, and explored two fundamental concepts of programming languages: scope and lifetime.

Analyzing scope is a task of semantic analysis. Antlr can handle much of the lexical and syntactic analysis, but we have to do the semantic analysis ourselves.
The lifetime of a variable involves runtime memory management, which leads to the concepts of the stack frame and the heap.
