The interpreter

The Python virtual machine essentially simulates how a CPU executes instructions, and each function's stack frame is where Python bytecode actually runs. When the virtual machine executes, what it works on is not a PyCodeObject but a PyFrameObject, which wraps the code object together with the runtime state needed to run it.

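typedef struct _frame {
    PyObject_VAR_HEAD
    struct _frame *f_back;       /* previous frame (the caller) */
    PyCodeObject *f_code;        /* code object being executed */
    PyObject *f_builtins;        /* builtin namespace */
    PyObject *f_globals;         /* global namespace */
    PyObject *f_locals;          /* local namespace */
    PyObject **f_valuestack;     /* bottom of the value stack */
    PyObject **f_stacktop;       /* top of the value stack */
    ...
    int f_lasti;                 /* offset of the last attempted instruction */
    int f_lineno;                /* current line number */
    ...
    PyObject *f_localsplus[1];   /* locals + value stack, sized at allocation */
} PyFrameObject;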

Runtime state

Creating a PyFrameObject

PyFrameObject *PyFrame_New(PyThreadState *tstate, PyCodeObject *code,
                           PyObject *globals, PyObject *locals)
{
    PyFrameObject *f;
    Py_ssize_t extras, ncells, nfrees;

    ncells = PyTuple_GET_SIZE(code->co_cellvars);
    nfrees = PyTuple_GET_SIZE(code->co_freevars);

    /* one allocation holds locals, cell/free variables and the value stack */
    extras = code->co_stacksize + code->co_nlocals + ncells + nfrees;
    f = PyObject_GC_NewVar(PyFrameObject, &PyFrame_Type, extras);
    ...
    /* the value stack starts right after locals + cells + frees */
    extras = code->co_nlocals + ncells + nfrees;
    f->f_valuestack = f->f_localsplus + extras;
    f->f_stacktop = f->f_valuestack;   /* the stack starts out empty */
    return f;
}
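The single allocation described above can be illustrated with a toy model in plain C. This is a sketch, not CPython code: ToyCode, ToyFrame and toy_frame_new are hypothetical names, and plain `void *` slots stand in for PyObject pointers. It reproduces both `extras` computations. The array holds locals, cell variables and free variables first, and the value stack begins right after them.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the sizes a PyCodeObject records. */
typedef struct {
    int nlocals, ncells, nfrees, stacksize;
} ToyCode;

/* Hypothetical frame: one trailing array shared by locals and the stack. */
typedef struct {
    void **valuestack;   /* first slot of the value stack */
    void **stacktop;     /* next free stack slot */
    void *localsplus[];  /* locals | cells | frees | value stack */
} ToyFrame;

ToyFrame *toy_frame_new(const ToyCode *code) {
    /* total slots = stack + locals + cells + frees (the first `extras`) */
    size_t extras = (size_t)(code->stacksize + code->nlocals +
                             code->ncells + code->nfrees);
    ToyFrame *f = calloc(1, sizeof(ToyFrame) + extras * sizeof(void *));
    assert(f != NULL);
    /* the stack begins after locals + cells + frees (the second `extras`) */
    f->valuestack = f->localsplus + code->nlocals + code->ncells + code->nfrees;
    f->stacktop = f->valuestack;   /* empty stack */
    return f;
}
```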

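Scope

Python has two kinds of name reference: attribute references such as obj.x and plain name references such as x. Plain names are resolved by the LEGB rule, which searches the Local, Enclosing function, Global and Builtin scopes in that order. An attribute reference, in essence, looks up a name in the namespace attached to a particular object.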
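The sketch below shows the LEGB search order. It assumes each scope can be modeled as a flat array of name/value pairs; Binding, Namespace and legb_lookup are hypothetical names. CPython actually resolves many names at compile time into LOAD_FAST, LOAD_DEREF or LOAD_GLOBAL, so this only illustrates the order in which the scopes are consulted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical namespace: a small array of name/value bindings. */
typedef struct { const char *name; int value; } Binding;
typedef struct { const Binding *items; size_t count; } Namespace;

/* Search one namespace; returns 1 and stores the value if found. */
static int ns_lookup(const Namespace *ns, const char *name, int *out) {
    for (size_t i = 0; i < ns->count; i++) {
        if (strcmp(ns->items[i].name, name) == 0) {
            *out = ns->items[i].value;
            return 1;
        }
    }
    return 0;
}

/* LEGB: try Local, Enclosing, Global, then Builtin; the first hit wins. */
int legb_lookup(const Namespace *scopes[4], const char *name, int *out) {
    for (int i = 0; i < 4; i++) {
        if (scopes[i] != NULL && ns_lookup(scopes[i], name, out))
            return 1;
    }
    return 0;   /* Python would raise NameError here */
}
```

In this example, `x` bound in both the local and the enclosing scope resolves to the local binding, because the search stops at the first scope that contains the name.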

The core bytecode execution loop

PyObject *PyEval_EvalFrameEx(PyFrameObject *f, int throwflag)
{
    ...
    why = WHY_NOT;
    ...
    for (;;) {
        ...
    fast_next_opcode:
        f->f_lasti = INSTR_OFFSET();   /* remember where this instruction starts */
        opcode = NEXTOP();             /* fetch the opcode byte */
        oparg = 0;
        if (HAS_ARG(opcode))
            oparg = NEXTARG();         /* fetch the 2-byte argument */
    dispatch_opcode:
        switch (opcode) {
        case NOP:
            goto fast_next_opcode;
        case LOAD_FAST:
            ...
        }
    }
}

The variable why records the outcome of executing a bytecode instruction:

enum why_code {
    WHY_NOT       = 0x0001,  /* no error */
    WHY_EXCEPTION = 0x0002,  /* an exception occurred */
    WHY_RERAISE   = 0x0004,  /* an exception re-raised by finally */
    WHY_RETURN    = 0x0008,  /* return statement */
    WHY_BREAK     = 0x0010,  /* break statement */
    WHY_CONTINUE  = 0x0020,  /* continue statement */
    WHY_YIELD     = 0x0040   /* yield */
};
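The fetch, decode and dispatch cycle can be sketched as a tiny stack machine. This is a hypothetical toy: toy_eval and the opcode numbers are made up. It keeps two ideas from the loop above. An opcode at or above a threshold carries a two-byte argument, and a switch statement dispatches on the opcode.

```c
#include <assert.h>

/* Illustrative opcode numbers; opcodes >= HAVE_ARGUMENT take an argument. */
enum { NOP = 0, BINARY_ADD = 1, RETURN_VALUE = 2, HAVE_ARGUMENT = 90, LOAD_CONST = 100 };
#define HAS_ARG(op) ((op) >= HAVE_ARGUMENT)

/* Fetch-decode-dispatch loop over a byte array; consts plays co_consts. */
long toy_eval(const unsigned char *code, const long *consts) {
    long stack[16];
    int sp = 0;            /* value-stack height */
    int pc = 0;            /* plays the role of next_instr */
    for (;;) {
        int opcode = code[pc++];        /* like NEXTOP() */
        int oparg = 0;
        if (HAS_ARG(opcode)) {          /* like NEXTARG(): 2-byte little-endian */
            oparg = code[pc] | (code[pc + 1] << 8);
            pc += 2;
        }
        switch (opcode) {
        case NOP:
            break;
        case LOAD_CONST:
            assert(sp < 16);
            stack[sp++] = consts[oparg];
            break;
        case BINARY_ADD: {
            long w = stack[--sp];       /* POP() */
            long v = stack[sp - 1];     /* TOP() */
            stack[sp - 1] = v + w;      /* SET_TOP() */
            break;
        }
        case RETURN_VALUE:
            return stack[--sp];
        default:
            assert(0 && "unknown opcode");
            return 0;
        }
    }
}
```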

Processes and threads

A running program executes as a process that contains one or more threads. This raises the question of whether a PyFrameObject belongs to a process or to a thread (ultimately, it is tied to a thread). A process is the unit of resource management, while a thread is the smallest unit of execution, and the threads of a process share its address space and global data.

When the CPU switches between tasks, the running environment of the current thread must be saved, much like process scheduling in an operating system. In Python, each thread is represented by a PyThreadState object, and the process-level state is represented by a PyInterpreterState object.

To save memory, CPython lets multiple threads share the same module objects. A Python program typically has a single interpreter object (PyInterpreterState) that maintains several PyThreadState objects, and every thread runs its bytecode through the same interpreter loop.

typedef struct _is {
    struct _is *next;
    struct _ts *tstate_head;   /* list of this interpreter's threads */
    PyObject *modules;         /* sys.modules, shared by all threads */
    PyObject *sysdict;
    PyObject *builtins;
    ...
} PyInterpreterState;

typedef struct _ts {
    struct _ts *next;
    PyInterpreterState *interp;   /* the interpreter this thread belongs to */
    struct _frame *frame;         /* the frame currently executing */
    int recursion_depth;
    ...
    PyObject *dict;
    ...
    long thread_id;
} PyThreadState;
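The sketch below shows how one interpreter keeps track of its threads. It assumes only the next, interp, tstate_head and thread_id fields matter. All Toy* names and toy_* functions are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct toy_interp;

/* Hypothetical, heavily trimmed mirror of PyThreadState. */
typedef struct toy_tstate {
    struct toy_tstate *next;       /* next thread of the same interpreter */
    struct toy_interp *interp;     /* owning interpreter ("process") */
    long thread_id;
} ToyThreadState;

/* Hypothetical, heavily trimmed mirror of PyInterpreterState. */
typedef struct toy_interp {
    ToyThreadState *tstate_head;   /* singly linked list of threads */
} ToyInterpreterState;

/* Register a new thread by linking it in at the head of the list. */
void toy_add_thread(ToyInterpreterState *interp, ToyThreadState *ts, long id) {
    ts->thread_id = id;
    ts->interp = interp;
    ts->next = interp->tstate_head;
    interp->tstate_head = ts;
}

/* Walk the list to find the state belonging to a given thread id. */
ToyThreadState *toy_find_thread(ToyInterpreterState *interp, long id) {
    for (ToyThreadState *ts = interp->tstate_head; ts != NULL; ts = ts->next)
        if (ts->thread_id == id)
            return ts;
    return NULL;
}
```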

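When the virtual machine starts executing a frame, it stores that PyFrameObject in the current thread's PyThreadState (the frame field) as the current execution environment. Creating a new stack frame resets this relationship. The new frame's f_back points to the previous frame, and the thread state's frame field is switched to the new frame. When the frame finishes, the field is switched back to f_back.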
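The frame bookkeeping described above can be modeled in a few lines. This is a hypothetical sketch: ToyFrame, ToyThreadState and the toy_* functions are illustrative names, not CPython API. Entering a frame links it to the caller through f_back and makes it current. Leaving a frame restores the caller.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical minimal frame and thread state. */
typedef struct toy_frame {
    struct toy_frame *f_back;   /* caller's frame */
    const char *name;
} ToyFrame;

typedef struct {
    ToyFrame *frame;            /* frame currently executing in this thread */
} ToyThreadState;

/* Entering a frame: link it to the caller and make it current. */
void toy_enter(ToyThreadState *ts, ToyFrame *f, const char *name) {
    f->name = name;
    f->f_back = ts->frame;
    ts->frame = f;
}

/* Leaving a frame: the caller becomes current again. */
void toy_leave(ToyThreadState *ts) {
    assert(ts->frame != NULL);
    ts->frame = ts->frame->f_back;
}

/* Depth of the call chain, found by following f_back. */
int toy_depth(const ToyThreadState *ts) {
    int depth = 0;
    for (const ToyFrame *f = ts->frame; f != NULL; f = f->f_back)
        depth++;
    return depth;
}
```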

Bytecode and PyFrameObject

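For example, the familiar assignment statement i = 1 compiles to this bytecode:

  0 LOAD_CONST               0 (1)
  3 STORE_NAME               0 (i)

LOAD_CONST pushes co_consts[0] (the constant 1) onto the frame's value stack. STORE_NAME then pops it and binds it to co_names[0] (the name i) in the frame's f_locals.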
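The toy below shows what the two instructions do to the frame. The frame carries co_consts, co_names, a list of local bindings standing in for the f_locals dict, and a value stack. ToyFrame, load_const and store_name are hypothetical names, not CPython's implementation.

```c
#include <assert.h>
#include <string.h>

#define MAXNAMES 8

/* Hypothetical frame pieces: co_consts, co_names, f_locals and a value stack. */
typedef struct {
    const long *consts;               /* co_consts */
    const char *const *names;         /* co_names */
    const char *local_keys[MAXNAMES]; /* f_locals keys */
    long local_vals[MAXNAMES];        /* f_locals values */
    int nlocals;
    long stack[8];
    int sp;
} ToyFrame;

/* LOAD_CONST i: push consts[i]. */
void load_const(ToyFrame *f, int oparg) {
    assert(f->sp < 8);
    f->stack[f->sp++] = f->consts[oparg];
}

/* STORE_NAME i: pop the top of the stack and bind it to names[i]. */
void store_name(ToyFrame *f, int oparg) {
    const char *name = f->names[oparg];
    long value = f->stack[--f->sp];
    for (int i = 0; i < f->nlocals; i++) {
        if (strcmp(f->local_keys[i], name) == 0) {   /* rebind existing name */
            f->local_vals[i] = value;
            return;
        }
    }
    assert(f->nlocals < MAXNAMES);
    f->local_keys[f->nlocals] = name;                /* new binding */
    f->local_vals[f->nlocals++] = value;
}
```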

The bytecode for + is BINARY_ADD, implemented in CPython 2's ceval.c as follows:

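case BINARY_ADD:
    w = POP();                 /* right operand */
    v = TOP();                 /* left operand stays on the stack */
    if (PyInt_CheckExact(v) && PyInt_CheckExact(w)) {
        /* fast path: both operands are plain ints */
        register long a, b, i;
        a = PyInt_AS_LONG(v);
        b = PyInt_AS_LONG(w);
        /* add as unsigned so that overflow wraps instead of being undefined */
        i = (long)((unsigned long)a + b);
        /* overflow iff the result's sign differs from both operands' signs */
        if ((i ^ a) < 0 && (i ^ b) < 0)
            goto slow_add;
        x = PyInt_FromLong(i);
    }
    else if (PyString_CheckExact(v) && PyString_CheckExact(w)) {
        /* string_concatenate consumes the reference to v, so skip its DECREF */
        x = string_concatenate(v, w, f, next_instr);
        goto skip_decref_vx;
    }
    else {
      slow_add:
        /* generic path: dispatch through the types' nb_add / __add__ */
        x = PyNumber_Add(v, w);
    }
    Py_DECREF(v);
  skip_decref_vx:
    Py_DECREF(w);
    SET_TOP(x);
    break;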
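The overflow test in the fast path is easy to check in isolation. The sketch below uses a hypothetical helper, add_long_checked. It adds in unsigned arithmetic and then applies the same sign test as BINARY_ADD: an overflow occurred exactly when the result's sign differs from the signs of both operands.

```c
#include <assert.h>
#include <limits.h>

/* Returns 1 and stores a+b in *out, or returns 0 on overflow.
   Unsigned addition wraps instead of invoking undefined behaviour; converting
   back to long is implementation-defined but wraps on two's-complement targets. */
int add_long_checked(long a, long b, long *out) {
    long i = (long)((unsigned long)a + (unsigned long)b);
    /* overflow iff the result's sign differs from both operands' signs */
    if ((i ^ a) < 0 && (i ^ b) < 0)
        return 0;   /* BINARY_ADD would `goto slow_add` here */
    *out = i;
    return 1;
}
```

Operands with opposite signs can never overflow, which is why a single XOR sign test on each operand is enough.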

Implementing control-flow statements

The if statement is relatively simple. Like the jump instructions in assembly, its bytecode locates the next instruction to execute either by an absolute address (as POP_JUMP_IF_FALSE does) or by a relative offset (as JUMP_FORWARD does). The for statement involves a special bytecode, SETUP_LOOP, which records the loop as a block in the frame's block stack:

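typedef struct _frame {
    ...
    int f_iblock;                              /* number of blocks in use */
    PyTryBlock f_blockstack[CO_MAXBLOCKS];     /* loop and try blocks */
} PyFrameObject;

void PyFrame_BlockSetup(PyFrameObject *f, int type, int handler, int level)
{
    PyTryBlock *b;
    if (f->f_iblock >= CO_MAXBLOCKS)
        Py_FatalError("XXX block stack overflow");
    b = &f->f_blockstack[f->f_iblock++];
    b->b_type = type;         /* e.g. SETUP_LOOP */
    b->b_level = level;       /* value-stack depth to restore on exit */
    b->b_handler = handler;   /* where to jump when the block is left */
}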
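Below is a toy model of the block stack. It assumes a frame only needs its stack height, its next instruction and its blocks; all names are hypothetical. toy_break_loop loosely follows what ceval does when a break unwinds a loop block. It pops the block, shrinks the value stack back to b_level, and jumps to b_handler (the instruction after the loop).

```c
#include <assert.h>

#define TOY_MAXBLOCKS 20
enum { TOY_SETUP_LOOP = 1 };

/* Hypothetical cut-down PyTryBlock. */
typedef struct {
    int b_type;      /* which SETUP_* opcode pushed it */
    int b_handler;   /* where to jump when the block is exited by break */
    int b_level;     /* value-stack height when the block was entered */
} ToyBlock;

/* Hypothetical cut-down frame. */
typedef struct {
    int stack_level;                  /* current value-stack height */
    int next_instr;                   /* instruction offset to run next */
    int f_iblock;                     /* number of blocks in use */
    ToyBlock f_blockstack[TOY_MAXBLOCKS];
} ToyFrame;

/* Same bookkeeping as PyFrame_BlockSetup. */
void toy_block_setup(ToyFrame *f, int type, int handler, int level) {
    assert(f->f_iblock < TOY_MAXBLOCKS);   /* block stack overflow */
    ToyBlock *b = &f->f_blockstack[f->f_iblock++];
    b->b_type = type;
    b->b_handler = handler;
    b->b_level = level;
}

/* What `break` does: pop the loop block, drop values pushed inside the loop,
   and continue at the handler. */
void toy_break_loop(ToyFrame *f) {
    assert(f->f_iblock > 0);
    ToyBlock *b = &f->f_blockstack[--f->f_iblock];
    assert(b->b_type == TOY_SETUP_LOOP);
    f->stack_level = b->b_level;
    f->next_instr = b->b_handler;
}
```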

The function object

A function object is created while bytecode executes (by the MAKE_FUNCTION instruction, or MAKE_CLOSURE for closures). In essence, it is a structure that ties a code object to the scope it will run in.

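typedef struct {
    PyObject_HEAD
    PyObject *func_code;        /* the PyCodeObject */
    PyObject *func_globals;     /* globals used when the function runs */
    PyObject *func_defaults;    /* tuple of default argument values, or NULL */
    PyObject *func_closure;     /* tuple of cells for free variables, or NULL */
    PyObject *func_doc;         /* __doc__ */
    PyObject *func_name;        /* __name__ */
    PyObject *func_dict;        /* __dict__ */
    PyObject *func_weakreflist;
    PyObject *func_module;      /* __module__ */
} PyFunctionObject;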

A PyFunctionObject is created dynamically at runtime, while its static information lives in func_code. A given piece of code has only one PyCodeObject, but it can have many PyFunctionObjects.
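The one-code, many-functions relationship can be sketched with two toy structs. The sketch assumes a function object needs only a code pointer and one piece of runtime state; ToyCode, ToyFunction and toy_make_function are hypothetical. Each MAKE_FUNCTION-like call produces a new function object that shares the same code. This is what happens when a def statement runs more than once, for example inside a loop or an outer function.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical code object: static information produced by the compiler. */
typedef struct {
    const char *co_name;
} ToyCode;

/* Hypothetical function object: shared code plus per-function runtime data. */
typedef struct {
    const ToyCode *func_code;  /* shared, never copied */
    long func_default;         /* runtime information, differs per function */
} ToyFunction;

/* Analogue of MAKE_FUNCTION: a fresh function object around existing code. */
ToyFunction *toy_make_function(const ToyCode *code, long default_value) {
    ToyFunction *fn = malloc(sizeof *fn);
    assert(fn != NULL);
    fn->func_code = code;
    fn->func_default = default_value;
    return fn;
}
```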

MAKE_FUNCTION builds the function object with PyFunction_New:

PyObject *PyFunction_New(PyObject *code, PyObject *globals)
{
    PyFunctionObject *op = PyObject_GC_New(PyFunctionObject, &PyFunction_Type);
    static PyObject *__name__ = 0;
    if (op != NULL) {
        PyObject *doc;
        PyObject *consts;
        ...
        op->func_code = code;
        op->func_globals = globals;
        op->func_name = ((PyCodeObject *)code)->co_name;
        /* the docstring, if any, is the first constant of the code object */
        consts = ((PyCodeObject *)code)->co_consts;
        if (PyTuple_Size(consts) >= 1) {
            doc = PyTuple_GetItem(consts, 0);
            if (!PyString_Check(doc) && !PyUnicode_Check(doc))
                doc = Py_None;
        }
        else
            doc = Py_None;
        ...
    }
    ...
}

CALL_FUNCTION is handled by call_function:

static PyObject *call_function(PyObject ***pp_stack, int oparg)
{
    int na = oparg & 0xff;            /* number of positional arguments */
    int nk = (oparg >> 8) & 0xff;     /* number of keyword arguments */
    int n = na + 2 * nk;              /* each keyword argument uses two slots */
    PyObject **pfunc = (*pp_stack) - n - 1;   /* callable sits below its arguments */
    PyObject *func = *pfunc;
    PyObject *x, *w;

    if (PyCFunction_Check(func) && nk == 0) {
        ...
    } else {
        if (PyMethod_Check(func) && PyMethod_GET_SELF(func) != NULL) {
            ...
        }
        if (PyFunction_Check(func))
            x = fast_function(func, pp_stack, n, na, nk);
        else
            x = do_call(func, pp_stack, na, nk);
        ...
    }
    ...
    return x;
}

Inside fast_function, the essential step is creating a new stack frame. The new frame's globals come from the function's func_globals, which MAKE_FUNCTION took from the f_globals of the frame that executed the def. The function's local variables live in the f_localsplus area of the PyFrameObject and are read and written with LOAD_FAST and STORE_FAST.
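The argument bookkeeping at the top of call_function can be checked with a small sketch; CallShape, decode_call_oparg and callable_slot are hypothetical helpers. For a call such as f(a, b, key=c), the low byte of oparg is 2 and the high byte is 1. Four stack slots therefore hold arguments, and the callable sits just below them.

```c
#include <assert.h>

/* Hypothetical decoded form of CALL_FUNCTION's oparg. */
typedef struct {
    int na;      /* positional arguments */
    int nk;      /* keyword arguments */
    int n;       /* stack slots used by all arguments */
} CallShape;

CallShape decode_call_oparg(int oparg) {
    CallShape s;
    s.na = oparg & 0xff;          /* low byte */
    s.nk = (oparg >> 8) & 0xff;   /* high byte */
    s.n = s.na + 2 * s.nk;        /* a keyword argument is a (name, value) pair */
    return s;
}

/* The callable lies just below its arguments: stack_pointer - n - 1. */
void **callable_slot(void **stack_pointer, int oparg) {
    return stack_pointer - decode_call_oparg(oparg).n - 1;
}
```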

Defining class A

class A(object):
    def __init__(self):
        c = 10
        print c

    def b(self):
        print "22"

Classes and types

In Python, every object has a type object, which can be retrieved through the object's __class__. The type of an instance is a class object, and the type of a class object is a metaclass object. The default metaclass is type, whose C-level counterpart is PyType_Type.

Every class in Python has an is-kind-of relationship, directly or indirectly, with object, whose C-level counterpart is PyBaseObject_Type.
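Below is a toy model of the two relationships. It assumes an object can be reduced to a type pointer plus, for classes, a base pointer; ToyObject and toy_is_subclass are hypothetical. ob_type answers the "type of" question (instance to class to metaclass). Following tp_base answers the is-kind-of question, and that chain always ends at object.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical object header: every object points to its type. */
typedef struct toy_object {
    struct toy_object *ob_type;   /* what __class__ returns */
    struct toy_object *tp_base;   /* base class; only meaningful for types */
    const char *name;
} ToyObject;

/* is-kind-of: walk the tp_base chain looking for `base`. */
int toy_is_subclass(const ToyObject *cls, const ToyObject *base) {
    for (; cls != NULL; cls = cls->tp_base)
        if (cls == base)
            return 1;
    return 0;
}
```

In the usage below, `type` is its own type and inherits from `object`, mirroring Python's `type(type) is type` and `issubclass(type, object)`.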

For int, the + operator is resolved by looking up __add__ on PyInt_Type. This corresponds to the nb_add slot in the type's table of number methods (tp_as_number).
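Slot-based dispatch can be sketched as follows. Everything here is a hypothetical simplification. The number-methods table has only nb_add, and the slot returns a C long instead of a new object. toy_number_add also consults only the left operand's type, whereas the real PyNumber_Add tries the right operand's slot too.

```c
#include <assert.h>
#include <stddef.h>

struct toy_object;

/* Hypothetical, cut-down PyNumberMethods with only the nb_add slot. */
typedef struct {
    long (*nb_add)(const struct toy_object *, const struct toy_object *);
} ToyNumberMethods;

typedef struct {
    const char *tp_name;
    const ToyNumberMethods *tp_as_number;   /* NULL: no number slots */
} ToyType;

typedef struct toy_object {
    const ToyType *ob_type;
    long ival;
} ToyObject;

/* The "int" implementation stored in the nb_add slot. */
static long toy_int_add(const ToyObject *v, const ToyObject *w) {
    return v->ival + w->ival;
}

static const ToyNumberMethods toy_int_as_number = { toy_int_add };
static const ToyType ToyInt_Type = { "int", &toy_int_as_number };
static const ToyType ToyPlain_Type = { "plain", NULL };

/* Generic "+": reach the slot through the left operand's type, then call it. */
int toy_number_add(const ToyObject *v, const ToyObject *w, long *out) {
    const ToyNumberMethods *nm = v->ob_type->tp_as_number;
    if (nm == NULL || nm->nb_add == NULL)
        return 0;   /* Python would raise TypeError */
    *out = nm->nb_add(v, w);
    return 1;
}
```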

<type 'type'> corresponds to PyType_Type.

A class corresponds to a PyTypeObject. Preparing a class (PyType_Ready) sets up its base-class relationship, then sets its type (ob_type), and then initializes the contents of its dict:

int PyType_Ready(PyTypeObject *type)
{
    PyObject *dict, *bases;
    PyTypeObject *base;
    Py_ssize_t i, n;
    ...
    /* If no base class was given and this is not object itself,
       the base class defaults to object (PyBaseObject_Type). */
    base = type->tp_base;
    if (base == NULL && type != &PyBaseObject_Type) {
        base = type->tp_base = &PyBaseObject_Type;
    }
    /* Make sure the base class has been initialized first. */
    if (base && base->tp_dict == NULL) {
        if (PyType_Ready(base) < 0)
            goto error;
    }
    /* Inherit the type (metaclass) from the base class. */
    if (type->ob_type == NULL && base != NULL)
        type->ob_type = base->ob_type;
    ...
}

The next step populates tp_dict with the type's attributes, including entries for the special methods implemented by its C-level slots.

For example, when PyList_Type is prepared, __getitem__ is added to its tp_dict.
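Here is a rough sketch of populating a dict from slots. It assumes a type with two optional slots and a dict that records only names. SlotDef, slotdefs, toy_add_operators and toy_dict_has are hypothetical names. The point is only that a name such as __getitem__ appears because the corresponding C slot is filled in.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_DICT_MAX 4

/* Hypothetical type object with two optional C-level slots and a toy tp_dict
   that only records attribute names. */
typedef struct {
    long (*mp_subscript)(long index);   /* would back __getitem__ */
    long (*nb_add)(long a, long b);     /* would back __add__ */
    const char *dict_keys[TOY_DICT_MAX];
    int dict_size;
} ToyType;

/* Hypothetical slot table: a special-method name plus a test for its slot. */
typedef struct {
    const char *name;
    int (*present)(const ToyType *);
} SlotDef;

static int has_subscript(const ToyType *t) { return t->mp_subscript != NULL; }
static int has_add(const ToyType *t) { return t->nb_add != NULL; }

static const SlotDef slotdefs[] = {
    { "__getitem__", has_subscript },
    { "__add__",     has_add },
};

/* For every slot the type fills in, add the matching name to its dict. */
void toy_add_operators(ToyType *type) {
    for (size_t i = 0; i < sizeof slotdefs / sizeof slotdefs[0]; i++) {
        if (slotdefs[i].present(type)) {
            assert(type->dict_size < TOY_DICT_MAX);
            type->dict_keys[type->dict_size++] = slotdefs[i].name;
        }
    }
}

int toy_dict_has(const ToyType *type, const char *name) {
    for (int i = 0; i < type->dict_size; i++)
        if (strcmp(type->dict_keys[i], name) == 0)
            return 1;
    return 0;
}

/* A list-like subscript implementation used in the example. */
long toy_list_subscript(long index) { return index * 10; }
```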

Bytecode analysis of class creation

For example, take the following class definition and its disassembly:

class A(object):

    def __init__(self):
        pass

    def f(self):
        pass

  1           0 LOAD_CONST               0 ('A')
              3 LOAD_NAME                0 (object)
              6 BUILD_TUPLE              1
              9 LOAD_CONST               1 (<code object A at 0x1073f73b0, file "test.py", line 1>)
             12 MAKE_FUNCTION            0
             15 CALL_FUNCTION            0
             18 BUILD_CLASS
             19 STORE_NAME               1 (A)
             22 LOAD_CONST               2 (None)
             25 RETURN_VALUE

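The overall logic is as follows. The bytecode loads the name 'A' and the base class object and builds a tuple of base classes. It then loads the code object for the class body, turns it into a function with MAKE_FUNCTION, and calls that function with CALL_FUNCTION. The body of that function is the body of the class. Running it defines the methods and returns the resulting local namespace, which BUILD_CLASS combines with the name and the bases to create the class.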

While the class body runs, the virtual machine executes MAKE_FUNCTION twice. This creates two PyFunctionObjects (one for __init__, one for f) and stores them in that namespace.

PyObject *build_class(PyObject *methods, PyObject *bases, PyObject *name)
{
    PyObject *metaclass = NULL, *result, *base;

    /* 1. an explicit __metaclass__ in the class body */
    if (PyDict_Check(methods))
        metaclass = PyDict_GetItemString(methods, "__metaclass__");
    if (metaclass != NULL)
        Py_INCREF(metaclass);
    /* 2. otherwise the __class__ of the first base class */
    else if (PyTuple_Check(bases) && PyTuple_GET_SIZE(bases) > 0) {
        base = PyTuple_GET_ITEM(bases, 0);
        metaclass = PyObject_GetAttrString(base, "__class__");
    }
    else {
        ...
    }
    /* create the class: metaclass(name, bases, methods) */
    result = PyObject_CallFunctionObjArgs(metaclass, name, bases, methods, NULL);
    ...
    return result;
}

build_class first checks whether the class body defines __metaclass__. If it does not, it takes the __class__ of the first base class as the metaclass. It then calls the metaclass with the class name, the bases and the methods dict to create the class.
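The decision order in build_class can be sketched as a pure function. In this hypothetical toy (ToyClass and toy_pick_metaclass are illustrative names), a class is reduced to a pointer to its metaclass. The elided else branch is replaced by a fallback supplied by the caller.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, cut-down class object: just a pointer to its metaclass. */
typedef struct toy_class {
    const struct toy_class *ob_type;   /* what __class__ would return */
} ToyClass;

/* Decision order:
   1. an explicit __metaclass__ from the class body wins;
   2. otherwise use the __class__ of the first base;
   3. otherwise use the caller-supplied fallback (stands in for the elided branch). */
const ToyClass *toy_pick_metaclass(const ToyClass *explicit_meta,
                                   const ToyClass *const *bases, size_t nbases,
                                   const ToyClass *fallback) {
    if (explicit_meta != NULL)
        return explicit_meta;
    if (nbases > 0)
        return bases[0]->ob_type;
    return fallback;
}
```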

The __new__ of classes and instances

In Python, both instances and classes have a __dict__. As a result, different instances of the same class can carry different attributes, and different classes can likewise carry different attributes.
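The simplified sketch below shows why per-object dicts allow different attributes. Lookup checks the instance's own dict first and then its class's dict. It ignores descriptors, base classes and __slots__. ToyDict, ToyClass, ToyInstance and the toy_* functions are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_DICT_MAX 8

/* Hypothetical string-keyed dict with a fixed capacity. */
typedef struct {
    const char *keys[TOY_DICT_MAX];
    long values[TOY_DICT_MAX];
    int size;
} ToyDict;

typedef struct { ToyDict dict; } ToyClass;                          /* class __dict__ */
typedef struct { const ToyClass *cls; ToyDict dict; } ToyInstance;  /* instance __dict__ */

void toy_dict_set(ToyDict *d, const char *key, long value) {
    for (int i = 0; i < d->size; i++)
        if (strcmp(d->keys[i], key) == 0) { d->values[i] = value; return; }
    assert(d->size < TOY_DICT_MAX);
    d->keys[d->size] = key;
    d->values[d->size++] = value;
}

int toy_dict_get(const ToyDict *d, const char *key, long *out) {
    for (int i = 0; i < d->size; i++)
        if (strcmp(d->keys[i], key) == 0) { *out = d->values[i]; return 1; }
    return 0;
}

/* Simplified attribute lookup: instance __dict__ first, then class __dict__. */
int toy_getattr(const ToyInstance *obj, const char *name, long *out) {
    return toy_dict_get(&obj->dict, name, out) ||
           toy_dict_get(&obj->cls->dict, name, out);
}
```

In the usage below, both instances see the class attribute `shared` until one of them sets its own, which then shadows the class value for that instance only.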