Small knowledge, big challenge! This article is participating in the creation activity of “Essential Tips for Programmers”.

Preface

Compared with languages such as C, C++ and Java, PHP is weakly typed: we don't have to declare a variable's type before using it. This is very convenient, but it sometimes hides pitfalls. So does PHP really have no data types?

Of course not. The official PHP documentation divides PHP's data types into three categories: scalar types, compound types and special types.

  • Four scalar data types:

    • Boolean (bool)

    • Integer (int)

    • Floating point (float)

    • String (string)

  • Two compound data types:

    • Array (array)

    • Object (object)

  • Two special data types:

    • Resource (resource)

    • NULL

The PHP interpreter is itself written in C: the Zend Engine compiles PHP scripts into opcodes and then executes them. So how are PHP variables represented in C?

The answer is the zval. Whatever its type, every PHP variable is represented in the engine's source code by a structure called zval. A zval can be thought of as the container for a PHP variable at the C level: it stores the variable's value, its type and some other bookkeeping information.

Let's take a look at the basic structure of zval in PHP 5.

Basic structure of zval in PHP 5

zval

PHP 5.6.30, Zend/zend.h:

struct _zval_struct {
    /* Variable information */
    zvalue_value value;      /* Variable value */
    zend_uint refcount__gc;  /* Reference count */
    zend_uchar type;         /* Variable type */
    zend_uchar is_ref__gc;   /* Whether the zval is a reference */
};


As you can see, the PHP source represents a variable with a four-member structure: the variable's value, its reference count, its type, and whether it is a reference.

zvalue_value value

The value is defined as a union. In a union only one member is valid at a time, and the union is allocated enough memory for its largest member:


typedef union _zvalue_value {
    long lval;                 /* Used for bool, integer, resource type */
    double dval;               /* For floating point type */
    struct {                   /* For string type */
        char *val;
        int len;
    } str;
    HashTable *ht;             /* For array type */
    zend_object_value obj;     /* For object type */
    zend_ast *ast;             /* For constant expressions (PHP 5.6 only) */
} zvalue_value;


Although PHP has eight data types, the _zvalue_value union has only five members for them (ast is used only for constant expressions). Why?

This is because booleans, integers and resources are all stored in lval: a boolean is stored as 1 or 0 for true or false, and a resource stores its resource ID. Reusing the same field keeps the number of members down.

What about the eighth type, NULL? It needs no value at all: a zval is NULL when its type field is IS_NULL, and the value union is simply not read.

This is how five fields represent eight data types.

Note also that the underlying array data structure in PHP is actually a hash table.

zend_uchar type

The type field (a zend_uchar) stores the data type, encoded as one of the following macros defined by the Zend engine:


/* data types */
/* All data types <= IS_BOOL have their constructor/destructors skipped */
#define IS_NULL          0
#define IS_LONG          1
#define IS_DOUBLE        2
#define IS_BOOL          3
#define IS_ARRAY         4
#define IS_OBJECT        5
#define IS_STRING        6
#define IS_RESOURCE      7
#define IS_CONSTANT      8
#define IS_CONSTANT_AST  9
#define IS_CALLABLE      10

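To make the tag-plus-union idea concrete, here is a minimal C sketch of a zval-like value. The names demo_zval, DEMO_IS_* and demo_describe are hypothetical, and the struct is heavily simplified (no refcount, no arrays or objects); only the type code values are taken from the list above. It shows bool, int and resource all sharing lval, and NULL carrying no payload at all.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Type codes taken from the PHP 5 list above (user-visible types only). */
enum {
    DEMO_IS_NULL = 0, DEMO_IS_LONG = 1, DEMO_IS_DOUBLE = 2, DEMO_IS_BOOL = 3,
    DEMO_IS_ARRAY = 4, DEMO_IS_OBJECT = 5, DEMO_IS_STRING = 6, DEMO_IS_RESOURCE = 7
};

/* Hypothetical, simplified zval: one union for the payload plus a type tag. */
typedef struct {
    union {
        long lval;                                /* bool, int and resource id */
        double dval;                              /* float */
        struct { const char *val; int len; } str; /* string */
    } value;
    unsigned char type;
} demo_zval;

demo_zval demo_null(void) {
    demo_zval z = { .type = DEMO_IS_NULL };       /* payload is zeroed and never read */
    return z;
}

demo_zval demo_bool(int b) {
    demo_zval z = { .type = DEMO_IS_BOOL };
    z.value.lval = b ? 1 : 0;                     /* stored as 1 or 0 in lval */
    return z;
}

demo_zval demo_long(long n) {
    demo_zval z = { .type = DEMO_IS_LONG };
    z.value.lval = n;
    return z;
}

demo_zval demo_resource(long id) {
    demo_zval z = { .type = DEMO_IS_RESOURCE };
    z.value.lval = id;                            /* only the resource id is stored */
    return z;
}

demo_zval demo_double(double d) {
    demo_zval z = { .type = DEMO_IS_DOUBLE };
    z.value.dval = d;
    return z;
}

demo_zval demo_string(const char *s) {
    demo_zval z = { .type = DEMO_IS_STRING };
    z.value.str.val = s;
    z.value.str.len = (int)strlen(s);
    return z;
}

/* Read the payload according to the tag: the tag says which union member is valid. */
void demo_describe(const demo_zval *z, char *buf, size_t n) {
    assert(z != NULL && buf != NULL && n > 0);
    switch (z->type) {
    case DEMO_IS_NULL:     snprintf(buf, n, "NULL"); break;
    case DEMO_IS_BOOL:     snprintf(buf, n, "bool(%s)", z->value.lval ? "true" : "false"); break;
    case DEMO_IS_LONG:     snprintf(buf, n, "int(%ld)", z->value.lval); break;
    case DEMO_IS_DOUBLE:   snprintf(buf, n, "float(%g)", z->value.dval); break;
    case DEMO_IS_STRING:   snprintf(buf, n, "string(%d) \"%s\"", z->value.str.len, z->value.str.val); break;
    case DEMO_IS_RESOURCE: snprintf(buf, n, "resource(%ld)", z->value.lval); break;
    default:               snprintf(buf, n, "(not modeled in this sketch)"); break;
    }
}
```

The engine works the same way in spirit: code switches on the type field first and only then touches the matching union member.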

zend_uint refcount__gc

refcount__gc counts how many variables currently point to this zval.

zend_uchar is_ref__gc

is_ref__gc is a flag indicating whether the zval is a reference, that is, whether it belongs to a set of variables bound together with &.
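The reference count is what makes copy-on-write cheap: an assignment only shares the zval, and a real copy happens only when a shared zval is written to. The following C sketch models that with a hypothetical demo_zval holding an int array; it ignores is_ref__gc entirely and is an illustration of the technique, not the engine's actual code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical heap zval holding a small int array; is_ref__gc is not modeled. */
typedef struct {
    unsigned refcount;   /* plays the role of refcount__gc */
    size_t len;
    int *data;
} demo_zval;

demo_zval *demo_new(const int *src, size_t len) {
    assert(len > 0);
    demo_zval *z = malloc(sizeof *z);
    assert(z != NULL);
    z->refcount = 1;
    z->len = len;
    z->data = malloc(len * sizeof *z->data);
    assert(z->data != NULL);
    memcpy(z->data, src, len * sizeof *z->data);
    return z;
}

/* $b = $a;  share the zval and bump the refcount, no data is copied */
demo_zval *demo_assign(demo_zval *z) {
    z->refcount++;
    return z;
}

/* unset($a);  drop one reference and free the zval when nobody uses it */
void demo_release(demo_zval *z) {
    if (--z->refcount == 0) {
        free(z->data);
        free(z);
    }
}

/* $a[i] = v;  separate (copy) first if the zval is still shared */
void demo_write(demo_zval **var, size_t i, int v) {
    demo_zval *z = *var;
    if (z->refcount > 1) {
        demo_zval *copy = demo_new(z->data, z->len);
        z->refcount--;   /* this variable stops sharing the old zval */
        *var = z = copy;
    }
    assert(i < z->len);
    z->data[i] = v;
}
```

Until demo_write is called, any number of "variables" can share one buffer at the cost of a counter increment.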

Problems with zval in PHP 5

This section is based on an in-depth article on PHP 7's zval. To illustrate the problems with zval in PHP 5, I have copied some of its content here.

1. Structure size

First let’s calculate the size of the zval structure.

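refcount__gc is a zend_uint (unsigned int) and takes 4 bytes; is_ref__gc and type are zend_uchar and take 1 byte each.

Now let's calculate the size of the value field. A union is as large as its largest member, rounded up to the union's alignment.

On a typical 64-bit (LP64) build, the members take:

  • str: 8 + 4 = 12 bytes (a char * plus an int), before padding.

  • lval: 8 bytes (a long; it would be 4 bytes on 64-bit Windows).

  • dval: 8 bytes.

  • *ht: 8 bytes (a pointer).

  • *ast: 8 bytes (a pointer).

  • obj: ?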

What about zend_object_value? It is defined as follows:


typedef unsigned int zend_object_handle;

typedef struct _zend_object_value {
    zend_object_handle handle;
    const zend_object_handlers *handlers;
} zend_object_value;


As we can see, zend_object_value holds 4 + 8 = 12 bytes of data, but because its pointer must be 8-byte aligned it is padded to 16 bytes. The same padding applies to str.

Because of this alignment, union _zvalue_value takes 16 bytes.

Therefore, the zval structure holds 16 + 4 + 1 + 1 = 22 bytes of data, which alignment pads to 24 bytes.

That is far more space than a single integer needs.

So the structure could be optimized. The 16-byte union comes from the members stored inline, zend_object_value and likewise str.

Moving them out and storing pointers instead would shrink the union to 8 bytes; for objects this costs little, since IS_OBJECT is not the most common type, after all.
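The arithmetic above can be checked with sizeof. This sketch mirrors the PHP 5 layout using stand-in types (demo_uint and demo_uchar for zend_uint and zend_uchar, void * for the HashTable and zend_ast pointers) and compares it with a hypothetical variant in which every large member is a pointer. Exact sizes depend on platform and compiler; the figures in the comments assume LP64.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the PHP 5 typedefs (assumed: zend_uint = unsigned int, zend_uchar = unsigned char). */
typedef unsigned int demo_uint;
typedef unsigned char demo_uchar;

/* Mirrors zend_object_value: a handle plus a handlers pointer, stored inline. */
typedef struct { unsigned int handle; const void *handlers; } demo_object_value;

/* PHP 5 style: str and obj live inline in the union. */
typedef union {
    long lval;
    double dval;
    struct { char *val; int len; } str;   /* 8 + 4, padded to 16 on LP64 */
    void *ht;                             /* stands in for HashTable * */
    demo_object_value obj;                /* 4 + 8, padded to 16 on LP64 */
    void *ast;                            /* stands in for zend_ast * */
} php5_like_value;

typedef struct {
    php5_like_value value;
    demo_uint refcount__gc;
    demo_uchar type;
    demo_uchar is_ref__gc;
} php5_like_zval;

/* Hypothetical variant: every large member replaced by a pointer. */
typedef union {
    long lval;
    double dval;
    void *str;                            /* would point to a separate string block */
    void *ht;
    void *obj;                            /* would point to the object itself */
    void *ast;
} pointer_value;

typedef struct {
    pointer_value value;
    demo_uint refcount__gc;
    demo_uchar type;
    demo_uchar is_ref__gc;
} pointer_zval;

/* Bytes the members actually occupy, before trailing padding. */
size_t payload_bytes(size_t union_size) {
    return union_size + sizeof(demo_uint) + 2 * sizeof(demo_uchar);
}

/* Bytes the compiler adds for alignment. */
size_t padding_bytes(size_t struct_size, size_t payload) {
    assert(struct_size >= payload);
    return struct_size - payload;
}
```

On a typical LP64 build, sizeof(php5_like_zval) is 24 (22 bytes of data plus 2 of padding), while sizeof(pointer_zval) is 16, because the union drops from 16 to 8 bytes.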

2. Extensibility

zval is a structure in which every field has a fixed meaning and no spare field is reserved for custom use. As a result, in the PHP 5 era, whenever an optimization needed to store extra information alongside a zval, we had to either maintain a separate mapping structure or wrap the zval externally to extend it. For example, when PHP 5.3 introduced a GC specifically to handle circular references, it had to resort to the following hack:


/* The following macros override macros from zend_alloc.h */
#undef ALLOC_ZVAL
#define ALLOC_ZVAL(z)                                   \
    do {                                                \
        (z) = (zval*)emalloc(sizeof(zval_gc_info));     \
        GC_ZVAL_INIT(z);                                \
    } while (0)


It hijacks zval allocation and allocates a zval_gc_info instead:


typedef struct _zval_gc_info {
    zval z;
    union {
        gc_root_buffer *buffered;
        struct _zval_gc_info *next;
    } u;
} zval_gc_info;


zval_gc_info thus extends zval, so in PHP 5 requesting one zval actually allocates 32 bytes (on a 64-bit build, the 24-byte zval plus an 8-byte pointer union). Yet the GC only cares about IS_ARRAY and IS_OBJECT zvals, so for every other type this extra memory is simply wasted.
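The hack relies on a standard C guarantee: a pointer to a structure, suitably converted, points to its first member, and vice versa. This minimal sketch with hypothetical names (demo_zval, demo_zval_gc_info, demo_alloc_zval) shows how callers can keep working with a plain zval pointer while the allocator actually hands out the larger wrapper.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified stand-in for a zval (hypothetical). */
typedef struct {
    long lval;
    unsigned int refcount;
    unsigned char type, is_ref;
} demo_zval;

/* Same shape as zval_gc_info: the zval first, GC bookkeeping after it. */
typedef struct demo_zval_gc_info {
    demo_zval z;
    union {
        void *buffered;                    /* stands in for gc_root_buffer * */
        struct demo_zval_gc_info *next;
    } u;
} demo_zval_gc_info;

/* Like the overridden ALLOC_ZVAL: the caller asks for a zval,
 * but the larger wrapper is what actually gets allocated. */
demo_zval *demo_alloc_zval(void) {
    demo_zval_gc_info *info = malloc(sizeof *info);
    assert(info != NULL);
    info->u.buffered = NULL;               /* start outside any GC root buffer */
    return &info->z;                       /* z is the first member, so the addresses match */
}

/* Because z sits at offset 0, the GC can convert a zval pointer back to its wrapper. */
demo_zval_gc_info *demo_gc_info(demo_zval *z) {
    return (demo_zval_gc_info *)z;
}

void demo_free_zval(demo_zval *z) {
    free(demo_gc_info(z));
}
```

The price is exactly what the text describes: every allocation pays for the extra union, whether or not the GC will ever look at it.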

3. GC problems caused by passing references

Most zvals in PHP are passed by value, with copy-on-write. There are two exceptions, objects and resources, which always behave as if passed by reference because the zval holds only a handle. This causes a problem: besides the zval's own reference count, objects and resources need a global reference count to ensure their memory can be reclaimed. So in the PHP 5 era an object, for example, had two reference counts, one on the zval and one on the object itself:


typedef struct _zend_object_store_bucket {
    zend_bool destructor_called;
    zend_bool valid;
    union _store_bucket {
        struct _store_object {
            void *object;
            zend_objects_store_dtor_t dtor;
            zend_objects_free_object_storage_t free_storage;
            zend_objects_store_clone_t clone;
            const zend_object_handlers *handlers;
            zend_uint refcount;
            gc_root_buffer *buffered;
        } obj;
        struct {
            int next;
        } free_list;
    } bucket;
} zend_object_store_bucket;


Besides the double reference counting, reaching the object itself takes a chain of lookups:


EG(objects_store).object_buckets[Z_OBJ_HANDLE_P(z)].bucket.obj


Only after this long chain of memory reads do we reach the actual object, which is clearly not efficient.
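A toy version of that lookup chain in C, assuming a simplified bucket layout (demo_bucket, demo_store and demo_object_zval are hypothetical names, not the engine's), shows where the extra loads and the second reference count come from: the zval carries only a handle, which must be turned into a bucket and then into the object pointer.

```c
#include <assert.h>

/* Hypothetical miniature of the PHP 5 object store. */
typedef struct { int id; } demo_object;

typedef struct {
    int valid;
    struct {
        void *object;        /* the object itself */
        unsigned refcount;   /* object-level count, separate from any zval's count */
    } obj;
} demo_bucket;

typedef struct {
    demo_bucket *object_buckets;   /* modeled on objects_store.object_buckets */
    unsigned top, size;
} demo_store;

/* An object zval stores only a handle (an index into the store) and its own refcount. */
typedef struct {
    unsigned handle;
    unsigned refcount;
} demo_object_zval;

unsigned demo_store_put(demo_store *s, demo_object *o) {
    assert(s->top < s->size);
    demo_bucket *b = &s->object_buckets[s->top];
    b->valid = 1;
    b->obj.object = o;
    b->obj.refcount = 1;
    return s->top++;
}

/* Load the handle from the zval, load the bucket array pointer from the store,
 * index into it, then load the object pointer: several dependent reads. */
demo_object *demo_obj_from_zval(const demo_store *s, const demo_object_zval *z) {
    assert(z->handle < s->top);
    const demo_bucket *b = &s->object_buckets[z->handle];
    assert(b->valid);
    return (demo_object *)b->obj.object;
}
```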

4. String copying and hashing

Much of PHP's work is string processing, but because reference counting operates on the zval, copying a string-typed zval means copying the string itself. Likewise, when a string zval is used as an array key, the string has to be copied. PHP 5.4 introduced interned strings, but they did not fundamentally solve the problem.

In addition, many structures in PHP are built on HashTable. Insert, delete, update and lookup operations on a HashTable take a lot of CPU time, and looking up a string key starts with computing its hash. In principle, a string's hash could be computed once and stored alongside it to avoid recomputing it.
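A minimal sketch of that idea: a hypothetical demo_string that caches its hash in a field h the first time it is needed. The hash shown is DJBX33A ("times 33"), the function PHP's zend_inline_hash_func is built on; the struct, the names and the "0 means not computed" convention are illustrative assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical string header that remembers its hash once computed. */
typedef struct {
    const char *val;
    size_t len;
    unsigned long h;   /* 0 means "not computed yet" */
} demo_string;

unsigned hash_computations = 0;   /* counts how often the full hash actually runs */

/* DJBX33A: start at 5381, then h = h * 33 + c for every byte. */
unsigned long demo_djbx33a(const char *s, size_t len) {
    unsigned long h = 5381;
    for (size_t i = 0; i < len; i++) {
        h = h * 33 + (unsigned char)s[i];
    }
    hash_computations++;
    return h;
}

/* Compute the hash on first use only; later lookups reuse the cached value. */
unsigned long demo_string_hash(demo_string *s) {
    assert(s != NULL);
    if (s->h == 0) {
        unsigned long h = demo_djbx33a(s->val, s->len);
        s->h = h ? h : 1;   /* keep 0 reserved as the "not computed" marker */
    }
    return s->h;
}
```

However many times the same key is looked up, the loop over its bytes runs once.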

5. References

PHP 5 uses copy-on-write (separation on write), but combined with references it leads to a classic performance problem:



      

<?php
    function dummy($array) {}

    $array = range(1, 100000);
    $b = &$array;
    dummy($array);
?>


When dummy() is called, $array is simply passed by value. However, because $array was earlier bound by reference to $b, its zval now has is_ref__gc = 1 and a refcount of 2. The function parameter cannot share a zval that belongs to a reference set, so the engine has to separate it, copying the entire array on every call. To show how costly this is, here's a simple test:



      

<?php
    $array = range(1, 100000);

    function dummy($array) {}

    $i = 0;
    $start = microtime(true);
    while ($i++ < 100) {
        dummy($array);
    }
    printf("Used %sS\n", microtime(true) - $start);

    $b = &$array; // Note: suppose $array is accidentally bound by reference to another variable

    $i = 0;
    $start = microtime(true);
    while ($i++ < 100) {
        dummy($array);
    }
    printf("Used %sS\n", microtime(true) - $start);
?>


Running the example under PHP 5.6 gives the following result:


$ php-5.6/sapi/cli/php /tmp/1.php

Used 0.00045204162597656S

Used 4.2051479816437S


That is a slowdown by a factor of nearly 10,000. It means that if, somewhere in a large codebase, a variable is accidentally turned into a reference (for example by foreach ($arr as &$v)), it can trigger this problem and cause serious performance degradation that is very hard to track down.
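Why does a single & make such a difference? In PHP 5, passing a variable by value normally just shares its zval and bumps the refcount, but a zval that belongs to a reference set cannot be shared with the callee's parameter, so it is duplicated instead. This C sketch models that rule with a hypothetical demo_zval array and counts the copied elements; it is a simplification of the engine's send-by-value logic, not a copy of it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical array zval: refcount, is_ref flag and an element buffer. */
typedef struct {
    unsigned refcount;        /* like refcount__gc */
    unsigned char is_ref;     /* like is_ref__gc */
    size_t len;
    long *data;
} demo_zval;

size_t elements_copied = 0;   /* total elements duplicated, for illustration */

demo_zval *demo_array(size_t len) {
    assert(len > 0);
    demo_zval *z = malloc(sizeof *z);
    assert(z != NULL);
    z->refcount = 1;
    z->is_ref = 0;
    z->len = len;
    z->data = calloc(len, sizeof *z->data);
    assert(z->data != NULL);
    return z;
}

void demo_release(demo_zval *z) {
    if (--z->refcount == 0) {
        free(z->data);
        free(z);
    }
}

/* Simplified model of sending a variable by value:
 * - a plain zval is shared with the parameter (refcount++), O(1);
 * - a zval in a reference set cannot be shared with a by-value
 *   parameter, so it is separated: the whole array is duplicated, O(n). */
demo_zval *demo_pass_by_value(demo_zval *arg) {
    if (!arg->is_ref) {
        arg->refcount++;
        return arg;
    }
    demo_zval *copy = demo_array(arg->len);
    memcpy(copy->data, arg->data, arg->len * sizeof *arg->data);
    elements_copied += arg->len;
    return copy;
}
```

Calling demo_pass_by_value 100 times copies nothing while is_ref is 0, and 100 full arrays once it is set, which mirrors the two timings above.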

6. MAKE_STD_ZVAL/ALLOC_ZVAL (most important)

This is the most important one, because fixing it brought a big performance boost. In the PHP 5 era we routinely called MAKE_STD_ZVAL to allocate a zval on the heap, operated on it, copied its value into return_value via RETURN_ZVAL, and finally destroyed it, as in pathinfo:

PHP_FUNCTION(pathinfo)
{
    .....
    MAKE_STD_ZVAL(tmp);
    array_init(tmp);
    .....
    if (opt == PHP_PATHINFO_ALL) {
        RETURN_ZVAL(tmp, 0, 1);
    } else {
        .....
    }

tmp is only a temporary variable, so why allocate it on the heap? MAKE_STD_ZVAL/ALLOC_ZVAL was used everywhere in PHP 5. If such variables could be allocated on the stack instead, each use would save a heap allocation and the code would be more cache-friendly.
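A small C sketch contrasts the two patterns. demo_result_via_heap mimics the MAKE_STD_ZVAL / RETURN_ZVAL(tmp, 0, 1) sequence with plain malloc and free, while demo_result_on_stack keeps the temporary on the stack; the names and the tiny demo_zval type are hypothetical, and a counter records heap allocations.

```c
#include <assert.h>
#include <stdlib.h>

#define DEMO_IS_LONG 1   /* same value as IS_LONG in the PHP 5 list */

/* Hypothetical minimal zval and a counter of heap allocations. */
typedef struct { long lval; unsigned char type; } demo_zval;
unsigned heap_allocs = 0;

/* PHP 5 style: build the result in a heap zval, move it out, free the temporary. */
void demo_result_via_heap(demo_zval *return_value, long n) {
    demo_zval *tmp = malloc(sizeof *tmp);   /* roughly MAKE_STD_ZVAL(tmp) */
    assert(tmp != NULL);
    heap_allocs++;
    tmp->type = DEMO_IS_LONG;
    tmp->lval = n;
    *return_value = *tmp;                   /* roughly RETURN_ZVAL(tmp, 0, 1): take the value... */
    free(tmp);                              /* ...then destroy the temporary container */
}

/* Stack style: the temporary lives on the stack, no heap round trip. */
void demo_result_on_stack(demo_zval *return_value, long n) {
    demo_zval tmp = { n, DEMO_IS_LONG };
    *return_value = tmp;
}
```

Both functions produce the same return_value; only the first one touches the allocator, and in PHP 5 that cost was paid in many internal functions on every call.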

Reference documentation

  • zval in PHP 5 and PHP 7: memory management, types, reference counting

  • zval in the PHP kernel