
Note: This article was not written by me; it was published in the TIPI ebook.

Before we get into thread safety, let’s review a few basics that will serve as a basis for our analysis.

Scope of a variable

In terms of scope, C distinguishes four kinds of variables: global variables, static global variables, local variables, and static local variables.

Let’s examine these variables only from the perspective of function scope, assuming that no two variables are declared with the same name.

  • Global variable (int gVar;), declared outside any function. It is shared by all functions: wherever the name appears, it refers to this one variable.
  • Static global variable (static int sgVar;), also shared by all functions, but restricted by the compiler to the file that declares it; this restriction is a feature the compiler provides.
  • Local variable (int var; inside a function or block), not shared. The variables involved in different executions of the function are independent of one another; they are just different variables with the same name.
  • Static local variable (static int sVar; inside a function), shared across calls of the function: every execution of the function uses the same variable.

All of the scopes above are defined from the point of view of functions, and they cover every way variables are shared in single-threaded programming. Now let’s look at multithreading.
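The four kinds of variable can be seen side by side in a few lines of C. This is only a minimal sketch; the names g_var, sg_var, bump_local, bump_static_local, and scope_demo are made up for illustration and do not come from the PHP source.

```c
#include <assert.h>

int g_var = 0;           /* global: every function in the program sees it */
static int sg_var = 0;   /* static global: shared too, but only within this file */

/* var is a new variable on every call, so the result is always 1. */
int bump_local(void)
{
    int var = 0;
    var++;
    g_var++;    /* the one and only g_var */
    sg_var++;   /* the one and only sg_var of this file */
    return var;
}

/* s_var is the same variable on every call, so the result keeps growing. */
int bump_static_local(void)
{
    static int s_var = 0;
    s_var++;
    return s_var;
}

/* Hypothetical driver: returns 1 if the behaviour matches the list above. */
int scope_demo(void)
{
    int a = bump_local();          /* 1 */
    int b = bump_local();          /* 1 again */
    int c = bump_static_local();   /* 1 */
    int d = bump_static_local();   /* 2 */
    assert(g_var == sg_var);       /* both are bumped once per bump_local call */
    return a == 1 && b == 1 && c == 1 && d == 2;
}
```

The local variable starts from scratch on each call, while the static local variable and both globals remember what earlier calls did to them.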

In multithreading, the threads of a process share all resources except the function call stack. The scopes above therefore become:

  • Global variables are shared by all functions and therefore by all threads; occurrences of the name in different threads refer to the same variable.
  • Static global variables are shared by all functions and therefore also by all threads.
  • Local variables are independent in each execution of a function and are therefore not shared between threads.
  • Static local variables are shared across calls of the function. Every execution of the function uses the same variable, so they are shared between threads.

Origin of thread-safe resource manager

In a multithreaded system, the process remains the unit of resource ownership, while multiple concurrent execution streams run as threads inside the process. Take the worker MPM in Apache 2 as an example: the master control process spawns several child processes, and each child process contains a fixed number of threads, each of which handles requests independently. Likewise, so that threads do not have to be created when a request arrives, MinSpareThreads and MaxSpareThreads set the minimum and maximum number of idle threads, and MaxClients sets the total number of threads across all child processes. If the threads in the existing child processes cannot handle the load, the control process spawns a new child process.

When PHP runs on a multithreaded server like this one, it is in the multithreaded life cycle. For a given period of time, several threads exist in one process space, and the threads of a process share the global variables set up during module initialization. If scripts were run the same way as in CLI mode, multiple threads would try to read and write common resources stored in the process’s memory space (for example, the many global variables that exist outside functions after the modules shared by the threads have been initialized).

These threads access the same memory address space, so a modification made by one thread affects the others. This sharing speeds up some operations, but it couples the threads more tightly, and when several threads run concurrently it leads to the usual concurrency problems such as data inconsistency and contention for resources; for example, the result of repeated runs can differ from the result of a single-threaded run. Global and static variables are thread-safe if every thread only reads them and never writes them, but that is rarely the case.
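The difference between shared and unshared variables can be sketched with POSIX threads. The example below is an illustration under assumed names (shared_total, total_lock, worker, run_workers are all hypothetical): each thread counts in a local variable without any protection, and adds to a global counter under a mutex.

```c
#include <assert.h>
#include <pthread.h>

#define N_THREADS 4
#define N_ROUNDS  10000

static long shared_total = 0;   /* global: one copy for the whole process */
static pthread_mutex_t total_lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg)
{
    long *local_out = arg;
    long local_count = 0;        /* local: lives on this thread's own stack */
    int i;

    for (i = 0; i < N_ROUNDS; i++) {
        local_count++;           /* no lock needed, no other thread can see it */

        pthread_mutex_lock(&total_lock);
        shared_total++;          /* shared write, so it is serialized */
        pthread_mutex_unlock(&total_lock);
    }
    *local_out = local_count;
    return NULL;
}

/* Runs the workers and returns the final value of the shared counter. */
long run_workers(void)
{
    pthread_t threads[N_THREADS];
    long locals[N_THREADS];
    int i;

    shared_total = 0;
    for (i = 0; i < N_THREADS; i++) {
        int rc = pthread_create(&threads[i], NULL, worker, &locals[i]);
        assert(rc == 0);
    }
    for (i = 0; i < N_THREADS; i++) {
        pthread_join(threads[i], NULL);
        assert(locals[i] == N_ROUNDS);   /* each thread counted on its own */
    }
    return shared_total;
}
```

With the mutex in place the shared counter always ends at N_THREADS * N_ROUNDS. Without it, the increment is an unprotected read-modify-write and updates from different threads can be lost, which is exactly the kind of inconsistency described above.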

To address concurrency, PHP introduced TSRM, the Thread Safe Resource Manager. Its implementation lives in the /TSRM directory of the PHP source tree and is called from all over the code base; it is commonly known as the TSRM layer. In general, the TSRM layer is enabled at compile time only when it is indicated to be needed (for example with Apache 2 and the worker MPM, a thread-based MPM). Since Apache is multithreaded on Win32, this layer is always enabled on Win32.

The realization of the TSRM

The process remains the owner of resources, and threads access them concurrently. The TSRM layer in PHP is concerned with access to shared resources, and the shared resources here are the global variables in the process’s memory space that are shared between threads. When PHP runs in single-process mode, a variable declared outside any function is simply a global variable.

TSRM first defines the following very important global variables (global here means shared by multiple threads).



/* The memory manager table */
static tsrm_tls_entry   **tsrm_tls_table=NULL;
static int              tsrm_tls_table_size;
static ts_rsrc_id       id_count;
 
/* The resource sizes table */
static tsrm_resource_type   *resource_types_table=NULL;
static int                  resource_types_table_size;

  • tsrm_tls_table: short for "thread safe resource manager thread local storage table"; it stores the linked lists of each thread’s tsrm_tls_entry.
  • tsrm_tls_table_size: the size of tsrm_tls_table.
  • id_count: the ID generator for global variable resources; IDs are globally unique and increasing.
  • resource_types_table: stores the resource descriptors corresponding to the global variables.
  • resource_types_table_size: the size of resource_types_table.

Two key data structures are involved here, tsrm_tls_entry and tsrm_resource_type.



typedef struct _tsrm_tls_entry tsrm_tls_entry;

struct _tsrm_tls_entry {
    void **storage;// Array of global variables for this node
    int count;// The number of global variables of this node
    THREAD_T thread_id;// The thread ID of this node
    tsrm_tls_entry *next;// Pointer to the next node
};

typedef struct {
    size_t size;// The size of the global variable structure to be defined
    ts_allocate_ctor ctor;// Constructor pointer to the defined global variable
    ts_allocate_dtor dtor;// Destructor pointer to the defined global variable
    int done;
} tsrm_resource_type;

When a global variable is added, id_count is incremented by one (under the thread mutex). Then a tsrm_resource_type describing the resource is generated from the memory size, constructor, and destructor the global variable needs and is stored in resource_types_table. Finally, based on that resource, the global variable is added to the tsrm_tls_entry node of every thread.

With that in mind, let’s take a closer look at the initialization of the TSRM environment and the allocation of resource ids to understand the complete process.

Initialization of the TSRM environment

During module initialization, the main function of each SAPI calls tsrm_startup to initialize the TSRM environment. tsrm_startup takes two very important parameters: expected_threads, the expected number of threads, and expected_resources, the expected number of resources. Different SAPIs pass different initial values; mod_php5 and CGI, for example, both use one thread and one resource.



TSRM_API int tsrm_startup(int expected_threads, int expected_resources, int debug_level, char *debug_filename)
{
    /* code... */

    tsrm_tls_table_size = expected_threads; // The number of threads expected at SAPI initialization, usually 1

    tsrm_tls_table = (tsrm_tls_entry **) calloc(tsrm_tls_table_size, sizeof(tsrm_tls_entry *));

    /* code... */

    id_count=0;

    resource_types_table_size = expected_resources; // The size of the resource table pre-allocated at SAPI initialization, usually also 1

    resource_types_table = (tsrm_resource_type *) calloc(resource_types_table_size, sizeof(tsrm_resource_type));

    /* code... */

    return 1;
}

Simplified, three important things happen here: the tsrm_tls_table table of linked lists, the resource_types_table array, and id_count are initialized. These three global variables are shared by all threads, which keeps memory management consistent across threads.

Allocation of resource ids

We know that a global variable is initialized with the ZEND_INIT_MODULE_GLOBALS macro (as the array extension example below shows). In a multithreaded environment this actually calls the ts_allocate_id function to request a global variable and returns the ID of the allocated resource. Although there is a fair amount of code, it is quite clear; the comments below explain it:



TSRM_API ts_rsrc_id ts_allocate_id(ts_rsrc_id *rsrc_id, size_t size, ts_allocate_ctor ctor, ts_allocate_dtor dtor)
{
    int i;

    TSRM_ERROR((TSRM_ERROR_LEVEL_CORE, "Obtaining a new resource id, %d bytes", size));

    // Take the mutex shared by all threads
    tsrm_mutex_lock(tsmm_mutex);

    /* obtain a resource id */
    *rsrc_id = TSRM_SHUFFLE_RSRC_ID(id_count++); // the global static variable id_count is incremented by 1
    TSRM_ERROR((TSRM_ERROR_LEVEL_CORE, "Obtained resource id %d", *rsrc_id));

    /* store the new resource type in the resource sizes table */
    // resource_types_table_size has an initial value (expected_resources), so the table does not have to grow every time
    if (resource_types_table_size < id_count) {
        resource_types_table = (tsrm_resource_type *) realloc(resource_types_table, sizeof(tsrm_resource_type)*id_count);
        if (!resource_types_table) {
            tsrm_mutex_unlock(tsmm_mutex);
            TSRM_ERROR((TSRM_ERROR_LEVEL_ERROR, "Unable to allocate storage for resource"));
            *rsrc_id = 0;
            return 0;
        }
        resource_types_table_size = id_count;
    }

    // Store the size, constructor, and destructor of the global variable structure in the resource_types_table array
    resource_types_table[TSRM_UNSHUFFLE_RSRC_ID(*rsrc_id)].size = size;
    resource_types_table[TSRM_UNSHUFFLE_RSRC_ID(*rsrc_id)].ctor = ctor;
    resource_types_table[TSRM_UNSHUFFLE_RSRC_ID(*rsrc_id)].dtor = dtor;
    resource_types_table[TSRM_UNSHUFFLE_RSRC_ID(*rsrc_id)].done = 0;

    /* enlarge the arrays for the already active threads */
    // The PHP kernel then walks every thread's tsrm_tls_entry
    for (i=0; i<tsrm_tls_table_size; i++) {
        tsrm_tls_entry *p = tsrm_tls_table[i];

        while (p) {
            if (p->count < id_count) {
                int j;

                p->storage = (void *) realloc(p->storage, sizeof(void *)*id_count);
                for (j=p->count; j<id_count; j++) {
                    // Allocate the memory this thread needs for the global variable
                    p->storage[j] = (void *) malloc(resource_types_table[j].size);
                    if (resource_types_table[j].ctor) {
                        // Finally, initialize the global variable stored at p->storage[j]
                        // The second parameter of ts_allocate_ctor is not used anywhere in the project
                        resource_types_table[j].ctor(p->storage[j], &p->storage);
                    }
                }
                p->count = id_count;
            }
            p = p->next;
        }
    }

    // Release the thread mutex
    tsrm_mutex_unlock(tsmm_mutex);

    TSRM_ERROR((TSRM_ERROR_LEVEL_CORE, "Successfully allocated new resource id %d", *rsrc_id));
    return *rsrc_id;
}

When a global resource ID is assigned via ts_allocate_id, the PHP kernel first takes a mutex to ensure that the generated resource ID is unique. The lock serializes concurrent operations in time, because concurrency problems are fundamentally problems of timing. Once the resource ID has been generated, a slot is allocated for it. Every resource is described in resource_types_table: when a new resource is allocated, a tsrm_resource_type is created, and all the tsrm_resource_type entries together form the resource_types_table array, whose index is the resource ID. In fact, we can think of resource_types_table as a hash table whose key is the resource ID and whose value is the tsrm_resource_type structure. (Any array can be regarded as a hash table if its keys are meaningful.)

After allocating the resource ID, the PHP kernel iterates over all threads and, in each thread’s tsrm_tls_entry, allocates the memory the new global variable needs. The size of each global variable is specified by the respective caller (it is the size of the global variable structure). Finally, the global variable stored at that address is initialized.

I drew a diagram to illustrate this

How elements are added to tsrm_tls_table and how the linked lists are built is left aside for now; we come back to it later.

Every call to ts_allocate_id makes the PHP kernel iterate over all threads and allocate the resource for each of them. If that happened during the request-processing phase of the PHP life cycle, wouldn’t the work be repeated over and over?

PHP takes this into account: ts_allocate_id is called during module initialization.

After TSRM has started, the module initialization method of every extension is run during module initialization. An extension declares its global variables at the top of its implementation code and initializes them in its MINIT method. During that initialization it tells TSRM which global variable it needs and how large it is by calling ts_allocate_id, the function described above. TSRM allocates and registers the variable in the memory pool and then returns the resource ID to the extension.

Use of global variables

Taking a standard array extension as an example, the global variable for the current extension is first declared.



ZEND_DECLARE_MODULE_GLOBALS(array)

Then, during module initialization, the global variable initialization macro is called to initialize the array globals, for example to allocate memory.



static void php_array_init_globals(zend_array_globals *array_globals)
{
    memset(array_globals, 0, sizeof(zend_array_globals));
}

/* code... */

PHP_MINIT_FUNCTION(array) /* {{{ */
{
    ZEND_INIT_MODULE_GLOBALS(array, php_array_init_globals, NULL);
    /* code... */
}

Both the declaration and the initialization distinguish between ZTS and non-ZTS builds.



#ifdef ZTS

#define ZEND_DECLARE_MODULE_GLOBALS(module_name)                            \
    ts_rsrc_id module_name##_globals_id;

#define ZEND_INIT_MODULE_GLOBALS(module_name, globals_ctor, globals_dtor)    \
    ts_allocate_id(&module_name##_globals_id, sizeof(zend_##module_name##_globals), (ts_allocate_ctor) globals_ctor, (ts_allocate_dtor) globals_dtor);

#else

#define ZEND_DECLARE_MODULE_GLOBALS(module_name)                            \
    zend_##module_name##_globals module_name##_globals;

#define ZEND_INIT_MODULE_GLOBALS(module_name, globals_ctor, globals_dtor)    \
    globals_ctor(&module_name##_globals);

#endif

In the non-ZTS case, the variable is declared and initialized directly. In the ZTS case, the PHP kernel goes through TSRM: instead of declaring the global variable itself it declares a ts_rsrc_id, and instead of initializing the variable it calls ts_allocate_id to request a global variable for the current module in the multithreaded environment and obtain its resource ID. The name of the resource ID variable is the module name followed by _globals_id.
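The token pasting done by these macros is easier to follow on a toy module. The sketch below copies the shape of the non-ZTS pair under different names; the DEMO_ prefix, the zend_demo_globals structure, and the demo_minit function are all hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical globals structure for a module named "demo". */
typedef struct {
    int  call_count;
    char last_name[16];
} zend_demo_globals;

/* Simplified copies of the non-ZTS macros: ## glues the module name
 * into the type name and into the variable name. */
#define DEMO_DECLARE_MODULE_GLOBALS(module_name) \
    zend_##module_name##_globals module_name##_globals;

#define DEMO_INIT_MODULE_GLOBALS(module_name, globals_ctor, globals_dtor) \
    globals_ctor(&module_name##_globals);

/* Expands to: zend_demo_globals demo_globals; */
DEMO_DECLARE_MODULE_GLOBALS(demo)

static void demo_init_globals(zend_demo_globals *g)
{
    memset(g, 0, sizeof(zend_demo_globals));
}

/* Stands in for MINIT; returns the counter after initialization. */
int demo_minit(void)
{
    demo_globals.call_count = 99;   /* stale value that the constructor must clear */

    /* Expands to: demo_init_globals(&demo_globals); */
    DEMO_INIT_MODULE_GLOBALS(demo, demo_init_globals, NULL)

    assert(demo_globals.last_name[0] == '\0');
    return demo_globals.call_count;
}
```

In the ZTS variant the same two macro names produce a ts_rsrc_id declaration and a ts_allocate_id call instead, so extension code that only uses the macros compiles unchanged in both builds.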

To call the global variable of the current extension, use: ARRAYG(v), which is defined as:



#ifdef ZTS
#define ARRAYG(v) TSRMG(array_globals_id, zend_array_globals *, v)
#else
#define ARRAYG(v) (array_globals.v)
#endif

If it is not ZTS, the attribute field of the global variable is called directly; if it is ZTS, the variable needs to be retrieved via TSRMG.

Definition of TSRMG:



#define TSRMG(id, type, element) (((type) (*((void ***) tsrm_ls))[TSRM_UNSHUFFLE_RSRC_ID(id)])->element)

Stripped of its layers of parentheses, the TSRMG macro simply fetches the global variable from tsrm_ls by resource ID and returns the requested field of that variable.
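To see what each layer of the macro does, the sketch below builds one thread's storage array by hand and reads it through a macro of the same shape. Everything with a demo_ or DEMO_ prefix is hypothetical, and the +1/-1 shuffle is an assumption based on how the PHP 5 TSRM header defines TSRM_SHUFFLE_RSRC_ID and TSRM_UNSHUFFLE_RSRC_ID.

```c
#include <assert.h>

typedef int ts_rsrc_id;

/* Assumed to mirror TSRM's shuffle: public IDs start at 1, slots at 0. */
#define SHUFFLE_RSRC_ID(id)   ((id) + 1)
#define UNSHUFFLE_RSRC_ID(id) ((id) - 1)

/* Same shape as TSRMG, using the local names above. */
#define DEMO_TSRMG(id, type, element) \
    (((type) (*((void ***) tsrm_ls))[UNSHUFFLE_RSRC_ID(id)])->element)

typedef struct { int sort_flags; } demo_array_globals;  /* hypothetical */
typedef struct { int precision;  } demo_basic_globals;  /* hypothetical */

/* Builds a storage array, then reads and writes fields through the macro. */
int read_through_tsrmg(void)
{
    demo_array_globals ag = { 7 };
    demo_basic_globals bg = { 14 };
    void *slots[2];
    void **storage = slots;        /* what tsrm_tls_entry.storage would hold */
    void ***tsrm_ls = &storage;    /* what ts_resource(0) hands back */
    ts_rsrc_id array_id = SHUFFLE_RSRC_ID(0);
    ts_rsrc_id basic_id = SHUFFLE_RSRC_ID(1);

    slots[0] = &ag;
    slots[1] = &bg;

    /* *tsrm_ls is the storage array, [id - 1] picks the slot,
     * the cast restores the structure type, -> reads the field. */
    assert(DEMO_TSRMG(array_id, demo_array_globals *, sort_flags) == 7);

    /* The expansion is an lvalue, so it can be assigned to as well. */
    DEMO_TSRMG(basic_id, demo_basic_globals *, precision) = 17;
    return bg.precision;
}
```

The macro itself never looks anything up per thread; it relies entirely on tsrm_ls already pointing at the calling thread's own storage array.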

So now the question is: where does tsrm_ls come from?

Initialization of tsrm_ls

tsrm_ls is initialized with ts_resource(0), which expands to the call ts_resource_ex(0, NULL). Below, some of the macros in ts_resource_ex have been expanded, using pthread as the example threading library.



#define THREAD_HASH_OF(thr,ts)  (unsigned long)thr%(unsigned long)ts

static MUTEX_T tsmm_mutex;

void *ts_resource_ex(ts_rsrc_id id, THREAD_T *th_id)
{
    THREAD_T thread_id;
    int hash_value;
    tsrm_tls_entry *thread_resources;

    // tsrm_tls_table was initialized in tsrm_startup
    if (tsrm_tls_table) {
        // th_id is NULL on the initial call, ts_resource_ex(0, NULL)
        if (!th_id) {
            // The first time through, thread_resources is NULL because pthread_setspecific has not run yet
            thread_resources = pthread_getspecific(tls_key);

            if (thread_resources) {
                TSRM_SAFE_RETURN_RSRC(thread_resources->storage, id, thread_resources->count);
            }

            thread_id = pthread_self();
        } else {
            thread_id = *th_id;
        }
    }

    // Lock
    pthread_mutex_lock(tsmm_mutex);

    // Hash the thread ID to find this thread's slot in tsrm_tls_table
    hash_value = THREAD_HASH_OF(thread_id, tsrm_tls_table_size);
    // tsrm_tls_table_size == expected_threads after the SAPI has called tsrm_startup
    thread_resources = tsrm_tls_table[hash_value];

    if (!thread_resources) {
        // Nothing there yet, so allocate a new entry
        allocate_new_resource(&tsrm_tls_table[hash_value], thread_id);
        // allocate_new_resource releases the mutex; the retry takes the else branch below
        return ts_resource_ex(id, &thread_id);
    } else {
        do {
            // Match one by one along the linked list
            if (thread_resources->thread_id == thread_id) {
                break;
            }
            if (thread_resources->next) {
                thread_resources = thread_resources->next;
            } else {
                // Reached the end of the list without a match: append a new entry to the list
                allocate_new_resource(&thread_resources->next, thread_id);
                return ts_resource_ex(id, &thread_id);
            }
        } while (thread_resources);
    }

    // Unlock first: the return macro below leaves the function
    pthread_mutex_unlock(tsmm_mutex);

    TSRM_SAFE_RETURN_RSRC(thread_resources->storage, id, thread_resources->count);
}

allocate_new_resource allocates memory for the new thread’s node in the corresponding linked list and adds all global variables to its storage pointer array.



static void allocate_new_resource(tsrm_tls_entry **thread_resources_ptr, THREAD_T thread_id)
{
    int i;

    (*thread_resources_ptr) = (tsrm_tls_entry *) malloc(sizeof(tsrm_tls_entry));
    (*thread_resources_ptr)->storage = (void **) malloc(sizeof(void *)*id_count);
    (*thread_resources_ptr)->count = id_count;
    (*thread_resources_ptr)->thread_id = thread_id;
    (*thread_resources_ptr)->next = NULL;

    // Set the thread-local storage variable; after this, ts_resource_ex can fetch it with pthread_getspecific
    pthread_setspecific(tls_key, *thread_resources_ptr);

    if (tsrm_new_thread_begin_handler) {
        tsrm_new_thread_begin_handler(thread_id, &((*thread_resources_ptr)->storage));
    }
    for (i=0; i<id_count; i++) {
        if (resource_types_table[i].done) {
            (*thread_resources_ptr)->storage[i] = NULL;
        } else {
            // Add the resources from resource_types_table to the new tsrm_tls_entry node
            (*thread_resources_ptr)->storage[i] = (void *) malloc(resource_types_table[i].size);
            if (resource_types_table[i].ctor) {
                resource_types_table[i].ctor((*thread_resources_ptr)->storage[i], &(*thread_resources_ptr)->storage);
            }
        }
    }

    if (tsrm_new_thread_end_handler) {
        tsrm_new_thread_end_handler(thread_id, &((*thread_resources_ptr)->storage));
    }

    pthread_mutex_unlock(tsmm_mutex);
}
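The lookup that ts_resource_ex and allocate_new_resource perform together, hashing the thread ID to a bucket and then walking or extending that bucket's chain, can be reduced to the following sketch. It deliberately leaves out the mutex, the storage array, and thread-local storage; all demo_ and DEMO_ names and the find_or_add function are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

typedef unsigned long demo_thread_t;       /* stand-in for THREAD_T */

typedef struct demo_entry {
    demo_thread_t thread_id;
    struct demo_entry *next;
} demo_entry;

#define DEMO_TABLE_SIZE 2                  /* plays the role of tsrm_tls_table_size */
#define DEMO_HASH_OF(thr, ts) ((unsigned long)(thr) % (unsigned long)(ts))

static demo_entry *demo_table[DEMO_TABLE_SIZE];
static int demo_entries_created = 0;

/* Returns the entry for thread_id, appending a new node to the bucket's
 * chain when the thread has not been seen before. */
demo_entry *find_or_add(demo_thread_t thread_id)
{
    demo_entry **link = &demo_table[DEMO_HASH_OF(thread_id, DEMO_TABLE_SIZE)];

    while (*link) {
        if ((*link)->thread_id == thread_id) {
            return *link;                  /* already registered */
        }
        link = &(*link)->next;             /* walk along the chain */
    }

    *link = malloc(sizeof(demo_entry));    /* end of chain: attach a new node */
    assert(*link != NULL);
    (*link)->thread_id = thread_id;
    (*link)->next = NULL;
    demo_entries_created++;
    return *link;
}
```

With a table size of 2, thread IDs 10 and 12 land in the same bucket and form a chain, while 11 gets a bucket of its own. The real code does the same work under tsmm_mutex, because several threads may be registering themselves at the same moment.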

This is where thread-local storage comes in. There is a global variable tls_key that every thread can use and whose value every thread can change. On the surface it is one global variable available to all threads, but its value is stored separately in each thread. That is what thread-local storage means. So how is thread-local storage used here?

To see it, the tsrm_startup, ts_resource_ex, and allocate_new_resource functions have to be read together; the commented excerpt below walks through them:



// Take pthread as the example
// 1. First, the tls_key global variable is defined
static pthread_key_t tls_key;

// 2. Then tsrm_startup calls pthread_key_create() to create the key
pthread_key_create( &tls_key, 0 );

// 3. In allocate_new_resource, tsrm_tls_set stores the *thread_resources_ptr pointer under the global tls_key
tsrm_tls_set(*thread_resources_ptr); // Expands to pthread_setspecific(tls_key, *thread_resources_ptr);

// 4. In ts_resource_ex, tsrm_tls_get() fetches the *thread_resources_ptr that was set in this thread
//    Threads operating concurrently do not affect one another.
thread_resources = tsrm_tls_get();
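The four steps above can be tried out in isolation. The following sketch uses only the standard POSIX calls pthread_key_create, pthread_setspecific, and pthread_getspecific; the names demo_key, tls_worker, and tls_demo are made up for the example.

```c
#include <assert.h>
#include <pthread.h>

static pthread_key_t demo_key;   /* one key, visible to all threads */

static void *tls_worker(void *arg)
{
    int *my_value = arg;

    /* Store this thread's pointer under the shared key... */
    pthread_setspecific(demo_key, my_value);

    /* ...and read it back: each thread sees only what it stored itself. */
    return pthread_getspecific(demo_key);
}

/* Returns 1 if every thread got back exactly the pointer it stored. */
int tls_demo(void)
{
    pthread_t t1, t2;
    int v1 = 1, v2 = 2;
    void *r1 = NULL, *r2 = NULL;
    int rc;

    rc = pthread_key_create(&demo_key, NULL);
    assert(rc == 0);

    pthread_create(&t1, NULL, tls_worker, &v1);
    pthread_create(&t2, NULL, tls_worker, &v2);
    pthread_join(t1, &r1);
    pthread_join(t2, &r2);

    /* The main thread never called pthread_setspecific, so it sees NULL. */
    rc = (r1 == &v1) && (r2 == &v2) && (pthread_getspecific(demo_key) == NULL);

    pthread_key_delete(demo_key);
    return rc;
}
```

Both workers use the same key, yet neither overwrites the other's value. TSRM stores the thread's tsrm_tls_entry pointer this way, which is why the fast path in ts_resource_ex needs neither the mutex nor the hash lookup.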

Now that the tsrm_tls_table array and the creation of the linked lists inside it are clear, look at the return macro called in ts_resource_ex:



#define TSRM_SAFE_RETURN_RSRC(array, offset, range) \
    if (offset==0) {                                    \
        return &array;                                    \
    } else {                                            \
        return array[TSRM_UNSHUFFLE_RSRC_ID(offset)];    \
    }

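The macro works from the thread’s tsrm_tls_entry and an offset into its storage array. If the offset is 0, it returns the address of the storage array itself, which is the value that becomes tsrm_ls; otherwise it returns the global variable stored in the slot for that resource ID. With this, the TSRMG macro that fetches global variables under multithreading makes sense.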

We’ll use this a lot when we write extensions:



#define TSRMLS_D void ***tsrm_ls   /* no comma; usually used when it is the only parameter */
#define TSRMLS_DC , TSRMLS_D       /* also for definitions, but follows another parameter, so a comma is required */
#define TSRMLS_C tsrm_ls
#define TSRMLS_CC , TSRMLS_C

NOTICE: 'D' stands for Define, the trailing 'C' stands for Comma, and the leading 'C' stands for Call.

The above definitions apply in ZTS mode. In non-ZTS mode, the definitions are all empty (apart from TSRMLS_D, which becomes void so that an otherwise empty parameter list stays valid).
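How the four macros slot into function signatures and calls is shown in the sketch below. The DEMO_ZTS switch and the functions has_context, add_with_ctx, and tsrmls_demo are hypothetical; the non-ZTS branch is written as an assumption based on the PHP 5 headers, where TSRMLS_D is void and the other three are empty.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_ZTS 1   /* hypothetical switch; set to 0 to see the macros vanish */

#if DEMO_ZTS
# define TSRMLS_D   void ***tsrm_ls
# define TSRMLS_DC  , TSRMLS_D
# define TSRMLS_C   tsrm_ls
# define TSRMLS_CC  , TSRMLS_C
#else
# define TSRMLS_D   void
# define TSRMLS_DC
# define TSRMLS_C
# define TSRMLS_CC
#endif

/* Only parameter: use the comma-free forms. */
static int has_context(TSRMLS_D)
{
#if DEMO_ZTS
    return tsrm_ls != NULL;
#else
    return 0;
#endif
}

/* Follows another parameter: use the forms that bring their own comma. */
static int add_with_ctx(int a, int b TSRMLS_DC)
{
    return a + b + has_context(TSRMLS_C);
}

int tsrmls_demo(void)
{
#if DEMO_ZTS
    void **storage = NULL;
    void ***tsrm_ls = &storage;   /* normally obtained from ts_resource(0) */
#endif
    int sum = add_with_ctx(2, 3 TSRMLS_CC);

    assert(sum >= 5);
    return sum;
}
```

Because the comma travels inside the macro, the same source line compiles to a three-argument call in a ZTS build and a two-argument call otherwise.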
