Background knowledge

Instruction pipeline

The basic job of the CPU is to execute stored sequences of instructions, known as programs. Executing a program is a continuous cycle of fetching an instruction, decoding it, and executing it.

On almost all von Neumann CPUs, this work can be divided into five stages: instruction fetch, instruction decode, execute, memory access, and write-back.

Modern processors use pipelining to process instructions. Because an instruction passes through several stages, it can be broken apart, with each stage handled by dedicated hardware circuits and registers. Pipelining achieves higher CPU throughput, but may increase the latency of an individual instruction because of the overhead of the pipeline itself.

CPU multi-level cache

In computer systems, the CPU cache is a component used to reduce the average time the processor takes to access memory. It sits in the second layer from the top of the pyramid-shaped storage hierarchy, just below the CPU registers. Its capacity is far smaller than memory, but its speed is close to the processor frequency.

When the processor issues a memory access request, it first checks whether the requested data is in the cache. If it is (a hit), the data is returned directly without accessing memory; if not, the data is first loaded from memory into the cache and then returned to the processor.

Caching is effective primarily because memory accesses at runtime exhibit locality, both spatial locality and temporal locality. By exploiting this locality, a cache can achieve very high hit rates.
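
A quick way to feel spatial locality (a sketch; the class name and sizes are arbitrary, and absolute numbers vary by machine): traversing a matrix row by row touches consecutive memory and hits the cache, while traversing column by column jumps across cache lines and is typically several times slower.

public class LocalityDemo {
    public static void main(String[] args) {
        int n = 4096;
        int[][] m = new int[n][n];

        long t0 = System.nanoTime();
        for (int i = 0; i < n; i++)        // row-major: consecutive addresses, cache friendly
            for (int j = 0; j < n; j++)
                m[i][j]++;

        long t1 = System.nanoTime();
        for (int j = 0; j < n; j++)        // column-major: strided access, cache hostile
            for (int i = 0; i < n; i++)
                m[i][j]++;

        long t2 = System.nanoTime();
        System.out.printf("row-major %d ms, column-major %d ms%n",
                (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000);
    }
}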

Introducing the problems

Atomicity

Atomicity: an operation, or a group of operations, either executes completely without interruption by any factor, or does not execute at all.

Example: i++ (where i is an instance variable)

Such a simple statement consists of three operations:

  • Read the value of variable i
  • Add one
  • Assign the new value back to variable i

Without additional control over access to the instance variable i, multiple threads calling this at the same time can overwrite each other's writes and lose updates, as the sketch below demonstrates.

In addition, if you consider the interaction between working memory and main memory, it can be broken down into the following operations:

  • Read: reads the variable from main memory into working memory (optional)
  • Load: puts the value obtained by read into the working-memory copy of the variable (optional)
  • Use: passes the value of the working-memory variable to the execution engine
  • The execution engine performs the add-one operation
  • Assign: assigns the value received from the execution engine to the variable in working memory
  • Store: passes the value of the variable in working memory to main memory (optional)
  • Write: writes the value passed by store into the variable in main memory (optional)
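
A minimal sketch of the lost-update problem (class name and counts are made up for illustration): two threads each increment the same field 100,000 times, yet the final value is usually less than 200,000 because the read-add-write sequences interleave.

public class LostUpdate {
    private int i = 0; // shared instance variable

    public static void main(String[] args) throws InterruptedException {
        LostUpdate demo = new LostUpdate();
        Runnable task = () -> {
            for (int n = 0; n < 100_000; n++) {
                demo.i++; // read, add one, write back: three separate steps
            }
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        System.out.println(demo.i); // usually < 200000: some updates were overwritten
    }
}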

Visibility

Visibility: when multiple threads access the same variable and one thread changes its value, the other threads can immediately see the changed value.

The root cause of visibility problems is caching: each thread holds its own copy of a shared variable and cannot sense changes made by other threads, so it may read a stale value.

while (flag) {     // statement 1
   doSomething();  // statement 2
}

flag = false;      // statement 3

Thread 1 tests flag and executes statement 2 while the condition holds. Thread 2 sets flag to false, but due to the visibility problem thread 1 cannot sense the change and keeps looping through statement 2.
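
A minimal sketch of the fix (names are illustrative): declaring flag volatile makes the write in statement 3 visible to the looping thread, which then exits.

public class FlagDemo {
    static volatile boolean flag = true; // volatile: writes become visible to other threads

    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(() -> {
            while (flag) {     // statement 1
                doSomething(); // statement 2
            }
        });
        worker.start();
        Thread.sleep(100);
        flag = false;          // statement 3: visible to worker, loop ends
    }

    static void doSomething() { /* ... */ }
}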

Ordering

Ordering: the program executes in the same order as the code is written.

Because of compiler reordering and instruction reordering, the order in which the program actually executes is not necessarily the order of the code, which can cause problems in multi-threaded situations.

if (inited == false) {
   context = loadContext();        // statement 1
   inited = true;                  // statement 2
}
doSomethingwithconfig(context);    // statement 3

Since statement 1 and statement 2 have no data dependence on each other, they may be executed in parallel, or statement 2 may be executed before statement 1. If this code is executed by two threads at once, thread 1 may have executed statement 2 while statement 1 has not yet finished; thread 2 then sees inited as true and executes statement 3, but because context has not been initialized, an unpredictable exception occurs. One common fix is sketched below.
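
A sketch of the fix (Context, loadContext, and doSomethingwithconfig are stand-ins for the types in the original snippet): declaring inited volatile forbids reordering statement 2 before statement 1 and makes the write visible to other threads.

class ContextHolder {
    static class Context { }

    // volatile: statement 2 cannot be reordered before statement 1,
    // and the write to inited is visible to other threads
    private static volatile boolean inited = false;
    private static Context context;

    static Context loadContext() { return new Context(); }      // stand-in
    static void doSomethingwithconfig(Context c) { /* ... */ }  // stand-in

    static void init() {
        if (inited == false) {
            context = loadContext();        // statement 1
            inited = true;                  // statement 2
        }
        doSomethingwithconfig(context);     // statement 3
    }
}

Note that volatile fixes only the ordering and visibility here; if two threads enter init() at once they may both run loadContext(), so full safety still needs a synchronized block around the check-and-initialize step.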

The Java Memory Model (JMM)

The Java Virtual Machine specification defines the Java Memory Model (JMM) to mask the differences in memory access across hardware and operating systems, so that Java programs achieve consistent memory access behavior on all platforms (C/C++ and similar languages use the memory model of the physical machine and OS directly). This matters especially in multi-threaded code.

Memory division

The main goal of the JMM is to define the access rules for variables in a program: the low-level details of storing variables to and fetching variables from memory in the virtual machine. "Variables" here means shared variables subject to contention, such as instance fields, static fields, and array elements; it excludes thread-private local variables and method parameters, which cannot be contended. You can think of the JMM as consisting of a memory partitioning, variable access operations, and rules.

Memory is divided into main memory and working memory. Each thread has its own working memory, and all threads share main memory.

  • Main memory stores the values of all shared variables.
  • Working memory stores copies of the main-memory values of the shared variables used by the thread.

All read and write operations to shared variables are performed by the thread in its own working memory. It cannot read or write variables directly from the main memory.

Different threads cannot directly access variables in each other’s working memory, and the transfer of variable values between threads must be done through main memory.

This division is at a different level from the division of heap, stack, method area, etc. in Java memory region, and has nothing to do with them. Roughly speaking, main memory corresponds to the instance data portion of objects in the Java heap, and working memory corresponds to part of the stack. At a lower level, main memory corresponds to physical hardware memory, and working memory corresponds to registers and caches.

Memory interaction rules

These are the implementation details of the protocol between main memory and working memory, i.e. how a variable is copied from main memory into working memory and synchronized from working memory back to main memory. The Java memory model defines eight atomic operations to accomplish this:

  • Lock: marks a variable as exclusively owned by one thread
  • Unlock: releases a variable from its exclusive state so that it can be locked by other threads
  • Read: transfers the value of a variable from main memory to working memory for the subsequent load operation
  • Load: puts the value obtained by read into the working-memory copy of the variable
  • Use: passes the value of a working-memory variable to the execution engine; performed whenever the virtual machine reaches an instruction that uses the variable
  • Assign: assigns a value received from the execution engine to a variable in working memory; performed whenever the virtual machine reaches an instruction that assigns to the variable
  • Store: passes the value of a variable in working memory to main memory for the subsequent write operation
  • Write: writes the value passed by store into the variable in main memory

Rules governing the use of these atomic operations:

  1. A thread is not allowed to synchronize data from working memory back to main memory for no reason (without any assign operation).
  2. A new variable can only be created in main memory. Working memory may not use an uninitialized variable: use and store on a variable must be preceded by assign and load.
  3. A variable can be locked by only one thread at a time, but the same thread may perform the lock operation several times. After locking several times, the variable is unlocked only after the same number of unlock operations. Lock and unlock must come in pairs.
  4. Performing a lock operation on a variable clears its value from working memory; before the execution engine can use it, a load or assign must be performed again to initialize it.
  5. Unlock cannot be performed on a variable that has not been locked, and a thread may not unlock a variable locked by another thread.
  6. Before unlocking a variable, a thread must first synchronize it back to main memory (store and write).

Copying a variable from main memory to working memory requires read then load in order, and synchronizing a variable from working memory to main memory requires store then write in order. In summary:

  • Read, load, and use must appear in order (though not necessarily consecutively); the same applies to assign, store, and write.
  • Variable creation and initialization: variables can only be created in main memory and must be initialized before use, i.e. load/assign before use/store.
  • Locking a variable clears its value from working memory, so it must be re-initialized before use; before unlock, the variable must be synchronized back to main memory.
  • A variable can be locked by only one thread at a time; unlock cannot be performed on a variable that was never locked; a thread cannot unlock a variable locked by another thread.

Special rules for long and double variables

The Java memory model requires all eight operations to be atomic, but for the 64-bit data types long and double it defines a relaxed rule: the virtual machine may split reads and writes of non-volatile 64-bit data into two 32-bit operations. That is, a read of a non-volatile long or double is not guaranteed to be atomic and may observe only "half a variable". In practice, commercial virtual machines almost always implement 64-bit reads and writes atomically, so this problem can usually be ignored; the sketch below shows what the failure would look like where it is not.
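
A sketch of "half a variable" (the class is illustrative; on a modern 64-bit HotSpot VM this will almost certainly observe nothing): the writer alternates between all-zero and all-one bit patterns, and a torn read would surface as a value that is neither.

public class LongTearing {
    static long value; // deliberately not volatile

    public static void main(String[] args) {
        Thread writer = new Thread(() -> {
            while (true) {
                value = 0L;  // all bits clear
                value = -1L; // all bits set
            }
        });
        writer.setDaemon(true);
        writer.start();

        for (long n = 0; n < 1_000_000_000L; n++) {
            long v = value;
            if (v != 0L && v != -1L) { // a mix of two writes: a torn read
                System.out.println("Torn read: 0x" + Long.toHexString(v));
                return;
            }
        }
        System.out.println("No torn read observed (expected on 64-bit VMs)");
    }
}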

The happens-before principle

The Java memory model has some innate "orderliness": ordering that is guaranteed without any synchronization (volatile, synchronized, etc.). This is usually called the happens-before principle.

If the ordering of two operations cannot be deduced from the happens-before principle, their order is not guaranteed and the virtual machine may reorder them at will.

  1. Program Order Rule: within a thread, operations written earlier happen-before operations written later (in program order).
  2. Monitor Lock Rule: an unlock of a lock happens-before every subsequent lock of the same lock, where "subsequent" is chronological order.
  3. Volatile Variable Rule: a write to a volatile variable happens-before every subsequent read of that variable, where "subsequent" is chronological order.
  4. Transitivity Rule: if operation A happens-before operation B, and B happens-before C, then A happens-before C.
  5. Thread Start Rule: a call to Thread.start() happens-before every action of the started thread.
  6. Interruption Rule: a call to interrupt() happens-before the point where the interrupted thread's code detects the interruption (e.g. via Thread.interrupted()).
  7. Thread Termination Rule: all operations in a thread happen-before the detection of that thread's termination; termination can be detected by Thread.join() returning or Thread.isAlive() returning false. The sketch below illustrates this rule.
  8. Finalizer Rule: the completion of an object's construction (the end of its constructor) happens-before the start of its finalize() method.
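
A small illustration of rule 7 (a sketch): the worker's write happens-before join() returns, so main reads 42 safely without volatile or any lock.

public class JoinHappensBefore {
    static int result; // a plain field: no volatile, no synchronization

    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(() -> result = 42); // write inside the worker
        worker.start();
        worker.join();              // all worker actions happen-before join() returns
        System.out.println(result); // guaranteed to print 42
    }
}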

Solving the problems

Atomicity

  • The atomic variable operations directly guaranteed by the JMM are read, load, use, assign, store, and write.
  • Reads and writes of basic data types (in working memory) are therefore atomic.

The JMM's lock and unlock operations provide atomicity over a wider range, but they are implemented inside the JVM; to developers they surface as the synchronized keyword or as Lock implementations (including read/write locks) used to ensure atomicity, as sketched below.
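
A sketch of both options (names are illustrative): synchronized turns the three-step read-modify-write into one atomic unit, while java.util.concurrent.atomic offers a lock-free alternative built on CAS.

import java.util.concurrent.atomic.AtomicInteger;

public class Counters {
    private int i = 0;
    private final AtomicInteger ai = new AtomicInteger();

    // the monitor makes read, add-one, and write-back one atomic unit
    public synchronized void incSync() {
        i++;
    }

    // CAS-based atomic increment: no monitor needed
    public void incAtomic() {
        ai.incrementAndGet();
    }
}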

Visibility

volatile: values changed by one thread are immediately synchronized back to main memory, and the value is refreshed from main memory into working memory before another thread reads it. That is, read, load, and use must occur consecutively in that order, as must assign, store, and write.

synchronized/Lock: visibility is guaranteed by the rules for using lock and unlock:

  • “Before you can unlock a variable, you must synchronize it to main memory (store and write).”
  • “If a lock operation is performed on a variable, the value of the variable will be emptied from working memory, and the load or assign operation will be re-executed to initialize the variable before the execution engine can use it.”

final: as soon as a final field has been initialized in the constructor, and the constructor has not leaked the "this" reference, the value of the final field is visible to other threads.

Ordering

volatile: forbids instruction reordering around accesses to the variable.

synchronized/Lock: "a variable can be locked by only one thread at a time" ensures that synchronized blocks execute serially.

In depth

volatile

Variables declared volatile are guaranteed ordering and visibility.

Ordering

  • A write to a volatile variable happens-before every subsequent read of that variable, where "subsequent" is chronological order.

Visibility

  • When a volatile variable is written, the JMM flushes the shared variable from the thread's working memory to main memory.
  • When a volatile variable is read, the JMM invalidates the thread’s working memory, and the thread then reads the shared variable from main memory.

volatile is very lightweight compared with synchronized/Lock, but its applicable scenarios are limited:

  • Writes to the variable do not depend on its current value, i.e. plain reads and writes only, such as flags for completion, interruption, or status
  • Cases where reordering of operations around the variable must be forbidden

How it works

The underlying implementation of volatile uses memory barrier instructions provided by the CPU. Hardware memory barriers are classified into Load barriers and Store barriers, i.e. read barriers and write barriers; a conceptual sketch follows the list below.

Memory barriers serve two purposes:

  • Prevent reordering of instructions across the barrier
  • Force write buffers and dirty cache data to be written back to main memory, and invalidate the corresponding data in other caches
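
A conceptual sketch of where barriers sit around volatile accesses, following the JSR-133 cookbook (the exact instructions depend on the CPU; on x86, for example, only the StoreLoad barrier costs anything):

public class Barriers {
    static int a;
    static volatile boolean v;

    static void writer() {
        a = 1;
        // StoreStore barrier: the store to a cannot sink below the volatile store
        v = true;
        // StoreLoad barrier: the volatile store becomes globally visible
        // before any later read on this thread
    }

    static void reader() {
        if (v) {
            // LoadLoad + LoadStore barriers: later accesses cannot
            // float above the volatile load
            int r = a; // guaranteed to see 1 if v was read as true
        }
    }
}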

final

For the memory semantics of final fields, the compiler and processor adhere to two reordering rules (internally also implemented with memory barriers):

  • Write rule: writing a final field in a constructor and subsequently assigning a reference to the constructed object to a reference variable cannot be reordered with each other.
  • Read rule: the first read of a reference to an object containing a final field and the subsequent first read of that final field cannot be reordered with each other.
public class FinalExample {
       int i;                     // normal field
       final int j;               // final field
       static FinalExample obj;

       public FinalExample() {
              i = 1;              // write normal field: may be reordered outside the constructor
              j = 2;              // write final field: never reordered outside the constructor
       }

       // writer thread A executes
       public static void writer() {
              obj = new FinalExample();
       }

       // reader thread B executes
       public static void reader() {
              FinalExample object = obj; // read the object reference
              int a = object.i;          // read normal field: may see 0 (i = 1 may be reordered out of the constructor)
              int b = object.j;          // read final field: guaranteed to see 2
       }
}

There is an indirect dependence between reading an object reference for the first time and then reading the final field it contains. Because compilers respect indirect dependences, the compiler will not reorder these two operations, and most processors will not either. This rule specifically targets the few processors that do allow reordering of indirectly dependent operations (such as the Alpha).

For final fields that are reference types, reordering rules for writing final fields impose the following constraints on the compiler and processor:

  • Writing to a member field of a final-referenced object inside the constructor, and subsequently assigning a reference to the constructed object to a reference variable outside the constructor, cannot be reordered with each other.

synchronized

synchronized can modify instance methods, static methods, and code blocks, and it:

  • Ensures synchronized code executes mutually exclusively between threads (atomicity)
  • Ensures changes to shared variables are visible in time (visibility)
  • Prevents problematic instruction reordering (ordering)

How it works

Control is implemented with the object's monitor:

  • The bytecode instruction MonitorEnter is executed on entry (lock)
  • The bytecode instruction MonitorExit is executed on exit (unlock)
    • If the code exits the method or block abnormally (e.g. via an exception), the monitor is released automatically

Which object's monitor is used (each form is shown in the sketch after this list):

  • When modifying an instance method, the monitor of the current object (this) is used
  • When modifying a static method, the monitor of the Class object (the object of type Class) is used
  • When modifying a code block, the monitor of the object in parentheses is used
    • That object may be of any reference type (Object or any subclass)
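
The three forms side by side (a sketch; names are arbitrary):

public class SyncForms {
    private static final Object LOCK = new Object();

    // instance method: uses the monitor of the current object (this)
    public synchronized void instanceMethod() { /* ... */ }

    // static method: uses the monitor of the Class object (SyncForms.class)
    public static synchronized void staticMethod() { /* ... */ }

    // code block: uses the monitor of the object in parentheses
    public void block() {
        synchronized (LOCK) { /* ... */ }
    }
}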

MonitorEnter (lock):

  • Each object has an associated monitor.
  • The monitor is locked if and only if it has an Owner.
  • The thread executes MonitorEnter in order to become the owner of Monitor.
  • If the entry count of the monitor (the reentry count of the owning thread) is 0, it is set to 1 and the thread makes itself the owner of the monitor.
  • If the owner of the monitor is already the current thread, the monitor is re-entered and its entry count increases by one.
  • If the monitor is owned by another thread, the current thread blocks until the entry count returns to zero, then competes for ownership.

MonitorExit (unlock):

  • The thread executing MonitorExit must be the owner of the monitor associated with the object.
  • The thread decrements the monitor's entry count by one.
  • If the entry count reaches 0, the thread exits the monitor and is no longer its owner.
    • Other blocked threads are then allowed to compete for the owner.

MonitorEnter and MonitorExit each involve two basic parameters:

  • The thread
  • The object associated with the monitor

Key structures

In the JVM, an object's in-memory layout is divided into three areas: the object header, instance data, and alignment padding:

Instance data

  • Stores the field data of the class, including fields inherited from parent classes
  • If the object is an array, the instance data also includes the array's elements
  • This part of memory is 4-byte aligned

Padding

  • The VM requires that an object's start address be a multiple of 8 bytes
  • This data exists only for byte alignment
    • It ensures that the start address of the next object is a multiple of 8
  • Its length may be 0

Object Header

  • The object header consists of a Mark Word, a class metadata pointer (Class Metadata Address), and, if the object is an array, the array length
  • On 32-bit and 64-bit virtual machines the Mark Word occupies 32 and 64 bits respectively, hence the name "Word"

The Mark Word does not store the object's actual business data (such as field values); it is pure storage overhead. To save space, the Mark Word is designed as a non-fixed data structure that packs as much information as possible into a small space: it changes its layout according to the object's state, reusing its own bits.
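
If you want to look at the Mark Word directly, the OpenJDK JOL tool can print an object's layout; a sketch, assuming the external dependency org.openjdk.jol:jol-core is on the classpath (the exact output varies by JDK version and VM flags):

import org.openjdk.jol.info.ClassLayout;

public class MarkWordDemo {
    public static void main(String[] args) {
        Object o = new Object();
        // header (mark word + class pointer) plus padding, before locking
        System.out.println(ClassLayout.parseInstance(o).toPrintable());

        synchronized (o) {
            // inside the block the mark word reflects the lock state
            System.out.println(ClassLayout.parseInstance(o).toPrintable());
        }
    }
}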

There are four lock states: unlocked, biased, lightweight, and heavyweight. As contention increases, the lock escalates as follows:

Unlocked -> Biased lock -> Lightweight lock -> Heavyweight lock

Biased locking and lightweight locking were introduced in JDK 6 and are enabled by default. Lock escalation (lock inflation) is one-way: it can only go from low to high (left to right); locks are never downgraded.

Biased locking

When a lock object is first acquired by a thread, the virtual machine sets the flag bits in the object header to "01" (biased mode) and uses a CAS operation to record the acquiring thread's ID in the object's Mark Word. If the CAS succeeds, then from then on the thread holding the biased lock needs no further synchronization operation when it enters a synchronized block on this lock.

Biased mode ends as soon as another thread attempts to acquire the lock. The bias is revoked (Revoke Bias): depending on whether the lock object is currently locked, it reverts to the unlocked (flag bits "01", non-biased) or lightweight-locked (flag bits "00") state, and subsequent synchronization follows the lightweight lock procedure.

Lightweight lock

Reaching the lightweight lock means more than one thread is trying to acquire the lock. The lock is acquired through adaptive spinning with CAS; if acquisition fails, the lock inflates to a heavyweight lock and the waiting thread blocks.

Heavyweight lock

Heavyweight locks are implemented with the operating system's thread mutexes and are the most expensive.

ContentionList (also called CXQ) stores the threads that have recently contended for the lock

  • A LIFO, singly linked list
  • Many threads can enqueue themselves concurrently when requesting the lock
  • But only one thread at a time may dequeue threads

EntryList: the candidate set

  • A doubly linked list
  • Only the thread holding the lock can access or modify the EntryList
  • When the lock-holding thread releases the lock, if the EntryList is empty and the ContentionList is not, all ContentionList threads are dequeued and moved into the EntryList

WaitSet holds the threads in the waiting state

  • A thread that calls wait() is placed into the WaitSet
  • When notify() or notifyAll() is called, threads are moved into the ContentionList or EntryList

Note:

  • At any moment a thread is in at most one of the three sets
  • Threads in all three sets are in the BLOCKED state; the blocking is implemented with the OS mutex underneath

When a thread successfully acquires the lock, the monitor's owner field changes from NULL to point to that thread, and the thread must remove itself from the ContentionList or EntryList.

This is a competitive lock-transfer mechanism: when a thread releases the lock, there is no guarantee that the next thread in line acquires it; the successor merely gets the right to compete for it.

OnDeck designates the ready thread, ensuring that only one thread directly competes for the lock at any one time

  • When acquiring the lock under contention, a thread first tries spinning; if it still cannot acquire the lock, it is then placed in the queues described above.
  • Spinning reduces dequeue operations on the ContentionList and EntryList, and therefore reduces contention on the monitor's internally maintained locks.

OS mutex

Heavyweight locks are implemented with the operating system's thread mutex. On Linux this is pthread_mutex_lock/pthread_mutex_unlock, a mutual-exclusion lock between threads.

The thread mutex is built on the futex (Fast Userspace muTEX) mechanism. Ordinary operating-system synchronization mechanisms (such as IPC) trap into the kernel on every call, performing a trap (e.g. int 0x80) even when there is no contention. Futex, by contrast, mixes kernel mode and user mode: in the absence of contention, neither acquiring nor releasing the lock needs to enter the kernel.

Initial allocation

A futex variable is first allocated in shared memory. For threads, memory is already shared, so it is simply allocated (e.g. with malloc) as an integer with an initial value of 1.

Acquiring a lock

Decrement the futex variable with CAS and observe the result:

  • If the value changes from 1 to 0, there is no contention and the lock is acquired
  • If the result is less than 0, there is contention: call futex(…, FUTEX_WAIT, …) to put the current thread to sleep

Releasing the lock

Increment the futex variable with CAS (a Java analog of the whole scheme is sketched after this list):

  • If the futex variable changes from 0 to 1, there was no contention; continue
  • If the variable was negative before the increment, there is contention: call futex(…, FUTEX_WAKE, …) to wake one or more waiting threads
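
A Java analog of this scheme, purely as a sketch (this is not the real futex API; AtomicBoolean plays the futex word, and LockSupport.park/unpark stand in for FUTEX_WAIT/FUTEX_WAKE; not production code): the point is that the uncontended path is a single CAS in user space, and blocking happens only on the slow path.

import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.LockSupport;

public class FutexLikeLock {
    private final AtomicBoolean locked = new AtomicBoolean(false);
    private final Queue<Thread> waiters = new ConcurrentLinkedQueue<>();

    public void lock() {
        // fast path: a single CAS in user space, no blocking involved
        while (!locked.compareAndSet(false, true)) {
            Thread current = Thread.currentThread();
            waiters.add(current);       // slow path: record ourselves as a waiter
            if (locked.get()) {
                LockSupport.park(this); // analog of FUTEX_WAIT
            }
            waiters.remove(current);    // woken (or raced): retry the CAS
        }
    }

    public void unlock() {
        locked.set(false);              // release in user space
        Thread next = waiters.peek();
        if (next != null) {
            LockSupport.unpark(next);   // analog of FUTEX_WAKE: wake one waiter
        }
    }
}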

Lock

A separate article will cover the AQS-based Lock mechanism in detail. Stay tuned.