Foreword

Recently, some friends have been asking me to put together a summary of JVM knowledge. After ten days of collecting and organizing material, here is the first version. I hope it helps you.

What is the JDK?

The JDK is the minimum environment needed to support Java program development. It includes:

  1. Java programming language
  2. The Java virtual machine
  3. Java API libraries

What is the JRE?

The JRE is the standard environment for running Java programs. It includes:

  1. Java SE API subset
  2. The Java virtual machine

What are the features of Java historical versions?

Java Version SE 5.0

  • Introduction of generics;
  • Enhanced for loop (for-each iteration);
  • Autoboxing and auto-unboxing;
  • Type-safe enumerations;
  • Varargs (variable-length argument lists);
  • Static imports;
  • Metadata (annotations);
  • Introduction of Instrumentation.

Java Version SE 6

  • Support for scripting languages;
  • Introducing the JDBC 4.0 API;
  • Introducing the Java Compiler API;
  • Pluggable annotations;
  • Added support for Public Key Infrastructure (PKI), Generic Security Services (Java GSS), Kerberos, and the Lightweight Directory Access Protocol (LDAP);
  • Integrated Web Services;
  • Lots of optimizations.

Java Version SE 7

  • The switch block allows strings as branch conditions;
  • Type inference is applied when creating generic objects;
  • Multiple exception types can be caught in a single catch block;
  • Support for dynamic languages;
  • Support for try-with-resources;
  • Introduction of the NIO.2 package;
  • Numeric types can be written as binary literals, and underscores can be inserted into numeric literals;
  • The diamond syntax;
  • Automatic handling of null values.

Java 8

  • Functional interfaces
  • Lambda expressions
  • Stream API
  • Interface enhancements
  • New date and time API
  • Repeating annotations and type annotations
  • Default methods and static methods in interfaces
  • The Optional container class

What does the runtime data area include?

  1. Program counter
  2. Java virtual machine stack
  3. Native method stack
  4. Java heap
  5. Method area
  6. Run-time constant pool
  7. Direct memory

Program counter (thread private)

The Program Counter Register is a small memory space that can be thought of as a line number indicator of the bytecode executed by the current thread. Basic functions such as branching, looping, jumping, exception handling, thread recovery, and so on rely on this counter.

Multithreading in the Java virtual machine is implemented by switching threads in turn and allocating processor execution time. In order to restore the correct execution position after the thread switch, each thread needs an independent program counter. The counters between the threads do not affect each other and are stored independently.

  1. If the thread is executing a Java method, the counter records the address of the virtual machine bytecode instruction being executed.
  2. If a Native method is being executed, the value of this counter is undefined.

The program counter is the only area for which the Java Virtual Machine Specification does not define any OutOfMemoryError conditions.

Java Virtual Machine Stack (thread private)

Java Virtual Machine Stacks are thread-private and have the same lifetime as their thread. The virtual machine stack describes the memory model of Java method execution: each method execution creates a Stack Frame, which stores:

  1. Local variable table
  2. Operand stack
  3. Dynamic linking
  4. Method return address

Each method's journey from invocation to completion of execution corresponds to one stack frame being pushed onto and popped off the virtual machine stack.

Two exceptions are defined for this region:

  1. StackOverflowError: thrown when the stack depth requested by a thread exceeds the depth the virtual machine allows (see the sketch below)
  2. OutOfMemoryError: thrown when the virtual machine stack cannot obtain enough memory while being extended
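
A minimal sketch of the first case (class and field names are illustrative): unbounded recursion keeps pushing stack frames until the virtual machine refuses:

public class StackSOF {
    private int depth = 0;

    private void recurse() {
        depth++;
        recurse(); // never returns; each call adds one more stack frame
    }

    public static void main(String[] args) {
        StackSOF sof = new StackSOF();
        try {
            sof.recurse();
        } catch (StackOverflowError e) {
            System.out.println("stack depth reached: " + sof.depth);
        }
    }
}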

Local method stack (thread private)

The virtual machine stack serves the execution of Java methods (bytecode) by the virtual machine.

Native Method Stacks serve the Native methods used by the virtual machine.

Java heap (Thread sharing)

The Java Heap is the largest block of memory managed by the Java virtual machine. It is created at virtual machine startup and is shared by all threads.

Function: Stores object instances. The garbage collector mainly manages the Java heap. The Java heap can be physically discontinuous, as long as it is logically continuous.

Method area (thread sharing)

The Method Area is shared by all threads and is used to store information about classes that have been loaded by the virtual machine, constants, static variables, code compiled by the just-in-time compiler, and so on.

As with the Java heap, contiguous memory is not required; an implementation may choose a fixed or an extensible size, and may even choose not to implement garbage collection here.

Run-time constant pool

The Runtime Constant Pool is part of the method area. It holds the symbolic references from a Class file, as well as the direct references translated from them. The run-time constant pool also allows new constants to be placed into the pool at run time.

How does object access work in Java?

Object obj =  new  Object();

Even this simplest of accesses involves the three most important memory areas: the Java stack, the Java heap, and the method area.

Object obj

If it appears in a method body, the code above is reflected in the local variable table of the Java stack as data of reference type.

new  Object()

This part is reflected in the Java heap as a block of memory storing the instance data of the new Object. The heap object must also contain address information pointing to its type data (object type, superclass, implemented interfaces, methods, and so on), and that type data is stored in the method area.

How do I determine if an object is “dead”?

  1. Reference counting method
  2. Root search algorithm

What is reference counting?

Attach a reference counter to the object: every time the object gains a reference, the counter is incremented by 1; every time a reference becomes invalid, the counter is decremented by 1. An object whose counter remains 0 can no longer be used.

Disadvantages of reference counting?

It is difficult to solve the problem of circular references between objects.
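A minimal sketch of the problem (class name illustrative): under reference counting, the two objects below could never be reclaimed, because their counters never reach zero; HotSpot, which uses reachability analysis instead, collects them anyway:

public class CircularRef {
    private Object instance;

    public static void main(String[] args) {
        CircularRef a = new CircularRef();
        CircularRef b = new CircularRef();
        a.instance = b; // a and b reference each other
        b.instance = a;
        a = null;       // drop both external references
        b = null;
        System.gc();    // the pair is unreachable from GC Roots and can be reclaimed
    }
}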

What is a root search algorithm?

Taking a set of objects called "GC Roots" as starting points, the search proceeds downward from these nodes along paths called Reference Chains. When an object is not connected to GC Roots by any reference chain (in graph-theory terms, the object is unreachable from GC Roots), the object is unusable.

What are the 4 kinds of references in Java?

After JDK 1.2, Java expanded the concept of references, classifying them as:

  1. Strong Reference
  2. Soft Reference
  3. Weak Reference
  4. Phantom Reference

Strong reference

Object obj =  new  Object();

Ubiquitous in code, such as the above reference. As long as strong references exist, the garbage collector will never reclaim the referenced object.

Soft references

Describes objects that are useful but not necessary. Before the system is about to run out of memory, objects reachable only through soft references are placed into the collection scope and collected a second time; if this collection still does not free enough memory, an out-of-memory error is thrown. The SoftReference class is provided to implement soft references.

Weak references

Also describes non-essential objects, but with a strength even weaker than soft references: objects associated only with weak references survive only until the next garbage collection. When the garbage collector runs, objects reachable only through weak references are reclaimed regardless of whether memory is currently sufficient. The WeakReference class is provided to implement weak references.

Phantom reference

Whether an object has a phantom reference has no effect at all on its lifetime, and an object instance cannot be obtained through one. The sole purpose of associating a phantom reference with an object is to receive a system notification when the object is reclaimed by the collector. The PhantomReference class is provided to implement phantom references.
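
A minimal sketch of the first two strengths (class name illustrative): after a GC, an object reachable only weakly is usually gone, while a softly reachable one survives until memory runs low:

import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;

public class ReferenceDemo {
    public static void main(String[] args) {
        SoftReference<byte[]> soft = new SoftReference<>(new byte[1024]);
        WeakReference<Object> weak = new WeakReference<>(new Object());

        System.gc(); // request a collection
        System.out.println("weak: " + weak.get()); // usually null after GC
        System.out.println("soft: " + soft.get()); // kept until memory is tight
    }
}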

What garbage collection algorithms are there?

  1. Mark-sweep algorithm
  2. Copying algorithm
  3. Mark-compact algorithm
  4. Generational collection algorithm

Mark-sweep algorithm

What is the mark-sweep algorithm?

It has two phases, marking and sweeping: first, all objects that need to be reclaimed are marked; after marking completes, all marked objects are reclaimed.

What are the disadvantages?

Efficiency problems: both the marking and the sweeping process are inefficient.

Space problems: sweeping leaves a large number of discontiguous memory fragments. Too much fragmentation may mean that when the program later needs to allocate a large object it cannot find enough contiguous memory, forcing another garbage collection earlier than necessary.

Copying algorithm (new generation)

The available memory is divided by capacity into two equal blocks, only one of which is used at a time. When that block is exhausted, the surviving objects are copied to the other block, and the used block is then cleaned up in one pass.

advantages

The copying algorithm is simple to implement and efficient to run: memory is allocated sequentially by just bumping the pointer at the top of the used region, with no memory fragmentation to worry about.

disadvantages

Usable memory is reduced to half. And when object survival rates are high, more copying is required, lowering efficiency.

application

Commercial virtual machines use the copying algorithm to collect the new generation. Since most objects in the new generation die young, memory is not split 1:1; instead it is divided into one large Eden space and two smaller Survivor spaces. Each round uses Eden plus one of the Survivor spaces.

During collection, the objects still alive in Eden and the in-use Survivor space are copied in one pass to the other Survivor space, and then Eden and the just-used Survivor space are cleared. By default, the HotSpot virtual machine sizes Eden and each Survivor space at an 8:1 ratio, so each round 90% of the new generation is usable (80% Eden plus 10% of one Survivor) and only 10% is "wasted".

Mark-compact algorithm (old generation)

The marking phase is the same as in the mark-sweep algorithm, but instead of sweeping the reclaimable objects directly, all surviving objects are moved toward one end, and the memory beyond that boundary is then cleared in one pass.

Generational collection algorithm

Memory is divided into blocks according to object lifetimes. The Java heap is typically divided into a new generation and an old generation, so that the most appropriate collection algorithm can be used for the characteristics of each generation.

  • New generation: in each garbage collection a large number of objects die and only a few survive, so the copying algorithm is used, paying only the cost of copying a small number of surviving objects.
  • Old generation: objects have a high survival rate and there is no extra space to guarantee their allocation, so the "mark-sweep" or "mark-compact" algorithm must be used.

What is the difference between Minor GC and Full GC?

Minor GC: garbage collection in the new generation. Because most Java objects die young, Minor GCs are very frequent and usually complete quickly. Full GC (also called Major GC): garbage collection in the old generation, typically more than 10 times slower than a Minor GC.

Java memory

Why partition heap memory?

In a large system, when large numbers of objects and method variables are created, that is, when the heap holds many objects, analyzing one object at a time to decide whether it should be reclaimed is inefficient. Partitioning exists so that different objects and variables can be managed in groups, improving the JVM's performance.

What are the blocks of heap memory?

  1. Young Generation Space
  2. Tenured Generation Space (the old generation)
  3. Permanent Space (the permanent generation)


What are the principles of memory allocation?

  1. Objects are allocated in Eden first (see the sketch after this list)
  2. Large objects go directly into the old generation
  3. Long-lived objects are promoted to the old generation
  4. Dynamic object age determination
  5. The space allocation guarantee
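
A hedged sketch of principle 1, in the spirit of the examples in the reference book (class name and sizes are illustrative). Run with the HotSpot flags -Xms20M -Xmx20M -Xmn10M -XX:SurvivorRatio=8 -XX:+PrintGCDetails, giving a 10MB new generation (8MB Eden plus two 1MB Survivors):

public class AllocationDemo {
    private static final int _1MB = 1024 * 1024;

    public static void main(String[] args) {
        byte[] a1 = new byte[2 * _1MB]; // allocated in Eden
        byte[] a2 = new byte[2 * _1MB]; // allocated in Eden
        byte[] a3 = new byte[2 * _1MB]; // allocated in Eden
        byte[] a4 = new byte[4 * _1MB]; // Eden is full; a Minor GC is triggered first
    }
}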

Young Generation Space (uses the copying algorithm)

It mainly stores newly created objects; it is small, and garbage collection here is frequent. The zone is divided into three areas: one Eden Space and two Survivor Spaces.

  • When an object is created on the heap, it enters the Eden Space of the young generation.
  • When the garbage collector runs, it scans Eden Space and Survivor Space A; if an object is still alive, it is copied into Survivor Space B. If Survivor Space B is full, survivors are copied into the Old Gen.
  • While scanning Survivor Space A, if an object has survived several scans, the JVM considers it an old object and moves it into the Old Gen.
  • After the scan completes, the JVM clears Eden Space and Survivor Space A and then swaps the roles of A and B (that is, the next garbage collection will scan Eden Space and Survivor Space B).

Tenured Generation Space (uses the mark-compact algorithm)

It stores objects that stay referenced for a long time: objects still alive after several rounds of scanning in the Young Generation Space are moved here. It is large, and garbage collection here is infrequent.

Permanent Space

Store immutable class definitions, bytecodes, constants, and so on.

The Class file

The Class file is the cornerstone of the Java virtual machine's platform independence.

What makes up a Class file?

A Class file is a binary stream based on 8-bit bytes, with no separators between data items. When a data item needs more than 8 bits of space, it is split into several 8-bit bytes stored most-significant byte first (big-endian).

Version of magic number and Class file

The first four bytes of every Class file are called the Magic Number; their only purpose is to determine whether the file is a Class file acceptable to the virtual machine. The value is 0xCAFEBABE.

Next comes the version number of the Class file: bytes 5 and 6 hold the minor version, and bytes 7 and 8 hold the major version.

For a Class file compiled with JDK 1.7:

The first four bytes are the magic number, the minor version is 0x0000, and the major version is 0x0033, meaning the file can be executed by virtual machines of version 1.7 or later.

  • 33: JDK 1.7
  • 32: JDK 1.6
  • 31: JDK 1.5
  • 30: JDK 1.4
  • 2F: JDK 1.3
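
A minimal sketch (file path illustrative) that reads these header fields directly; DataInputStream reads multi-byte values big-endian, matching the Class file layout:

import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;

public class ClassHeader {
    public static void main(String[] args) throws IOException {
        try (DataInputStream in = new DataInputStream(new FileInputStream("Test.class"))) {
            int magic = in.readInt();           // bytes 1-4: 0xCAFEBABE
            int minor = in.readUnsignedShort(); // bytes 5-6: minor version
            int major = in.readUnsignedShort(); // bytes 7-8: major version, e.g. 51 (0x33) for JDK 1.7
            System.out.printf("magic=0x%X, version=%d.%d%n", magic, major, minor);
        }
    }
}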

Class loader

What does a class loader do?

The class loader performs the loading of a class and also serves to identify it: for any class, its uniqueness inside the Java virtual machine is established jointly by the class loader that loaded it and the class itself. Even if two classes come from the same Class file, they are not equal as long as they were loaded by different class loaders.

What are class loaders?

  1. Bootstrap ClassLoader: implemented in C++ (in HotSpot) as part of the virtual machine itself. It is responsible for loading the class libraries in the <JAVA_HOME>\lib directory into the virtual machine and cannot be referenced directly by Java programs.
  2. Extension ClassLoader: implemented by ExtClassLoader. It is responsible for loading all libraries in the <JAVA_HOME>\lib\ext directory; developers can use it directly.
  3. Application ClassLoader: implemented by AppClassLoader. It is responsible for loading the libraries specified on the user's ClassPath.
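
A minimal sketch that walks the loader chain (class name illustrative); on a HotSpot JDK it typically prints the application loader, then the extension loader, then null, because the bootstrap loader is not a Java object:

public class LoaderChain {
    public static void main(String[] args) {
        ClassLoader loader = LoaderChain.class.getClassLoader();
        while (loader != null) {
            System.out.println(loader); // AppClassLoader, then ExtClassLoader
            loader = loader.getParent();
        }
        System.out.println(loader); // null stands for the bootstrap class loader
    }
}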

Class loading mechanism

What is the parental delegation model?

The Parents Delegation Model requires that every class loader except the top-level bootstrap class loader have a parent class loader. The parent-child relationships between class loaders are reused through composition rather than inheritance.

How it works: when a class loader receives a class-loading request, it does not try to load the class itself first, but delegates the request to its parent. This happens at every level, so all load requests are eventually passed up to the top-level bootstrap class loader; only when the parent reports that it cannot complete the request (the class was not found in its search scope) does the child loader attempt to load the class itself.
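
A custom loader usually cooperates with this model by overriding findClass rather than loadClass, because the loadClass inherited from java.lang.ClassLoader already delegates to the parent first and calls findClass only when the parent fails. A minimal sketch (the directory-based lookup is illustrative):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class DiskClassLoader extends ClassLoader {
    private final String dir;

    public DiskClassLoader(String dir) {
        this.dir = dir;
    }

    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        // Reached only after the parent loaders have failed to find the class.
        try {
            byte[] bytes = Files.readAllBytes(
                    Paths.get(dir, name.replace('.', '/') + ".class"));
            return defineClass(name, bytes, 0, bytes.length);
        } catch (IOException e) {
            throw new ClassNotFoundException(name, e);
        }
    }
}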

Why use the parent delegate model to organize relationships between class loaders?

Java classes, together with their class loaders, form a hierarchy with priorities. Take java.lang.Object, which is stored in rt.jar: whichever class loader loads it, the request is ultimately delegated to the bootstrap class loader, so Object is the same class in every class-loader environment of the program.

Without the parent delegation model, if each class loader loaded classes on its own, then even the most basic behavior of the Java type system could not be guaranteed, and applications would descend into chaos.

What is the class loading mechanism?

The data describing a class must be loaded from a Class file into the virtual machine before it can run. The virtual machine loads that data into memory, verifies, converts, and initializes it, finally producing a Java type that the virtual machine can use directly. This is the virtual machine's class loading mechanism.

What is the difference between a virtual machine and a physical machine?

Both machines are capable of code execution, but:

  • The execution engine of a physical machine is built directly at the processor, hardware, instruction set, and operating system levels.
  • The execution engine of a virtual machine is implemented in software, so it can define its own instruction set and execution-engine architecture, and it can execute instruction-set formats not directly supported by the hardware.

Run time stack frame structure

Stack frames are the data structures that support method invocation and method execution in the virtual machine. A stack frame stores a method's:

  • Local variable table
  • Operand stack
  • Dynamic linking
  • Method return address

Each method's journey from invocation to completion of execution corresponds to one stack frame being pushed onto and popped off the virtual machine stack.

Java method call

What is a method call?

The only task of a method call is to determine the version of the method being called (that is, which method to call); it does not involve the actual execution inside the method.

What’s so special about Java method calls?

The compilation of a Class file does not include the linking step of traditional compilation; all method calls are stored in the Class file as symbolic references rather than as the method's entry address in the actual runtime memory layout. This gives Java powerful dynamic extension capabilities, but it makes method invocation relatively complex: the direct reference to the target method must be determined during class loading, or even at run time.

What are the bytecode instructions that Java virtual machines invoke?

  • invokestatic: invokes static methods
  • invokespecial: invokes instance constructors (<init>), private methods, and superclass methods
  • invokevirtual: invokes all virtual methods
  • invokeinterface: invokes interface methods (see the sketch below)
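
A hedged sketch showing where each instruction typically appears with a classic javac (class and method names are illustrative; the output can be inspected with javap -c InvokeDemo):

public class InvokeDemo {
    interface Greeter {
        void greet();
    }

    static class EnglishGreeter implements Greeter {
        public void greet() { System.out.println("hello"); }
    }

    static void staticMethod() {}
    private void privateMethod() {}
    void virtualMethod() {}

    public static void main(String[] args) {
        staticMethod();                  // invokestatic
        InvokeDemo d = new InvokeDemo(); // new + invokespecial (the <init> constructor)
        d.privateMethod();               // invokespecial (private method)
        d.virtualMethod();               // invokevirtual
        Greeter g = new EnglishGreeter();
        g.greet();                       // invokeinterface (call through an interface type)
    }
}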

How does the virtual machine execute the bytecode instructions in the method?

  • Interpreted execution (executed by the interpreter)
  • Compiled execution (executing native code produced by the just-in-time compiler)

Interpreted execution

Now that mainstream virtual machines include just-in-time compilers, only the virtual machine itself can determine whether the code in a Class file will be interpreted or compiled.

The Javac compiler processes program code through lexical analysis and syntax analysis into an abstract syntax tree, then walks the tree to generate a linear stream of bytecode instructions. Because this happens outside the Java virtual machine while the interpreter sits inside it, the compilation of Java programs is a semi-independent implementation.

Stack-based and register-based instruction sets

What is a stack-based instruction set?

The instruction stream output by the Java compiler; most of its instructions are zero-address instructions that rely on the operand stack to do their work.

To calculate “1+1=2”, the stack-based instruction set looks like this:

iconst_1
iconst_1
iadd
istore_0

The two iconst_1 instructions push two constants 1 onto the stack in succession; the iadd instruction pops the top two values, adds them, and puts the result back on top of the stack; finally, istore_0 stores the top value into Slot 0 of the local variable table.

What is a register-based instruction set?

The most typical example is the two-address instruction set of x86, which relies on registers to work. To calculate "1+1=2", the register-based instruction set looks like this:

mov eax,  1
add eax,  1

The mov instruction sets the EAX register to 1, and the add instruction adds 1 to it; the result is kept in the EAX register.

What are the advantages and disadvantages of stack-based instruction sets?

Advantages:

  • Portability: user programs do not use registers directly; the virtual machine may decide to put the most frequently accessed data (the program counter, the top-of-stack cache) into registers for better performance.
  • Relatively compact code: each byte in the bytecode corresponds to one instruction.
  • A simpler compiler: there is no need to worry about space allocation, since all required space is operated on the stack.

Disadvantages:

  • Execution speed is slightly slower.
  • More instructions are needed to accomplish the same function.

Frequent operations on the stack mean frequent memory accesses, and relative to the processor, memory is the bottleneck for execution speed.

What are the steps in the Javac compilation process?

  1. Parse and populate symbol tables
  2. Annotation processing by plug-in annotation processor
  3. Analysis and bytecode generation

What is a just-in-time compiler?

Java programs start out interpreted by the interpreter. When the virtual machine finds that a method or code block runs especially frequently, it marks that code as "Hot Spot Code".

To improve the execution efficiency of hot code, the virtual machine compiles it at run time into machine code specific to the local platform and applies various levels of optimization. The compiler that does this is called the Just-In-Time (JIT) compiler.

Interpreters and compilers

Many mainstream commercial virtual machines contain both an interpreter and a compiler.

  • When a program needs to be started and executed quickly, the interpreter comes into play first, saving compilation time and executing immediately.
  • As the program runs, the compiler comes into play over time, compiling more and more code into native code, which improves execution efficiency.

If memory resources are limited (as in some embedded systems), interpreted execution can be used to save memory; otherwise, compiled execution improves efficiency. The compiler's output can also fall back to interpreted code (deoptimization).

Why tiered compilation?

Because the just-in-time compiler uses the program's own run time to compile native code, and producing more heavily optimized code takes longer. Tiered compilation balances startup responsiveness against peak performance.

What are the tiers of tiered compilation?

Tiered compilation divides the work into tiers according to the scale and time cost of compiling and optimizing, including:

  • Tier 0: the program is interpreted; the interpreter does not enable profiling, and tier 1 compilation can be triggered.
  • Tier 1: also called C1 compilation, which turns bytecode into native code with simple, reliable optimizations, inserting profiling logic if necessary.
  • Tier 2: also called C2 compilation, which also turns bytecode into native code but enables optimizations that take longer to compile, and may even apply unreliable, aggressive optimizations based on the profiling information.

With tiered compilation, the Client Compiler and the Server Compiler work together: the Client Compiler is used for faster compilation, and the Server Compiler for better compilation quality.

Compile objects and trigger conditions

What is hot code?

  • A method that is called multiple times
  • The body of a loop that is executed multiple times

How can I tell if a piece of code is hot code?

Determining whether a piece of code is hot code that should trigger just-in-time compilation is called hot spot detection. There are two main approaches:

  • Sample-based hot spot detection: the virtual machine periodically checks the top of each thread's stack; a method that frequently appears at the top of stacks is a "hot method". This is simple and efficient but cannot precisely measure how hot a method is.
  • Counter-based hot spot detection: the virtual machine sets up a counter for each method and counts how many times it executes; a method whose count exceeds a threshold is considered hot.

The HotSpot virtual machine uses the second method, with two kinds of counters:

  • Method invocation counter
  • Back edge counter (for loop code)

What does the method invocation counter count?

It counts a relative execution frequency: the number of times a method is called within a period of time. When that period is exceeded and the method's call count is still not high enough to submit it to the just-in-time compiler, the counter is halved. This process is called the heat decay of the method invocation counter, and the period is called its half-life cycle.

What are the classic optimization techniques (just-in-time compilers)?

  • A classic language-independent optimization: common subexpression elimination
  • A classic language-dependent optimization: array bounds check elimination
  • One of the most important optimizations: method inlining
  • One of the most cutting-edge optimizations: escape analysis

Common subexpression elimination

A classic optimization technique commonly used in various compilers. Its meaning is:

If an expression E has already been evaluated, and the values of all variables in E have not changed since that evaluation, then this occurrence of E is a common subexpression. There is no need to recompute it; the previously computed result can simply be substituted for E.
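
A small illustrative sketch (names are made up): b * c appears twice with b and c unchanged in between, so the JIT may compute it once and reuse the result:

public class CseDemo {
    static int calc(int a, int b, int c) {
        // b * c is the common subexpression of the two occurrences below;
        // the optimizer may rewrite this roughly as:
        //   int e = b * c;  return e * 12 + a + (a + e);
        return (c * b) * 12 + a + (a + b * c);
    }
}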

Array bounds check elimination

Because Java automatically checks arrays for out-of-bounds access, every read or write of an array element carries an implicit conditional check, which can be a performance burden for code with many array accesses.

If the array accesses happen inside a loop and the loop variable is used as the index, and the compiler can determine through data-flow analysis that the loop variable always stays within the array bounds, then the upper and lower bound checks for the whole loop can be eliminated, saving many conditional operations.

Method inlining

Inlining removes the cost of method calls and establishes a good foundation for other optimizations.

The compiler inlines non-virtual methods directly. When it encounters a virtual method, it queries whether more than one target version exists in the currently loaded program; if only one version exists, that can be inlined too. Such inlining is an aggressive optimization, however, and an escape hatch must be kept in reserve (a Slow Path for when the guard condition fails); this is called guarded inlining.

If, during subsequent execution, the virtual machine never loads a class that would change the method receiver's inheritance hierarchy, the inlined code remains valid forever. Otherwise, the compiled code must be thrown away, falling back to interpreted execution or recompiling.

Escape analysis

The basic behavior of escape analysis is to analyze an object's dynamic scope: when an object is defined inside a method, it may be referenced by other methods, which is called method escape; it may even be accessed by other threads, which is called thread escape.

What optimizations can be made if the object does not escape out of a method or thread?

  • Stack allocation: objects are normally allocated on the Java heap, which is shared by and visible to all threads; as long as a reference to an object is held, the object's data on the heap can be accessed. But garbage collection and compaction take time. If an object will not escape its method, it can be allocated on the stack instead, and the memory it occupies is destroyed when the stack frame pops. With stack allocation, large numbers of objects die automatically when their method ends, greatly reducing the pressure on the garbage collector.
  • Synchronization elimination: thread synchronization is itself time-consuming. If escape analysis can determine that a variable cannot escape its thread, there can be no contention over reads and writes of that variable, and its synchronization measures can be eliminated.
  • Scalar replacement: instead of creating the object at all, create only those of its member variables that the method actually uses. (A sketch follows this list.)
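
A minimal sketch of an escape-analysis candidate (class and method names are illustrative; the HotSpot flags -XX:+DoEscapeAnalysis and -XX:+EliminateAllocations govern this behavior):

public class EscapeDemo {
    static class Point {
        int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    static int sum(int x, int y) {
        Point p = new Point(x, y); // the reference never leaves this frame
        return p.x + p.y;          // may be scalar-replaced to: return x + y;
    }
}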

Java versus C/C++ compilers

  1. Just-in-time compilation takes place during the user program's own run time, so the compiler is under heavy time pressure.
  2. Although the Java language has no virtual keyword, it uses virtual methods far more frequently than C++, so the just-in-time compiler has a harder time optimizing than a C++ static compiler does.
  3. Java is a dynamically extensible language: loading new classes at run time may change the inheritance relationships of the program's types, making global optimization difficult, because the compiler cannot see the whole program. The compiler must watch for type changes and, at run time, undo or redo some optimizations.
  4. In the Java language, objects are allocated on the heap; only a method's local variables are allocated on the stack. C++ objects have several possible allocation sites.

How does a physical machine handle concurrency?

Computing tasks need more than processor calculations: the processor must also interact with memory, for example to read operands and store results (this cannot be solved with registers alone). Because a computer's storage devices are orders of magnitude slower than its processor, a layer of cache, whose reads and writes approach processor speed, serves as a buffer between memory and processor: the data an operation needs is copied into the cache so the operation can run quickly, and the result is synchronized back to memory when it completes, so the processor does not wait on slow memory reads and writes.

Cache-based storage interaction nicely resolves the speed mismatch between processor and memory, but it introduces a new problem: cache coherence. In a multiprocessor system, each processor has its own cache while sharing the same main memory; when several processors' computing tasks touch the same region of main memory, their cached data may become inconsistent. To resolve this, each processor must follow a cache coherence protocol when accessing its cache.

In addition, to use the processor fully, the processor may optimize the code it executes with out-of-order execution. The Java virtual machine's just-in-time compiler performs a similar optimization, instruction reordering.

Java memory model

What is the Java Memory model?

The Java Memory Model is defined in the Java virtual machine specification to mask the differences in memory access across hardware and operating systems, so that Java programs achieve consistent concurrency effects on all platforms.

What is the goal of the Java memory model?

Define the access rules for variables in the program, i.e. the low-level details of storing variables in and out of memory in the virtual machine. Variables here include instance fields, static fields, and elements that make up array objects, but not local variables and method parameters, because these are thread private and not shared, so there are no race issues.

Main memory vs. working memory

All variables are stored in main memory. Each thread also has its own working memory, which holds copies of the main-memory variables that thread uses. All of a thread's operations on variables (reads, assignments) must be performed in working memory; a thread cannot read or write main-memory variables directly, and different threads cannot access each other's working memory. Variable values pass between threads through main memory.

Interoperation between memory

The Java memory model defines eight operations that accomplish copying a variable from main memory to working memory and synchronizing it from working memory back to main memory: lock, unlock, read, load, use, assign, store, and write.

Atomicity, visibility, order

  • Atomicity: access and read/write of the basic data types are atomic. For a wider guarantee of atomicity, the bytecode instructions monitorenter and monitorexit implicitly provide lock and unlock operations; in Java code these appear as synchronized blocks, the synchronized keyword, so operations between synchronized blocks are also atomic.
  • Visibility: when one thread changes the value of a shared variable, other threads learn of the change immediately. The Java memory model implements visibility by synchronizing the new value back to main memory after a variable is modified and refreshing the value from main memory before the variable is read. The special rules for volatile guarantee that new values are synchronized to main memory immediately and refreshed from main memory immediately before each use. synchronized and final can also provide visibility: once a final field has been initialized in the constructor, and the constructor has not leaked a reference to this, the value of the final field is visible to other threads.
  • Ordering: the ordering of Java programs can be summed up in one sentence: observed from within a thread, all operations are ordered (within-thread as-if-serial semantics); observed from one thread looking at another, all operations are unordered (because of instruction reordering and the delay in synchronizing working memory with main memory).

volatile

What is volatile?

The keyword volatile is the lightest-weight synchronization mechanism the Java virtual machine provides. A variable defined as volatile has two properties:

  1. It guarantees the variable's visibility to all threads: when one thread changes the variable's value, the new value is immediately known to other threads. Ordinary variables cannot do this.
  2. It disables instruction-reordering optimizations. For ordinary variables, the only guarantee is that all places depending on an assignment's result see a correct value during the method's execution, not that assignments happen in the order written in the program code.

Why are operations based on volatile variables not necessarily safe in concurrency?

There is no consistency problem with volatile variables in each thread's working memory (a volatile variable is refreshed from main memory before each use). But compound operations in Java are not atomic, which makes operations on volatile variables unsafe under concurrency.
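
A minimal sketch in the spirit of this point (class name illustrative): race is volatile, so every thread sees its latest value, yet race++ is a read-modify-write sequence that threads can interleave, so the final count usually falls short:

public class VolatileRace {
    public static volatile int race = 0;
    private static final int THREADS = 20;

    public static void main(String[] args) throws InterruptedException {
        Thread[] ts = new Thread[THREADS];
        for (int i = 0; i < THREADS; i++) {
            ts[i] = new Thread(() -> {
                for (int j = 0; j < 10000; j++) {
                    race++; // visible to all threads, but not atomic
                }
            });
            ts[i].start();
        }
        for (Thread t : ts) {
            t.join();
        }
        System.out.println(race); // usually less than 200000
    }
}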

Why volatile?

In some situations the synchronization performance of volatile is better than that of locks, but because of the many lock eliminations and optimizations the virtual machine performs, it is hard to say by exactly how much.

Reads of volatile variables cost almost the same as reads of ordinary variables, but writes can be slower, because memory-barrier instructions must be inserted into the native code to keep the processor from executing out of order.

Concurrency and threading

How does concurrency relate to threads?

Concurrency doesn’t have to depend on multithreading; PHP has multiprocess concurrency. But concurrency in Java is multithreaded.

What is a thread?

Threads are a lighter-weight scheduling unit than processes. Threads separate a process's resource allocation from its execution scheduling: the threads of a process share its resources (memory addresses, file I/O) while being scheduled independently (the thread is the basic unit of CPU scheduling).

What are the ways to implement threads?

  • Implemented with kernel threads
  • Implemented with user threads
  • Implemented with a hybrid of user threads and lightweight processes

Java thread implementation

The threading model supported by the operating system largely determines how Java virtual machine threads are mapped.

Java thread scheduling

What is thread scheduling?

Thread scheduling is the process by which the system allocates the right to use the processor among threads.

What are the methods of thread scheduling?

  • Cooperative thread scheduling: simple to implement and free of thread-synchronization problems; but thread execution time is uncontrollable, and if a thread never yields, the whole process can hang.
  • Preemptive thread scheduling: the system allocates execution time to each thread, and no single thread can block the whole process.

Although Java thread scheduling is automatic, we can suggest that the system allocate more execution time to certain threads by setting thread priorities. The Java language defines 10 thread priority levels; the higher a thread's priority, the more likely the system is to choose it for execution.

You cannot rely on thread priority alone, though. Java threads are mapped onto the system's native threads, so scheduling is ultimately up to the operating system. Windows, for example, has only seven priority levels, so several Java priorities must map to the same level. The system may also change priorities on its own: Windows has a "priority boost" feature through which it may give extra execution time to a thread it judges to be especially busy.

Definition of thread safety?

If multiple threads can access an object and, regardless of how the runtime environment schedules or interleaves those threads, and without any additional synchronization or coordination on the caller's side, calling the object's methods always produces correct results, then the object is thread-safe.

Into what levels of thread safety can the shared data operated on in Java be classified?

  • Immutable
  • Absolutely thread-safe
  • Relatively thread-safe
  • Thread-compatible
  • Thread-hostile

Immutable

In the Java language, immutable objects are necessarily thread-safe. As long as an immutable object is constructed correctly, its externally visible state never changes, and it can never be observed in an inconsistent state across multiple threads.
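
A minimal sketch of a correctly constructed immutable type (class name illustrative): final fields assigned once in the constructor, no setters, and no this reference escaping during construction:

public final class Money {
    private final String currency;
    private final long amount;

    public Money(String currency, long amount) {
        this.currency = currency; // state is fixed here, once and for all
        this.amount = amount;
    }

    public String currency() { return currency; }
    public long amount() { return amount; }
}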

How to implement thread safety?

Virtual machines provide synchronization and lock mechanisms:

  • Blocking synchronization (mutex synchronization)
  • Nonblocking synchronization

Blocking synchronization (mutex synchronization)

Mutual exclusion is a means of achieving synchronization; critical sections, mutexes, and semaphores are the main ways of implementing mutual exclusion. The most basic synchronization construct in Java is the synchronized keyword, which after compilation produces the two bytecode instructions monitorenter and monitorexit, placed before and after the synchronized block respectively. Both instructions take a parameter of reference type that specifies the object to lock and unlock. If the synchronized statement in the Java program explicitly specifies an object parameter, that object is used; if not, then depending on whether synchronized modifies an instance method or a class method, the corresponding object instance or Class object is used as the lock object. When monitorenter executes, the thread first tries to acquire the object's lock:

  • If the object is not locked, or the current thread already owns its lock, the lock counter is incremented by 1; when the monitorexit instruction executes, the counter is decremented by 1; when the counter reaches 0, the lock is released.
  • If acquiring the object's lock fails, the current thread blocks and waits until the lock is released by another thread.

Besides synchronized, you can use the ReentrantLock in the java.util.concurrent.locks package to achieve synchronization. Compared with synchronized, ReentrantLock adds some advanced features: interruptible waiting, fair locks, and binding a lock to multiple conditions.

Interruptible waiting: when the thread holding the lock does not release it for a long time, a waiting thread can choose to give up waiting; this is useful for synchronized blocks with very long execution times.

Fair lock: multiple threads waiting for the same lock must acquire it in the order in which they requested it. The lock used by synchronized is unfair. (A sketch follows.)
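
A minimal sketch of the extras mentioned above (names illustrative): constructing ReentrantLock with true requests a fair lock, and tryLock with a timeout lets a waiting thread give up:

import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class LockDemo {
    private static final ReentrantLock lock = new ReentrantLock(true); // true = fair lock

    static void update() throws InterruptedException {
        if (lock.tryLock(1, TimeUnit.SECONDS)) { // abandon the wait after 1 second
            try {
                // ... critical section ...
            } finally {
                lock.unlock(); // always release in finally
            }
        }
    }
}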

Nonblocking synchronization

The biggest problem with mutual-exclusion synchronization is the performance cost of blocking and waking threads; it is a pessimistic concurrency strategy. It always assumes that without correct synchronization (locking) something will go wrong, so it locks, performs user-mode/kernel-mode transitions, maintains lock counters, and checks whether blocked threads need to be woken, regardless of whether the shared data is actually contended.

As hardware instruction sets have evolved, we can instead use an optimistic concurrency strategy based on conflict detection: perform the operation first; if no other thread contends for the shared data, the operation succeeds; if the shared data is contended and a conflict arises, apply some other compensating measure. Because many implementations of this strategy do not need to suspend threads, it is called non-blocking synchronization.
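
A minimal sketch of non-blocking synchronization using the JDK's atomic classes, which retry a hardware CAS (compare-and-swap) until it succeeds (class name illustrative):

import java.util.concurrent.atomic.AtomicInteger;

public class CasDemo {
    private static final AtomicInteger count = new AtomicInteger(0);

    public static void main(String[] args) throws InterruptedException {
        Thread[] ts = new Thread[20];
        for (int i = 0; i < ts.length; i++) {
            ts[i] = new Thread(() -> {
                for (int j = 0; j < 10000; j++) {
                    count.incrementAndGet(); // CAS loop; atomic, no thread is suspended
                }
            });
            ts[i].start();
        }
        for (Thread t : ts) t.join();
        System.out.println(count.get()); // always 200000, unlike the volatile race
    }
}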

In which version of the JDK did lock optimization appear?

An important theme of JDK 1.6 was efficient concurrency. The HotSpot virtual machine development team implemented a variety of lock optimizations in that release:

  • Adaptive spin
  • Lock elimination
  • Lock coarsening
  • Lightweight lock
  • Biased locking

Why spin-locks?

The biggest performance cost of mutual-exclusion synchronization lies in the implementation of blocking: suspending and resuming threads must be done in kernel mode, which puts great pressure on the system's concurrent performance. At the same time, in many applications shared data stays locked for only a short time, which is not worth suspending and resuming threads for. So: let's not suspend the thread just yet.

How does spin lock work?

If the physical machine has more than one processor, allowing two or more threads to execute in parallel, we can let a later thread that requests the lock "wait a moment" without giving up its processor time, and see whether the thread holding the lock releases it soon. To make the thread wait, we simply have it execute a busy loop (spin).

Disadvantages of spin?

Spin waiting avoids the overhead of thread switching, but it occupies processor time. If the lock is held for a short time, spin waiting works very well; if it is held too long, the spinning thread wastes processor resources for nothing. So spin waiting is bounded: if the spin exceeds the limit and the lock still has not been acquired, the thread is suspended the old-fashioned way.

What is adaptive spin?

The spin time is not fixed, but is determined by the previous spin time on the same lock and the state of the lock owner.

  • If, on a given lock object, a spin wait has just succeeded in acquiring the lock, and the thread holding the lock is running, the virtual machine assumes the spin is likely to succeed again and lets it run longer.
  • If spinning rarely succeeds for a given lock, the spin may be omitted for future acquisitions of that lock, to avoid wasting processor resources.

With adaptive spinning, as program run time grows and performance monitoring data accumulates, the virtual machine predicts the state of the program's locks more and more accurately; the virtual machine gets "smarter".

Lock elimination

Lock elimination means eliminating locks for code that requests synchronization but that the virtual machine's just-in-time compiler can prove cannot possibly contend for shared data. It is mainly based on escape analysis.

Why would a programmer synchronize code while knowing there is no data race? Much synchronization is not added by the programmer directly; it comes along with library classes (the synchronized methods of classes such as StringBuffer and Vector, for instance).

Lock coarsening

In principle, the scope of a synchronized block should be as small as possible. However, if a series of consecutive operations repeatedly locks and unlocks the same object, even inside a loop, the frequent mutual-exclusion synchronization causes unnecessary performance loss.

Lock coarsening widens the scope of the lock.

Lightweight lock

Reduces the performance cost of traditional heavyweight locks, which use operating-system mutexes, when there is no multithreaded contention.

Biased locking

Biased locking eliminates synchronization primitives when data is uncontended, improving performance further; that is, in the absence of contention, the entire synchronization is eliminated. The lock is biased toward the first thread that acquires it. If the lock is never acquired by another thread during subsequent execution, the thread holding the biased lock never needs to synchronize again.

Reference: Understanding the Java Virtual Machine in Depth: Advanced JVM Features and Best Practices, 2nd Edition

In closing

If you think Glacier's articles are good, please search for and follow the "Glacier Technology" WeChat public account, and learn about high concurrency, distributed systems, microservices, big data, Internet, and cloud-native technology together with Glacier. The account publishes a large number of technical topics, and every article is packed with practical material. Many readers have used these articles to successfully move to major tech companies, and many others have made a technical leap and become the technical backbone of their companies. If you also want to improve your skills the way they did, follow the "Glacier Technology" WeChat public account, which is updated daily with hardcore technical material, so that you are never again at a loss about how to grow your technical ability!