When we start writing Java programs, we’ll probably write something like this:

```java
public class Main {
    public String getMsg() {
        return "Hello world!";
    }

    public static void main(String[] args) {
        Main main = new Main();
        System.out.println(main.getMsg());
    }
}
```

So what does Java actually do while executing this program? Let's walk through it in detail. First we'll look at how Java divides memory into regions and what each region holds, since these regions are the foundation of any Java program. Then, for the program's running process, we'll cover how objects are created, how they are used, and finally how useless objects are removed.

Memory area

As shown in the figure above, Java's memory is typically divided into thread-private areas (the program counter, the Java VM stack, and the native method stack) and thread-shared areas (the heap and the method area), plus additional areas such as direct memory. Let's introduce them one by one:

  1. Program counter: the program counter is the line-number indicator of the bytecode being executed by the current thread. In the JVM's conceptual model, the bytecode interpreter works by changing the value of this counter to select the next bytecode instruction to execute. Each thread executes at its own position independently, so each thread has its own program counter, stored separately. When a native method is executing, the counter's value is undefined (empty). Notably, the program counter is the only region for which the JVM specification defines no OutOfMemoryError condition.
  2. Java VM stack: when a Java method executes, a stack frame is automatically created to store its information; the process of method invocation and completion corresponds to the pushing and popping of stack frames in the virtual machine. Each stack frame contains a local variable table, which stores the primitive data types, object references, and returnAddress values known at compile time. The JVM specifies two exceptions for this area: a StackOverflowError is thrown if a thread requests a stack depth greater than the maximum allowed; if the VM stack can grow dynamically and sufficient memory cannot be allocated during growth, an OutOfMemoryError is thrown.
  3. Native method stack: the native method stack serves the execution of native methods, and can likewise throw StackOverflowError and OutOfMemoryError.
  4. Heap: the heap is the largest chunk of memory, shared by all threads; its sole purpose is to hold object instances. The Java heap is the main area managed by the garbage collector and is typically subdivided into a young generation and an old generation based on object lifetimes; more fine-grained divisions are the Eden space, the From Survivor space, and the To Survivor space. From the perspective of memory allocation, thread-private Thread Local Allocation Buffers (TLABs) can also be carved out of the heap. An OutOfMemoryError is thrown if the heap has no room to complete an instance allocation and cannot be extended.
  5. Method area: the method area, like the Java heap, is shared by all threads; it stores data such as class information loaded by the virtual machine, constants, static variables, and code compiled by the just-in-time compiler. The runtime constant pool is part of the method area. Besides describing a class's version, fields, methods, interfaces, and so on, a Class file contains a constant pool that holds the various literals and symbolic references generated by the compiler; this content is stored in the runtime constant pool once the class is loaded. This area may also throw an OutOfMemoryError.
  6. Direct memory: direct memory is not part of the virtual machine's runtime data area, nor is it a memory region defined by the JVM specification. It was introduced with the Java NIO (New Input/Output) classes: the Channel- and Buffer-based I/O model can allocate off-heap memory directly and then operate on it through a DirectByteBuffer object on the heap that acts as a reference to that memory. A short sketch follows this list.
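To make the direct memory point concrete, here is a minimal sketch using the standard NIO API. The 1 MB size and the -XX:MaxDirectMemorySize note are illustrative, not prescriptions:

```java
import java.nio.ByteBuffer;

public class DirectMemoryDemo {
    public static void main(String[] args) {
        // Allocates 1 MB outside the Java heap; the returned DirectByteBuffer
        // object lives on the heap and acts as the reference to that off-heap memory.
        ByteBuffer buffer = ByteBuffer.allocateDirect(1024 * 1024);
        buffer.putInt(42);
        buffer.flip();
        System.out.println(buffer.getInt()); // prints 42
        // The off-heap limit can be tuned with -XX:MaxDirectMemorySize;
        // exceeding it raises an OutOfMemoryError even when the heap has room.
    }
}
```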

Create an object

Now that you know about Java's memory regions, let's look at how objects are created and how memory is allocated for them within those regions.

Class symbol reference lookup

When the JVM encounters a new instruction, it first checks whether the instruction's argument can be resolved to a symbolic reference to a class in the constant pool, and whether the class represented by that symbolic reference has already been loaded, resolved, and initialized. If not, the class loading process is performed first.

Class loading process

Here’s how to load a class.

Loading

Loading is a phase of the class loading process that does three things:

  1. Obtain the binary byte stream that defines the class by its fully qualified name.
  2. Transform the static storage structure represented by this byte stream into the runtime data structures of the method area.
  3. Generate a java.lang.Class object in memory that represents the class and serves as the access point for the class's data in the method area.

In the case of arrays, the array class itself is not created by a class loader but directly by the JVM; however, the element type of an array class is ultimately loaded by a class loader. A minimal sketch of the loading phase follows.
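The three steps above map directly onto what a custom class loader does. A minimal sketch, assuming the .class file is on the classpath and a JDK 9+ runtime (for InputStream.readAllBytes); the class name BytesClassLoader is illustrative:

```java
import java.io.IOException;
import java.io.InputStream;

// Loading phase in miniature: fetch the binary byte stream for a fully
// qualified name, then turn it into a java.lang.Class via defineClass.
public class BytesClassLoader extends ClassLoader {
    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        String path = name.replace('.', '/') + ".class";
        try (InputStream in = getResourceAsStream(path)) {
            if (in == null) throw new ClassNotFoundException(name);
            byte[] bytes = in.readAllBytes(); // JDK 9+
            // defineClass builds the method-area structures and the Class object.
            return defineClass(name, bytes, 0, bytes.length);
        } catch (IOException e) {
            throw new ClassNotFoundException(name, e);
        }
    }
}
```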

Verification

Verification is the first step of the linking phase. Its purpose is to ensure that the information in the Class file's byte stream meets the requirements of the current virtual machine and does not compromise the virtual machine's safety. The verification phase roughly performs the following four checks:

  1. File format verification: verifies that the byte stream conforms to the Class file format specification and can be processed by the current version of the virtual machine: whether it starts with the magic number 0xCAFEBABE; whether the major and minor version numbers are within the range this VM can handle; whether the constant pool contains unsupported constant types; whether any index that points into the constant pool refers to a nonexistent constant or one of the wrong type; and so on.
  2. Metadata verification: performs semantic analysis on the information described by the bytecode to ensure it conforms to the Java language specification: whether the class has a parent class (every class except java.lang.Object has one); whether it extends a class that may not be inherited; whether, if the class is not abstract, it implements all methods required by its parent class or interfaces; whether fields or methods in the class conflict with those of the parent class; and so on.
  3. Bytecode verification: the most complex stage; uses data-flow and control-flow analysis to determine that the program's semantics are legal and logical.
  4. Symbolic reference verification: ensures that the resolution action can be performed properly: whether the fully qualified name described by a string in a symbolic reference can be resolved to the corresponding class; whether the specified class contains methods and fields matching the described field descriptors and simple names; whether the accessibility of the classes, fields, and methods in symbolic references permits access by the current class; and so on.

Preparation

Preparation is the process of formally allocating memory for class variables and setting their initial values; the memory for these variables is allocated in the method area. Only class variables (variables modified by static) are covered, not instance variables, and the initial value assigned is usually the zero value for the type. However, if the field is a static final constant, it is initialized directly to the value the code specifies. A small example follows.
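A minimal illustration of the difference; the class and field names are made up for the example:

```java
public class PreparationDemo {
    // After the preparation phase, value is 0 (the zero value for int);
    // it only becomes 123 when <clinit> runs during initialization.
    public static int value = 123;

    // A static final int carries a ConstantValue attribute: it is already
    // set to 123 in the preparation phase.
    public static final int CONSTANT = 123;
}
```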

Resolution

The resolution phase is the process by which the virtual machine replaces symbolic references in the constant pool with direct references. The similarities and differences between symbolic and direct references are as follows:

  • Symbolic references: a symbolic reference describes the referenced target with a set of symbols, which can be any form of literal. Symbolic references are independent of the memory layout implemented by the virtual machine, and the referenced target has not necessarily been loaded into memory. Virtual machine implementations can have different memory layouts, but the symbolic references they accept must be identical.
  • Direct reference: A direct reference can be a pointer to a target, a relative offset, or a handle that can be located directly to the target. Direct references are related to the virtual machine memory layout. If there is a direct reference, the target of the reference must already exist in memory.

The resolution action mainly targets seven kinds of symbolic references: classes or interfaces, fields, class methods, interface methods, method types, method handles, and call site specifiers.

Initialization

Initialization is the last step of the class loading process. During this phase, class variables and other resources are initialized according to the plan the programmer laid out in code, by executing the class constructor, the <clinit>() method. <clinit>() has the following characteristics:

  1. It is generated by the compiler automatically collecting the assignment actions of all class variables in the class and the statements in static blocks, combined in the order they appear in the source file.
  2. Unlike an instance constructor, it does not need to explicitly invoke the parent's version: the virtual machine guarantees that the parent class's <clinit>() has finished executing before the subclass's <clinit>() runs.
  3. Static statement blocks in the parent class therefore execute before those in the child class.
  4. <clinit>() is not mandatory for a class or interface: if a class has no static blocks and no assignments to class variables, the compiler need not generate a <clinit>() method for it.
  5. Interfaces cannot contain static blocks, but they may still have assignments from variable initializers, so interfaces also generate <clinit>(). Unlike classes, an interface does not need its parent interface's <clinit>() to run first; a parent interface is initialized only when one of its defined variables is used.
  6. The virtual machine guarantees that a class's <clinit>() method is properly locked and synchronized in a multithreaded environment: if multiple threads initialize a class at the same time, only one thread executes its <clinit>() method while all others block and wait. The example below illustrates the ordering guarantees.

Class loader

A class is loaded by a class loader, and the class loader together with the class itself determines the class's uniqueness in the Java virtual machine: two classes are equal only if they were loaded by the same class loader.

Parent delegation model

From the JVM's perspective there are only two kinds of class loaders: the bootstrap class loader and all the others. From a Java developer's perspective, the division is a little finer:

  1. Bootstrap class loader: loads the class libraries in <JAVA_HOME>\lib, or in the path specified by the -Xbootclasspath parameter, that the virtual machine recognizes.
  2. Extension class loader: loads all class libraries in <JAVA_HOME>\lib\ext or in the path specified by the java.ext.dirs system variable.
  3. Application class loader: the loader returned by ClassLoader.getSystemClassLoader(), hence commonly called the system class loader; it loads the class libraries on the user's classpath.

The following diagram shows the hierarchical relationship of class loaders, called the parent delegate model:

The parent delegation model works as follows: when a class loader receives a class loading request, it does not attempt to load the class itself first; instead it delegates the request to its parent loader, at every level, until the request reaches the bootstrap loader at the top. Only when the parent reports that it cannot complete the load does the child loader try to load the class itself. The benefit is that Java classes acquire a hierarchy of priority along with their class loaders: system classes cannot be shadowed by user-defined classes of the same name, which guarantees safety and order. A simplified sketch of this logic follows.
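A minimal sketch of the delegation logic, close in spirit to what java.lang.ClassLoader.loadClass does (here super.loadClass stands in for the parent chain; details of the real implementation are simplified):

```java
class DelegatingClassLoader extends ClassLoader {
    protected DelegatingClassLoader(ClassLoader parent) {
        super(parent);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        // 1. Has a class with this name already been loaded?
        Class<?> c = findLoadedClass(name);
        if (c == null) {
            try {
                // 2. Delegate upward first; at the top of the chain the
                //    request reaches the bootstrap class loader.
                c = super.loadClass(name, false);
            } catch (ClassNotFoundException e) {
                // 3. Only when the whole parent chain fails, load it ourselves.
                c = findClass(name);
            }
        }
        if (resolve) resolveClass(c);
        return c;
    }
}
```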

Break the parent delegate model

A typical example that breaks this model is SPI (Service Provider Interface) code:

  • Abstract modules in a system often have many different implementations, such as the logging module, the XML parsing module, or the JDBC module. In object-oriented design we generally recommend that modules program against interfaces rather than hard-code implementation classes. Once code names a concrete implementation class, it violates pluggability: replacing an implementation requires changing the code. A service discovery mechanism is needed so that the implementation can be chosen dynamically at module-assembly time.
  • The Java SPI provides this capability: a mechanism for locating a service implementation for an interface. Similar to the IoC idea of moving assembly control out of the program, this mechanism is especially important in modular designs.

To make this work, Java introduced the thread context class loader, which can be set through the setContextClassLoader method of java.lang.Thread; code loaded by a parent loader can use it to request classes "downward" from a child loader, breaking the parent delegation model. A usage sketch follows.
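A minimal sketch of the pattern, using JDBC drivers as the example service (any drivers found depend on what is on the classpath; the output is illustrative):

```java
import java.sql.Driver;
import java.util.ServiceLoader;

public class ContextLoaderDemo {
    public static void main(String[] args) {
        // DriverManager lives in core libraries loaded by the bootstrap loader,
        // but driver implementations live on the application classpath. SPI code
        // uses the thread context class loader to reach them "downward".
        ClassLoader cl = Thread.currentThread().getContextClassLoader();
        ServiceLoader<Driver> drivers = ServiceLoader.load(Driver.class, cl);
        for (Driver d : drivers) {
            System.out.println("Found driver: " + d.getClass().getName());
        }
    }
}
```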

Memory allocation

The memory an object requires is fully determined once its class is loaded. Assume the memory in the Java heap is perfectly tidy, as in the figure below: all used memory on one side, free memory on the other, with a pointer marking the dividing point. Allocating memory then just means moving the pointer toward the free side by a distance equal to the object's size. This allocation method is called pointer collision (bump-the-pointer).

But if the heap is not tidy and used and free memory are interleaved, the virtual machine must maintain a list recording which memory blocks are available, find a block on the list large enough for the object instance at allocation time, and update the list. This method is called the free list. Whether the heap is tidy is determined by the garbage collector in use: it stays tidy if the collector can compact memory, and not otherwise. Another key concern is that allocation is an extremely frequent operation, so under concurrency it must be made thread safe. There are two ways to achieve that:

  1. Synchronize the allocation action: the virtual machine uses Compare-And-Swap (CAS) with retry to make the pointer update atomic.
  2. Partition the allocation into different spaces per thread: each thread pre-allocates a small chunk of the Java heap called a Thread Local Allocation Buffer (TLAB). A thread allocates from its own TLAB, and synchronization locking is only needed when a TLAB runs out and a new one must be allocated. A toy sketch of CAS-based pointer bumping follows this list.
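A toy model of option 1, assuming the "heap" is just a range of offsets and the allocation pointer is bumped atomically; all names here are illustrative, not HotSpot internals:

```java
import java.util.concurrent.atomic.AtomicLong;

public class BumpPointerAllocator {
    private final AtomicLong top = new AtomicLong(0); // the dividing-point pointer
    private final long limit;

    public BumpPointerAllocator(long heapSize) {
        this.limit = heapSize;
    }

    /** Returns the start offset of the allocated block, or -1 if the "heap" is full. */
    public long allocate(long size) {
        while (true) {
            long current = top.get();
            long next = current + size;
            if (next > limit) return -1; // a real VM would trigger a GC here
            if (top.compareAndSet(current, next)) {
                return current; // CAS succeeded: this thread owns [current, next)
            }
            // CAS failed: another thread moved the pointer first; retry.
        }
    }
}
```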

Escape analysis

Escape analysis determines the dynamic scope of objects. When an object is defined inside a method, it may be referenced by external methods, for example by being passed as an argument to another method; this is called method escape. It may even be accessed by other threads, for example by being assigned to a class variable or to an instance variable reachable from other threads; this is called thread escape. If analysis proves an object cannot escape its method or thread, meaning no other method or thread can access it, some effective optimizations become possible:

  • On-stack allocation: Allocating the memory space of an object on the stack and destroying the object as the method call ends, reducing the heap footprint and garbage collector’s work.
  • Synchronization elimination: This variable can eliminate all synchronization measures because it cannot be accessed by other threads.
  • Scalar replacement: a scalar is a piece of data that cannot be broken down further; Java's primitive types are scalars. Data that can be decomposed, such as an object, is called an aggregate. Breaking a Java object apart and restoring direct access to its member variables as primitives is called scalar replacement. If an object cannot be accessed externally and can be disassembled, the JIT may simply create its member variables directly instead of the object; those members can then be allocated, read, and written on the stack, which opens further optimization opportunities. A candidate example follows this list.
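A sketch of an object that never escapes and is therefore a candidate for these optimizations; whether HotSpot actually scalar-replaces it depends on the JIT, and the -XX:+DoEscapeAnalysis flag (on by default in modern HotSpot) is noted only for experimentation:

```java
public class EscapeDemo {
    static class Point {
        int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    // p never escapes this method: it is not returned, stored in a field,
    // or passed elsewhere. The JIT may scalar-replace it, keeping x and y
    // in registers/stack slots with no heap allocation at all.
    static int distanceSquared(int x, int y) {
        Point p = new Point(x, y);
        return p.x * p.x + p.y * p.y;
    }

    public static void main(String[] args) {
        long sum = 0;
        for (int i = 0; i < 1_000_000; i++) sum += distanceSquared(i, i);
        System.out.println(sum);
    }
}
```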

Memory allocation rules

Later in this series you'll see in detail how the heap is typically divided into an Eden space and two Survivor spaces. Here is how Java decides where to allocate memory.

1. Objects are allocated in Eden preferentially

In general, objects are allocated in the Eden space of the young generation; when Eden does not have enough space, the virtual machine initiates a Minor GC.

2. Big objects go straight to the old generation

Large objects are Java objects that require a large amount of contiguous memory, typically long strings and arrays. With the -XX:PretenureSizeThreshold parameter, the virtual machine allocates objects larger than the given value directly in the old generation, avoiding extensive memory copying between Eden and the two Survivor spaces.

3. Long-lived objects enter the old generation

How does the virtual machine decide which objects belong in the old generation and which in the young generation? It gives each object an age counter. If an object is born in Eden, survives its first Minor GC, and can be accommodated by a Survivor space, it is moved there and its age is set to 1. Its age then increases by 1 for every Minor GC it survives, and once it reaches a threshold (15 by default), it is promoted to the old generation.

4. Dynamic object age determination

If the total size of all objects of the same age in a Survivor space is greater than half of that Survivor space, objects of that age or older can enter the old generation directly, without waiting for the age threshold. A small demonstration under assumed VM flags follows.
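A sketch in the spirit of rule 2. The flags in the comment are assumptions for a HotSpot Serial-collector experiment (PretenureSizeThreshold is honored by the Serial/ParNew collectors), not universal settings:

```java
public class PretenureDemo {
    private static final int _1MB = 1024 * 1024;

    // Suggested experimental flags (assumed setup):
    //   -Xms20m -Xmx20m -Xmn10m -XX:+UseSerialGC
    //   -XX:PretenureSizeThreshold=3145728 -XX:+PrintGCDetails
    // A 4 MB array exceeds the 3 MB threshold and should be allocated
    // directly in the old generation, skipping Eden entirely.
    public static void main(String[] args) {
        byte[] big = new byte[4 * _1MB];
        System.out.println("allocated " + big.length + " bytes");
    }
}
```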

Memory initialization

After allocation, the virtual machine initializes the allocated memory space (excluding the object header) to zero values. If a TLAB is used, this work can be done earlier, when the TLAB is handed out.

Setting up the object

Next, the JVM performs the necessary setup on the object: which class it is an instance of, how to find the class's metadata, the object's hash code, its GC generational age, and so on. This information is stored in the object header. The header can be set differently depending on the VM's current running state, for example whether biased locking is enabled.

Calling the constructor

Once this is done, a new object has been created from the virtual machine's point of view, and Java then calls the object's <init>() method to initialize it the way the programmer intended.

Object memory layout

Above we looked at how objects are created; now let's see what the created object actually looks like in memory. As shown in the figure below, an object's memory layout can be divided into three areas: the object header, instance data, and alignment padding.

The object header contains two parts of information: the first is the object's own runtime data, the second is the type pointer.

  1. Object runtime data: stores the hash code, GC generational age, lock flags, and so on; it is 32 bits wide on 32-bit VMs and 64 bits on 64-bit VMs, and is officially called the Mark Word. Objects actually need more runtime data than fits in that space, so the Mark Word is designed to reuse its storage depending on the object's state, as shown in the table below:
| Stored content | Flag bits | State |
| --- | --- | --- |
| Object hash code, object generational age | 01 | Unlocked |
| Pointer to lock record | 00 | Lightweight lock |
| Pointer to heavyweight lock | 10 | Inflated (heavyweight lock) |
| Empty; no information needs to be recorded | 11 | GC mark |
| Biased thread ID, biased timestamp, object generational age | 01 | Biasable |
  2. Type pointer: a pointer to the object's class metadata, which the virtual machine uses to determine which class the object is an instance of. Not every virtual machine implementation must keep a type pointer in the object. If the object is a Java array, the header must also record the array's length.

The instance data part that follows is the information the object actually stores: the content of the various fields defined in program code, both those inherited from parent classes and those defined in the subclass. The storage order is affected by the VM's allocation strategy parameters and by the order in which fields are defined in the source code; fields of the same width are usually allocated together. For example, HotSpot's default allocation strategy is longs/doubles, ints, shorts/chars, bytes/booleans, then OOPs (Ordinary Object Pointers). The third part, alignment padding, is not necessarily present and carries no meaning; it merely acts as a placeholder.

Using the object

Object access and location

After an object is created, the program can use it. A Java program manipulates concrete objects on the heap through reference data on the stack. There are two common ways to implement such references: handles and direct pointers. The figure above shows both approaches.

  • Handle: A block of memory in the Java heap is used as the handle pool. Reference stores the address of the handle of the object, and the handle contains the specific address information of the instance data and type data of the object. The advantage is that reference stores a stable handle address. When the object is moved, only the instance data pointer in the handle will be changed, and reference itself does not need to be modified.
  • Direct pointer: with direct pointers, the layout of heap objects must consider how to place the type data access information, and the reference stores the object's address directly. The advantage is speed: it saves the time cost of one level of pointer indirection.

Object methods

Data structure – stack frame

A stack frame is the data structure that supports method invocation and execution in the virtual machine; it is the element of the virtual machine stack in the runtime data area. A stack frame stores a method's local variable table, operand stack, dynamic linking information, and method return address. For each method, the journey from invocation to completion corresponds to one stack frame being pushed onto and popped off the VM stack. The size of the local variable table and the depth of the operand stack are fully determined at compile time and written into the Code attribute of the method table, so how much memory a stack frame needs is not affected by runtime data, only by the specific virtual machine implementation. The figure below shows an example stack frame.

The following describes the function and data structure of each part of the stack frame:

  1. Local variable table: a group of storage slots for method parameters and local variables defined inside the method. Its smallest unit is the Slot; the VM specification says each Slot should be able to hold a boolean, byte, char, short, int, float, reference, or returnAddress. The local variable table lives on the thread's stack, is thread-private, and is therefore thread safe. The VM accesses it by index, ranging from 0 to the maximum Slot count.
  2. Operand stack: a last-in-first-out stack whose entries are written and read by bytecode instructions as the method executes.
  3. Dynamic linking: each stack frame contains a reference to the method it belongs to in the runtime constant pool; this reference is held to support dynamic linking during method invocation.
  4. Method return address: once a method starts executing, there are only two ways to exit: a return instruction, or an uncaught exception. After the method exits, control must return to where the method was invoked so the program can continue. Exiting a method is effectively popping its stack frame, so the operations performed on exit may include: restoring the caller's local variable table and operand stack, pushing the return value (if any) onto the caller's operand stack, and adjusting the PC counter to point to the instruction following the method invocation instruction. The sketch below ties these parts to real bytecode.
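To see the local variable table and operand stack at work, here is a tiny method with its bytecode (roughly what `javap -c` would show) as comments; slot numbering assumes an instance method, where slot 0 holds `this`:

```java
public class FrameDemo {
    public int add(int a, int b) {
        return a + b;
        // Corresponding bytecode:
        //   iload_1   // push local variable slot 1 (a) onto the operand stack
        //   iload_2   // push local variable slot 2 (b)
        //   iadd      // pop both ints, push a + b
        //   ireturn   // pop the top of the operand stack and return it
        //             // to the caller's frame
    }
}
```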

The method call

Method invocation is not method execution; the sole task of method invocation is to determine which version of a method to invoke. Because a Class file is compiled without a linking step, method calls are stored in the Class file as symbolic references rather than as entry addresses in the actual runtime memory layout. Therefore the direct reference to the target method has to be determined during class loading, or even at runtime.

Resolution

All target methods of method calls are symbolic references in the Class file's constant pool, and some of these are converted to direct references during the resolution phase of class loading. This resolution applies only when a method has a single invocable version determinable before the program runs, and that version cannot change at runtime. In Java, the methods that fit this "determined at compile time, immutable at runtime" requirement fall mainly into two groups: static methods and private methods. The former are directly associated with their type; the latter cannot be accessed externally. Neither can be overridden through inheritance or any other means, so both are suitable for resolution during the class loading phase.

The dispatch

Java is an object-oriented language with three basic object-oriented features: inheritance, encapsulation, and polymorphism. The dispatch process during invocation reveals some basic manifestations of polymorphism.

  1. Static dispatch: used to resolve method overloading; the target method is chosen according to the static types of the arguments rather than their actual runtime types — that is, the declared types such as a parent class or interface, not the implementing subclass. So when a method is overloaded, lookup and invocation are based on the parameters' static types. Static dispatch happens at compile time, so this action is not performed by the virtual machine.
  2. Dynamic dispatch: closely tied to overriding, another important manifestation of polymorphism; dynamic dispatch determines the invoked version from the actual type of the receiver at runtime.
  3. Single dispatch and multiple dispatch: The receiver of a method and the method parameters are collectively called the method arguments. Dispatches can be divided into single dispatches and multiple dispatches depending on how many cases the dispatches are based on. Single dispatch selects the target method based on one case, while multiple dispatch selects the target method based on more than one case.
  4. Virtual machine implementation: how dispatch is implemented varies between virtual machines. A common approach is to build a virtual method table (vtable) for each class in the method area and use vtable indexes instead of metadata lookups to improve performance. The example below shows both kinds of dispatch.
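A minimal illustration of static versus dynamic dispatch; the class names are made up for the example:

```java
public class DispatchDemo {
    static class Human {}
    static class Man extends Human {}

    // Overload resolution uses static dispatch at compile time:
    static void greet(Human h) { System.out.println("hello, human"); }
    static void greet(Man m)   { System.out.println("hello, man"); }

    static class Animal { void speak() { System.out.println("..."); } }
    static class Dog extends Animal {
        @Override void speak() { System.out.println("woof"); } // overriding
    }

    public static void main(String[] args) {
        Human someone = new Man();
        greet(someone); // "hello, human" — chosen by the static type Human

        Animal pet = new Dog();
        pet.speak();    // "woof" — chosen by the actual type Dog at runtime
    }
}
```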

Multithreading

Thread basics

Thread implementation

There are three main ways to implement threads:

  1. Implementation with kernel threads: kernel-level threads (KLT) are supported directly by the operating system kernel, which schedules them via the scheduler and maps their tasks onto the processors. Each kernel thread can be seen as a clone of the kernel, enabling the operating system to handle several things at once; a kernel that supports multithreading is called a multithreaded kernel. Programs generally do not use kernel threads directly but use a higher-level interface to them, the light-weight process (LWP), which is a thread in the usual sense. Because each light-weight process is backed by one kernel thread, there can be no light-weight processes without kernel thread support; this 1:1 relationship is called the one-to-one threading model. Thanks to kernel thread support, each light-weight process is an independent scheduling unit, and even if one blocks in a system call, the whole process keeps working. The limitations: being kernel-based, all thread operations require system calls, which are relatively expensive and force switching back and forth between user mode and kernel mode; and since each light-weight process needs a kernel thread behind it, light-weight processes consume kernel resources, so a system can support only a limited number of them. The following diagram shows the 1:1 relationship between light-weight processes and kernel threads:

  2. Implementation with user threads: broadly speaking, any thread that is not a kernel thread can be considered a user thread (UT), so light-weight processes also qualify, but their implementation is always kernel-based. In the narrow sense, a user thread is one implemented entirely by a thread library in user space; the system kernel cannot perceive the threads' existence. Creation, synchronization, destruction, and scheduling are done entirely in user mode without kernel help, so operations are generally fast and cheap. This 1:N relationship between a process and its user threads is called the one-to-many threading model. The advantage of user threads is that no kernel support is needed; the disadvantage is also that there is no kernel support, so the user program must handle all thread operations itself, which makes problems such as "how to block" very difficult to solve. The following diagram shows the 1:N relationship between a process and user threads:

  3. Hybrid implementation of user threads and light-weight processes: besides the two approaches above, there is an implementation that uses kernel threads together with user threads. Under a hybrid implementation, both user threads and light-weight processes exist. User threads are still built entirely in user space, so they remain cheap to create, switch, and destroy, and large-scale concurrency is supported; the light-weight processes supported by the operating system act as the bridge between user threads and kernel threads, so the kernel's thread scheduling and processor mapping can be used, and user threads' system calls go through the light-weight processes, greatly reducing the risk of the whole process being blocked. In this hybrid mode the ratio of user threads to light-weight processes is variable, i.e. N:M. The following figure shows the N:M relationship:

Thread scheduling

1. Scheduling mode

Thread scheduling is the process by which the system assigns processor use rights to threads. There are two main approaches: cooperative thread scheduling and preemptive thread scheduling.

  • Cooperative scheduling: a thread's execution time is controlled by the thread itself; after finishing its work it actively notifies the system to switch to another thread. The biggest advantage is simplicity of implementation, and because a thread only switches after finishing its own work, switches are known to the thread itself. The drawback is that execution time is uncontrollable: if a badly written thread never yields, the whole program can block indefinitely.
  • Preemptive scheduling: threads are allocated execution time by the system, and thread switching is not decided by the threads themselves. The system can control execution time, and no single thread can block the whole process. Java uses preemptive scheduling.

2. State transition

The Java language defines six thread states, and at any point in time a thread is in exactly one of them:

  • New
  • Runnable
  • Waiting (indefinitely)
  • Timed waiting
  • Blocked
  • Terminated

The following figure shows the thread state transition relationship:

Thread synchronization and concurrency

Java memory model

1. Main memory and working memory

A memory model is an abstraction of the protocol for read and write access to a particular memory or cache. Physical computers face concurrency problems too: "cache coherence" issues between main memory and the caches of the various CPUs. The JVM defines its own Java memory model to mask the differences in memory access across hardware and operating systems, so that Java programs achieve consistent memory access effects on every platform.

The figure above diagrams the Java memory model, showing the interaction between threads, main memory, and working memory, and where those interactions happen. The model's main goal is to define the access rules for variables in a program, i.e. the low-level details of storing variables into and reading them out of memory in the virtual machine; here "variables" means instance fields, static fields, and array elements, but not thread-private local variables and method parameters. Java specifies that all variables are stored in main memory and that each thread has its own working memory, which holds copies of the main-memory variables the thread uses. All of a thread's operations on a variable must happen in working memory, rather than reading or writing main memory directly. Different threads cannot directly access variables in each other's working memory; transferring variable values between threads must go through main memory.

2. Interaction between main memory and working memory

The concrete protocol of interaction between main memory and working memory, i.e. the implementation details of how a variable is copied from main memory into working memory and synchronized back, is defined by eight atomic operations in the Java memory model:

  • Lock: acts on a main-memory variable; marks the variable as exclusively owned by one thread.
  • Unlock: acts on a main-memory variable; releases a locked variable so that other threads can lock it.
  • Read: acts on a main-memory variable; transfers the variable's value from main memory into the thread's working memory for the subsequent load.
  • Load: acts on a working-memory variable; puts the value obtained by read from main memory into the working-memory copy of the variable.
  • Use: acts on a working-memory variable; passes the variable's value in working memory to the execution engine. Performed whenever the VM reaches a bytecode instruction that needs the variable's value.
  • Assign: acts on a working-memory variable; assigns a value received from the execution engine to the working-memory variable. Performed whenever the VM reaches a bytecode instruction that assigns to the variable.
  • Store: acts on a working-memory variable; transfers the variable's value in working memory to main memory for the subsequent write.
  • Write: acts on a main-memory variable; puts the value obtained by store from working memory into the main-memory variable.

To copy a variable from main memory to working memory, read and load are performed in order; to synchronize a variable from working memory back to main memory, store and write are performed in order. The Java memory model only requires that each pair execute in that order, not consecutively. The model also specifies the following rules that execution must satisfy:

  • One of the read and load, store and write operations is not allowed to occur separately, that is, a variable is not allowed to be read from main memory but not accepted by working memory, or a write is initiated from working memory but not accepted by main memory.
  • A thread is not allowed to discard its most recent assign operation, which means that after a variable has changed in working memory, it must synchronize the change back to main memory.
  • A thread is not allowed to synchronize data from the thread’s working memory back to main memory for no reason.
  • A new variable can only be born in main memory; working memory may not use an uninitialized variable. In other words, before use or store is applied to a variable, assign and load must have been performed on it.
  • A variable can be locked by only one thread at a time. However, the lock operation can be repeated by the same thread several times. After the lock operation is performed several times, the variable can be unlocked only after the same number of UNLOCK operations are performed.
  • If you perform a lock operation on a variable, the value of the variable will be emptied from working memory, and the load or assign operation will be re-performed to initialize the value before the execution engine can use the variable.
  • It is not allowed to unlock a variable that has not been locked by a lock operation, nor is it allowed to unlock a variable that has been locked by another thread.
  • Before an unlock operation can be performed on a variable, the variable must be synchronized back to main memory.

3. Special rules for volatile variables

There are two characteristics of volatile variables:

  1. It guarantees the variable's visibility to all threads: when one thread changes the value, the new value is immediately visible to other threads. But volatile guarantees only visibility, not atomicity, so operations on a volatile variable are safe without locking only in two cases: the result of the operation does not depend on the variable's current value, or only a single thread ever changes the value; and the variable does not participate in invariants together with other state variables. Outside those cases, locking is still needed to guarantee atomicity.
  2. It forbids instruction reordering optimizations: a memory barrier is inserted so that subsequent instructions cannot be reordered to before the barrier.

The following are the special rules for volatile variables. Assuming T represents a thread and V and W represent two volatile variables, the read, load, use, assign, store, and write operations must comply with the following rules:

  • Thread T can use V only if it loads the previous action on V. Also, thread T can load variable V only if the next action performed by thread T on variable V is use. The use action of thread T on variable V can be considered to be associated with the load and read actions of thread T on variable V, and must appear consecutively. (This rule requires that each time V is used in working memory, the most recent value must be flushed from main memory to ensure that the value of V changed by other threads is visible.)
  • T can only execute the store operation on V if the previous operation on V is assign, and T can only execute the assign operation on V if the next operation on V is store. The assign action of thread T to variable V can be considered to be associated with the Store and write actions of thread T to variable V and must appear consecutively together. (This rule requires that every change to V in working memory must be immediately synchronized back to main memory to ensure that other threads can see their changes to V.)
  • Assume that action A is a use or assign action applied by thread T to variable V, that action F is the load or store action associated with action A, and that action P is the read or write action on variable V corresponding to action F. Similarly, assume that action B is a use or assign action applied by thread T to variable W, that action G is the load or store action associated with action B, and that action Q is the read or write action on variable W corresponding to action G. If A comes before B, then P comes before Q. (This rule requires that volatile variables are not subject to instruction reordering, ensuring code executes in program order.) A visibility sketch follows this list.
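A minimal sketch of the visibility guarantee described in point 1 above; the 100 ms sleep is only there to let the worker start spinning first:

```java
public class VolatileFlagDemo {
    // Without volatile, the worker thread might never observe the write to
    // stop; volatile forces each read to see the latest value in main memory.
    private static volatile boolean stop = false;

    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(() -> {
            while (!stop) {
                // busy work
            }
            System.out.println("worker observed stop");
        });
        worker.start();
        Thread.sleep(100);
        stop = true; // this write becomes visible to the worker's next read
        worker.join();
    }
}
```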

4. Special rules for long and double variables

For 64-bit long and double values, the virtual machine is allowed to split reads and writes of non-volatile 64-bit data into two 32-bit operations; that is, the VM may choose not to guarantee atomicity for the load, store, read, and write operations on such data. This is the so-called non-atomic treatment of long and double. In practice, most commercial virtual machines treat 64-bit reads and writes as atomic operations anyway.

5. Atomicity, visibility and order

  • Atomicity: the Java memory model directly guarantees the atomic variable operations, so access to primitive data types can generally be considered atomic. If a wider scope of atomicity is needed, the lock and unlock operations satisfy it; at the language level this corresponds to the synchronized keyword.
  • Visibility: Visibility means that when one thread changes the value of a shared variable, other threads are immediately aware of the change. In addition to volatile, Java has two other keywords for visibility: synchronized and final.
  • Orderliness: the natural orderliness of Java programs can be summed up in one sentence: observed from within a thread, all operations are ordered; observed from one thread onto another, all operations are out of order.

6. The happens-before principle

Java has a happens-before principle, which is the main basis for judging whether data races exist and whether threads are safe. With this principle, a handful of rules can settle all questions of whether two operations might conflict in a concurrent environment. Happens-before is a partial order between two operations defined in the Java memory model: if operation A happens-before operation B, then the effects of A are observable by B before B occurs, where "effects" include changing the values of shared variables in memory, sending messages, calling methods, and so on. Here are the "natural" happens-before relationships in the Java memory model:

  • Program order rule: within a thread, operations written earlier (in control-flow order) happen-before operations written later.
  • Monitor lock rule: an unlock operation happens-before every subsequent (in time) lock operation on the same lock.
  • Volatile variable rule: a write to a volatile variable happens-before every subsequent read of that variable.
  • Thread start rule: the start() method of a Thread object happens-before every action of that thread.
  • Thread termination rule: every operation in a thread happens-before the detection of that thread's termination; termination can be detected by Thread.join() returning, Thread.isAlive() returning false, and so on.
  • Thread interruption rule: a call to interrupt() happens-before the interrupted thread's code detects the interruption, which can be checked with Thread.interrupted().
  • Object finalization rule: the completion of an object's initialization happens-before the start of its finalize() method.
  • Transitivity: if A happens-before B and B happens-before C, then A happens-before C.

Java lock

This chapter introduces the locking mechanisms that ensure thread safety in Java multithreading. Locks in Java fall into two broad categories: mutual-exclusion (exclusive) locks and shared locks. A mutex is the equivalent of a write lock: once data is protected by a mutex, only the thread holding the lock can proceed, and all other threads must block and wait. A shared lock is the equivalent of a read lock: once data is under a shared lock, other threads that hold or request the shared lock can still access it. The main focus of this chapter is thread safety in Java, which can be achieved in several ways, including blocking mutual-exclusion synchronization, optimistic non-blocking synchronization, and schemes that need no synchronization at all.

1. Mutually exclusive synchronization

Mutual-exclusion synchronization is a common way to guarantee correctness under concurrency. Synchronization means ensuring that shared data is used by only one thread (or a few, when semaphores are used) at a time while multiple threads access it concurrently. Mutual exclusion is a means of achieving synchronization; critical sections, mutexes, and semaphores are the main ways of implementing mutual exclusion. Mutual exclusion is the cause, synchronization the effect; mutual exclusion is the method, synchronization the goal. synchronized works by emitting monitorenter and monitorexit bytecode instructions before and after the synchronized block; both instructions take a reference parameter specifying the object to lock and unlock. If a synchronized block in a Java program names an object parameter explicitly, that object reference is used; if not, the corresponding object instance or Class object is taken as the lock object, depending on whether synchronized modifies an instance method or a static method. Notably, a synchronized block is reentrant for the thread that holds it, so a thread cannot lock itself out; and a synchronized block blocks subsequent threads until the thread inside finishes. Since Java threads map onto the operating system's native threads, blocking or waking a thread requires the operating system's help, forcing transitions between user mode and kernel mode that cost processor time. Synchronization can also be achieved with ReentrantLock in the java.util.concurrent (J.U.C) package. ReentrantLock and synchronized are both reentrant; the difference is that one is a mutex at the API level and the other at the native syntax level. Compared with synchronized, ReentrantLock adds several advanced features, including interruptible waits, fair locks, and binding multiple conditions:

  • Interruptible wait: When the thread holding the lock is not released for a long time, the waiting thread can choose to abandon the wait and do something else instead. Interruptible feature is very helpful for handling synchronous blocks with very long execution times.
  • Fair lock: Multiple threads waiting for the same lock must acquire the lock in the order in which the lock was applied.
  • Locks bound to multiple conditions: a ReentrantLock object can bind several Condition objects at once, whereas with synchronized the wait() and notify()/notifyAll() methods of the lock object implement only one implicit condition. A sketch using two conditions follows this list.
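A sketch showing all three features together: a fair ReentrantLock, interruptible lock acquisition, and two conditions sharing one lock (the bounded-buffer shape is a standard illustration, not a production queue):

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

public class BoundedBuffer {
    private final ReentrantLock lock = new ReentrantLock(true); // fair lock
    private final Condition notFull = lock.newCondition();
    private final Condition notEmpty = lock.newCondition();
    private final Object[] items = new Object[16];
    private int count, putIndex, takeIndex;

    public void put(Object x) throws InterruptedException {
        lock.lockInterruptibly(); // the wait for the lock can be interrupted
        try {
            while (count == items.length) notFull.await(); // wait on "not full"
            items[putIndex] = x;
            putIndex = (putIndex + 1) % items.length;
            count++;
            notEmpty.signal(); // wake one consumer
        } finally {
            lock.unlock();
        }
    }

    public Object take() throws InterruptedException {
        lock.lockInterruptibly();
        try {
            while (count == 0) notEmpty.await(); // wait on "not empty"
            Object x = items[takeIndex];
            takeIndex = (takeIndex + 1) % items.length;
            count--;
            notFull.signal(); // wake one producer
            return x;
        } finally {
            lock.unlock();
        }
    }
}
```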

2. Non-blocking synchronization

Mutual-exclusion synchronization is a pessimistic strategy; non-blocking synchronization is an optimistic one, a concurrency strategy based on conflict detection. In plain terms: perform the operation first; if no other thread contended for the shared data, the operation succeeds; if there was contention and a conflict arose, take other remedial measures. Because this optimistic strategy does not suspend threads, it is called non-blocking synchronization. Performing such an operation usually requires only a single processor instruction. Common instructions of this type are:

  • Test and Set (test-and-set)
  • Fetch and Increment
  • Swap
  • Compare and Swap (CAS)
  • Load-Linked/Store-Conditional (LL/SC)

The CAS instruction takes three operands: the memory location (V), the expected old value (A), and the new value (B). When CAS executes, the processor updates V with the new value B if and only if V still equals A; otherwise it performs no update. In either case it returns the old value of V, and the whole sequence is one atomic operation. Java exposes CAS through methods such as compareAndSwapInt() and compareAndSwapLong() on the sun.misc.Unsafe class. Unsafe, however, is not meant for user programs: Unsafe.getUnsafe() restricts access so that only classes loaded by the bootstrap class loader may obtain it, so it is usually used indirectly, via reflection or via other Java APIs such as the integer atomic classes in the J.U.C package. The drawback of CAS is the ABA problem: if a variable V was read as value A and still holds A when the update is attempted, does that mean it was never modified? No — it could have been changed to another value and then back to A. The usual fix is a version-number mechanism; J.U.C provides AtomicStampedReference for this. A CAS retry-loop sketch follows.
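A minimal sketch of the optimistic retry loop using the J.U.C atomic classes (the canonical indirect way to use CAS from user code):

```java
import java.util.concurrent.atomic.AtomicInteger;

public class CasCounter {
    private final AtomicInteger value = new AtomicInteger(0);

    // Read the old value, compute the new one, and retry until no other
    // thread raced us — the "perform first, remediate on conflict" strategy.
    public int increment() {
        while (true) {
            int current = value.get();   // expected old value (A)
            int next = current + 1;      // new value (B)
            if (value.compareAndSet(current, next)) {
                return next;             // CAS succeeded: V matched A
            }
            // CAS failed: another thread changed the value; loop and retry.
        }
    }
}
```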

3. No synchronization scheme is available

Synchronization is not a precondition for thread safety; some code is naturally thread safe because no shared data is involved:

  1. Reentrant code: "pure" code that does not depend on data stored on the heap or on shared system resources, uses only state passed in through parameters, and does not call non-reentrant methods.
  2. Thread-local storage: confine the data's visibility to a single thread, for example with java.lang.ThreadLocal. A sketch follows this list.
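A minimal sketch of thread-local storage: each thread gets its own mutable object, so no synchronization is needed (the StringBuilder payload is arbitrary, chosen only because it is mutable and not thread safe):

```java
public class ThreadLocalDemo {
    // Each thread sees its own private StringBuilder instance.
    private static final ThreadLocal<StringBuilder> BUFFER =
            ThreadLocal.withInitial(StringBuilder::new);

    public static void main(String[] args) throws InterruptedException {
        Runnable task = () -> {
            StringBuilder sb = BUFFER.get(); // this thread's own copy
            sb.append(Thread.currentThread().getName());
            System.out.println(sb);
        };
        Thread t1 = new Thread(task, "t1");
        Thread t2 = new Thread(task, "t2");
        t1.start(); t2.start();
        t1.join(); t2.join();
    }
}
```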

Lock the optimization

Java implements many lock optimization techniques for efficient concurrency, such as adaptive spinning, lock elimination, lock coarsening, lightweight locks, and biased locks. Let's introduce them in turn.

1. Spin locks and adaptive spinning When multiple threads compete for a lock on a multiprocessor machine, we can ask the later-arriving thread to busy-wait ("spin") without giving up its processor time, to see whether the lock-holding thread releases the lock soon. Spinning by itself is no substitute for blocking: it avoids the cost of a thread switch but occupies processor time, so if the lock is held for long, spinning threads simply waste processor resources. Java's adaptive spin locks decide how long to spin based on the previous spin times on the same lock and the state of the lock's owner.

2. Lock elimination Lock elimination removes locks on code that requires synchronization but is detected by the just-in-time compiler to have no possible contention on shared data. It is mainly justified by escape analysis: if all data on the heap in a piece of code provably cannot escape and be accessed by other threads, it can be treated as thread-private stack data, and locking is unnecessary.

3. Lock coarsening When writing code we generally recommend keeping synchronized blocks as small as possible. But if a series of operations repeatedly locks and unlocks the same object, or locking even appears inside a loop body, the frequent mutex operations cause needless performance loss even without contention. When the virtual machine detects a string of fragmented operations all locking the same object, it coarsens the lock, extending the synchronization scope to cover the whole sequence.

4. Lightweight locks "Lightweight" is relative to the traditional locking mechanism implemented with operating-system mutexes, which is accordingly called the "heavyweight" lock. Lightweight locks are not a replacement for heavyweight locks; their intent is to reduce the performance cost of traditional heavy locks when there is no multithreaded contention.

When execution enters a synchronized block and the synchronization object is not locked, the virtual machine first creates a space called the Lock Record in the current thread's stack frame, used to store a copy of the lock object's current Mark Word (officially called the Displaced Mark Word).

The virtual machine then uses a CAS operation to try to update the object's Mark Word to a pointer to the Lock Record. If the update succeeds, the thread owns the object's lock and the Mark Word's lock flag changes to "00", indicating a lightweight-locked state. This process is shown below:

If the update fails, the virtual machine first checks whether the object's Mark Word points into the current thread's stack frame. If it does, the current thread already owns this object's lock and can simply enter the synchronized block; otherwise the lock object has been claimed by another thread. When two or more threads compete for the same lock, the lightweight lock is no longer effective: the lock flag changes to "10", the Mark Word stores a pointer to the heavyweight lock (mutex), and threads waiting for the lock block. That describes locking; unlocking also proceeds by CAS: if the object's Mark Word still points to the thread's Lock Record, a CAS swaps the object's current Mark Word back with the Displaced Mark Word copied in the thread's stack frame. If the swap succeeds, the synchronization is complete; if it fails, another thread has tried to acquire the lock, and the suspended threads must be woken as the lock is released. Lightweight locks improve synchronization performance on the empirical basis that "for the great majority of locks, there is no contention during the entire synchronization cycle." If contention does occur, lightweight locks end up slower than traditional heavyweight locks, because the mutex cost is paid on top of the CAS operations.

5. Biased locks The purpose of biased locking is to eliminate synchronization primitives entirely when there is no contention, improving performance further. If lightweight locks use CAS to eliminate the synchronization mutex under no contention, biased locks eliminate the whole synchronization under no contention, even the CAS operations. A biased lock is biased toward the first thread that acquires it: if the lock is never acquired by another thread afterward, the thread holding the biased lock never needs to synchronize again. When the lock object is first acquired by a thread, the virtual machine sets the flag in the object header to "01", i.e. biased mode, and uses a CAS operation to record the acquiring thread's ID in the object's Mark Word. If that CAS succeeds, the thread owning the biased lock performs no further synchronization each time it enters a synchronized block related to this lock. The biased mode ends as soon as another thread attempts to acquire the lock: depending on whether the lock object is currently locked, the bias is revoked back to the unlocked or lightweight-locked state.

Removing objects

Reachability analysis

Java uses the reachability analysis algorithm to determine whether objects are alive. Objects that are dead, i.e. no longer reachable through any reference, should have their space reclaimed for other uses. The basic idea of the algorithm is to search downward from a set of objects called "GC Roots"; the search path is called a reference chain. When an object is not connected to the GC Roots by any reference chain, it is proved unreachable. In the Java language, the objects usable as GC Roots include the following:

  • The object referenced in the virtual machine stack (the local variable table in the stack frame).
  • The object referenced by the class static property in the method area.
  • The object referenced by the constant in the method area.
  • Objects referenced by JNI in the local method stack.

Java also extends the concept of a reference, because we want to describe a class of objects that can stay in memory while space is plentiful but can be discarded if memory is still tight after a garbage collection; the caching features of many systems fit this scenario. From strongest to weakest, Java's reference concepts are:

  1. Strong references: pervasive in program code; the garbage collector will never reclaim an object as long as a strong reference to it exists.
  2. Soft references: describe objects that are useful but not essential. Just before the system would throw an out-of-memory error, the objects associated with soft references are listed in the collection scope for a second collection; if that collection still does not free enough memory, the out-of-memory error is thrown.
  3. Weak references: describe non-essential objects, more weakly than soft references; the associated objects only survive until the next garbage collection. When the garbage collector runs, objects associated only with weak references are reclaimed regardless of whether memory is currently sufficient.
  4. Phantom references: the weakest kind of reference relationship. A phantom reference has no effect on its object's lifetime, and it cannot be used to obtain an object instance; the sole purpose of setting one is to receive a system notification when the object is collected by the garbage collector.
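
The four strengths map onto classes in java.lang.ref (a strong reference is just an ordinary variable). A minimal usage sketch; whether the weak reference is actually cleared depends on the JVM honoring System.gc(), so the printed results are likely, not guaranteed:

import java.lang.ref.*;

public class ReferenceDemo {
    public static void main(String[] args) {
        Object strong = new Object();                        // 1. strong reference

        SoftReference<Object> soft =
                new SoftReference<>(new Object());           // 2. cleared only under memory pressure

        WeakReference<Object> weak =
                new WeakReference<>(new Object());           // 3. cleared at the next GC

        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        PhantomReference<Object> phantom =
                new PhantomReference<>(new Object(), queue); // 4. enqueued on collection

        System.gc();
        System.out.println("soft:    " + soft.get());    // likely non-null: memory is not tight
        System.out.println("weak:    " + weak.get());    // likely null after the GC
        System.out.println("phantom: " + phantom.get()); // always null by definition
    }
}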

The death of an object goes through two marking passes. If, after reachability analysis, an object has no reference chain connecting it to GC Roots, it is marked for the first time and screened once; the screening condition is whether it is necessary to execute the object's finalize() method. When the object does not override finalize(), or its finalize() has already been called by the virtual machine, executing finalize() is considered unnecessary. If the object is deemed necessary to execute, it is placed in a queue called F-Queue, to be executed later by a low-priority Finalizer thread that the virtual machine creates automatically. "Executed" here means the virtual machine triggers the method but does not promise to wait for it to finish, because an object that runs slowly, or even loops forever, in finalize() could otherwise stall the whole memory reclamation system. finalize() is the object's last chance to escape death: the GC later performs a second, smaller-scale marking of the objects in F-Queue, and an object that wants to rescue itself only needs to re-associate itself, inside finalize(), with any object on a reference chain. Note that the finalize() method of any object is executed by the system only once. A sketch of this self-rescue behavior follows:
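
Below is a minimal sketch of the self-rescue trick, modeled on the classic demonstration (class and field names are illustrative; finalize() has been deprecated since JDK 9 and is used here only to show the mechanism; the sleeps are a crude way to give the low-priority Finalizer thread time to run):

public class FinalizeEscapeGC {
    public static FinalizeEscapeGC SAVE_HOOK = null;

    public void isAlive() {
        System.out.println("yes, i am still alive :)");
    }

    @Override
    protected void finalize() throws Throwable {
        super.finalize();
        System.out.println("finalize method executed!");
        // Self-rescue: re-attach this object to a reachable reference chain.
        FinalizeEscapeGC.SAVE_HOOK = this;
    }

    public static void main(String[] args) throws Throwable {
        SAVE_HOOK = new FinalizeEscapeGC();

        // First attempt: finalize() runs and the object escapes collection.
        SAVE_HOOK = null;
        System.gc();
        Thread.sleep(500);
        if (SAVE_HOOK != null) {
            SAVE_HOOK.isAlive();
        } else {
            System.out.println("no, i am dead :(");
        }

        // Second attempt fails: finalize() is only ever called once per object.
        SAVE_HOOK = null;
        System.gc();
        Thread.sleep(500);
        if (SAVE_HOOK != null) {
            SAVE_HOOK.isAlive();
        } else {
            System.out.println("no, i am dead :(");
        }
    }
}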

Recycling the method area: garbage collection in the method area is relatively inefficient, but that does not mean the method area is never collected. Garbage collection in the permanent generation mainly reclaims two kinds of content: discarded constants and useless classes. Reclaiming discarded constants is very similar to reclaiming objects in the Java heap. The criteria for judging whether a class is "useless", however, are much harsher; a class must satisfy all three of the following criteria to be considered "useless" (a flag sketch for observing this follows the list):

  • All instances of the class have already been reclaimed, meaning that there are no instances of the class in the Java heap.
  • The ClassLoader that loaded the class has been reclaimed.
  • The java.lang.Class object corresponding to this Class is not referenced anywhere, and the methods of this Class cannot be accessed anywhere through reflection.
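
Whether classes are actually unloaded can be controlled and observed from the command line. A hedged example: -verbose:class and -Xnoclassgc are long-standing HotSpot options, while the unified-logging form is the JDK 9+ equivalent, so exact availability depends on your JDK version:

# Print class loading/unloading events; -Xnoclassgc would disable class unloading:
java -verbose:class Main
java -Xnoclassgc Main

# JDK 9+ unified logging for unload events:
java -Xlog:class+unload=info Main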

Garbage collection algorithm

Mark-sweep algorithm

The most basic collection algorithm is the "mark-sweep" algorithm, which is divided into two stages, "mark" and "sweep": first, all the objects to be reclaimed are marked; after marking completes, all marked objects are reclaimed in one pass. Its disadvantages are (a toy sketch follows the list):

  • Efficiency: neither the marking process nor the sweeping process is efficient;
  • Space: a large number of discontiguous memory fragments are left behind after the sweep.
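
The two stages are easy to see in miniature. Below is a toy mark-sweep pass over a hypothetical object graph; it is illustrative only, since the real JVM works on raw heap memory rather than Java collections, and all names are mine:

import java.util.*;

public class ToyHeap {
    static class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();
        boolean marked;
        Obj(String name) { this.name = name; }
    }

    final List<Obj> allObjects = new ArrayList<>();
    final List<Obj> gcRoots = new ArrayList<>();

    // Mark stage: traverse from the roots and flag every reachable object.
    void mark() {
        Deque<Obj> pending = new ArrayDeque<>(gcRoots);
        while (!pending.isEmpty()) {
            Obj o = pending.pop();
            if (!o.marked) {
                o.marked = true;
                pending.addAll(o.refs);
            }
        }
    }

    // Sweep stage: drop every unmarked object, then reset marks for the next cycle.
    void sweep() {
        allObjects.removeIf(o -> !o.marked);
        allObjects.forEach(o -> o.marked = false);
    }

    public static void main(String[] args) {
        ToyHeap heap = new ToyHeap();
        Obj a = new Obj("a"), b = new Obj("b"), c = new Obj("c");
        a.refs.add(b);                               // a -> b is reachable; c is garbage
        heap.allObjects.addAll(List.of(a, b, c));
        heap.gcRoots.add(a);
        heap.mark();
        heap.sweep();
        heap.allObjects.forEach(o -> System.out.println("survived: " + o.name));
    }
}

Note how the sweep simply frees slots wherever the dead objects happen to sit; in a real allocator this is exactly what leaves behind the discontiguous fragments listed above.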

Copying algorithm

To solve the efficiency problem, the "copying" algorithm divides the available memory into two equal-sized halves and uses only one at a time. When that half is exhausted, the surviving objects are copied to the other half, and the used half is then cleaned up in one pass. Since an entire half-region is reclaimed at a time, memory fragmentation never has to be considered during allocation. Today's virtual machines rarely split memory in half, however, as that would waste too much space. Typically, the new generation is divided into one large Eden space and two smaller Survivor spaces, and each round uses Eden plus one of the Survivors. At collection time, the surviving objects in Eden and the in-use Survivor are copied into the other Survivor in one pass, and then Eden and the just-used Survivor are cleaned up. Allocation guarantee: when Survivor space is insufficient, the algorithm relies on other memory (the old generation) as a fallback. If the other Survivor cannot hold all the objects that survived the previous new-generation collection, those objects are promoted directly into the old generation through the allocation guarantee mechanism.
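
The Eden/Survivor split is controlled by standard HotSpot flags. A hedged example with arbitrary values; with -XX:SurvivorRatio=8 the new generation is divided Eden:Survivor:Survivor = 8:1:1:

# 512 MB heap, 256 MB new generation, Eden eight times the size of each Survivor:
java -Xms512m -Xmx512m -Xmn256m -XX:SurvivorRatio=8 Main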

Mark-compact algorithm

The process of the mark-compact algorithm is similar to that of mark-sweep, except that instead of cleaning up the recyclable objects directly, all surviving objects are moved toward one end of the memory, and the memory beyond the boundary is then cleaned up.

Garbage collector

The garbage collector is the concrete implementation of memory reclamation. Let's look at several different garbage collectors.

Serial collector

  • A single-threaded collector: while collecting, it must suspend all other worker threads until the collection finishes, which pauses the application.
  • Its advantage is that it is simple and efficient, with the highest single-threaded collection efficiency.
  • The new generation uses the copying algorithm; the old generation uses the mark-compact algorithm.

ParNew collector

  • A multithreaded version of the Serial collector; it is not necessarily more efficient than the Serial collector, especially in single-CPU environments.
  • The new generation uses the copying algorithm; the old generation uses the mark-compact algorithm.

Parallel Scavenge collector

  • A new-generation collector
  • Uses the copying algorithm
  • A multithreaded collector
  • Its goal is a controllable throughput, that is, the ratio of the time the CPU spends running user code to the total CPU time: Throughput = time running user code / (time running user code + garbage collection time). High throughput makes efficient use of CPU time and finishes the program's computation tasks as soon as possible, so this collector mainly suits background computation that does not need much interaction.
  • Parameters for precise throughput control: -XX:MaxGCPauseMillis and -XX:GCTimeRatio, plus the -XX:+UseAdaptiveSizePolicy switch. With the adaptive policy, the virtual machine collects performance-monitoring information as the system runs and dynamically adjusts the new-generation size (-Xmn), the Eden-to-Survivor ratio (-XX:SurvivorRatio), the size threshold for objects promoted directly into the old generation (-XX:PretenureSizeThreshold), and other details. This is called the GC adaptive tuning strategy (a command-line sketch follows the list).
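
A hedged example of these knobs with arbitrary values; -XX:+UseParallelGC selects this collector on HotSpot:

# Ask for at most 200 ms pauses and 99% of time in user code:
java -XX:+UseParallelGC -XX:MaxGCPauseMillis=200 -XX:GCTimeRatio=99 Main

# Or let the VM size the generations itself:
java -XX:+UseParallelGC -XX:+UseAdaptiveSizePolicy Main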

Serial Old collector

  • The old-generation version of the Serial collector, using the mark-compact algorithm.

Parallel Old collector

  • The old-generation version of the Parallel Scavenge collector: multithreaded, using the mark-compact algorithm.

CMS collector

  • A collector whose goal is to obtain the shortest collection pause time.
  • The process is as follows:
  1. Initial mark: marks the objects that GC Roots can directly reach
  2. Concurrent mark: the GC Roots tracing process, run concurrently with the user threads
  3. Remark: corrects the mark records of the objects whose marks changed because the user program kept running during concurrent marking
  4. Concurrent sweep: sweeps away the dead objects concurrently
  • Its disadvantages are as follows (a flag sketch follows the list):
  1. The CMS collector is very sensitive to CPU resources; like any concurrent program, it competes with the user threads for CPU time.
  2. It cannot handle floating garbage, and a "Concurrent Mode Failure" may occur, causing another Full GC. CMS cannot collect floating garbage because the user threads are still running, and still producing garbage, during the concurrent sweep phase. A Concurrent Mode Failure occurs when the memory set aside while CMS runs cannot satisfy the program's needs; the virtual machine then temporarily falls back to the Serial Old collector to redo the old-generation collection, resulting in an even longer pause.
  3. CMS is based on mark-sweep, so at the end of a collection there can be so much space fragmentation that a large object cannot be allocated and a Full GC has to be triggered. To solve this problem, CMS offers the -XX:+UseCMSCompactAtFullCollection switch, which makes the collector run a memory defragmentation (compaction) process when it is about to perform a Full GC. This removes the fragmentation problem, but the compaction cannot run concurrently, so the pause time becomes longer.
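
A hedged example of the CMS-related flags with arbitrary values (CMS was deprecated in JDK 9 and removed in JDK 14, so these apply only to older JDKs):

# Select CMS, start a cycle when the old generation is ~70% full,
# and compact after every five uncompacted Full GCs:
java -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSCompactAtFullCollection -XX:CMSFullGCsBeforeCompaction=5 Main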

G1 collector

  • Characteristics:
  1. Parallelism and concurrency: uses multiple CPUs to shorten Stop-The-World pauses
  2. Generational collection: new objects and old objects that have survived multiple GCs are handled differently
  3. Space integration: viewed as a whole, G1 is based on the mark-compact algorithm; viewed between Regions, it is based on the copying algorithm. Either way, no memory fragmentation is produced by collection, and regular, contiguous memory is available afterwards
  4. Predictable pauses: G1 models pause times to make them predictable
  • The G1 collector lays out Java heap memory differently from other collectors: it divides the Java heap into multiple independent Regions of equal size. The new and old generations are no longer physically separated; each is a collection of Regions (not necessarily contiguous). G1 tracks the value of the garbage accumulated in each Region (how much space a collection would reclaim and, from experience, how long it would take), maintains a priority list in the background, and, within the allowed collection time, collects the Region with the highest value first.
  • Virtual machines use Remembered Sets to avoid full-heap scans: in G1 for object references between Regions, and in other collectors for object references between generations. Each Region in G1 has a corresponding Remembered Set. When the virtual machine finds the program writing to data of a Reference type, it generates a Write Barrier that briefly interrupts the write and checks whether the referenced object lives in a different Region (in the generational case, whether an old-generation object references a new-generation one). If so, the reference information is recorded, via a CardTable, into the Remembered Set of the Region that owns the referenced object. During memory reclamation, adding the Remembered Sets to the enumeration scope of the GC root nodes then guarantees that nothing is missed even without a full-heap scan.
  • Ignoring the work of maintaining the Remembered Sets, the G1 collector runs in the following phases (an example of enabling G1 follows the list):
  1. Initial marking: marks the objects that GC Roots can directly reach and modifies the value of TAMS (Next Top at Mark Start) so that, in the next phase, the concurrently running user program can create new objects in the correct, available Regions. This phase pauses the threads, but only briefly.
  2. Concurrent marking: performs reachability analysis on the heap objects, starting from GC Roots, to find the surviving objects. This phase takes longer but can run concurrently with the user threads.
  3. Final marking: corrects the part of the mark records that changed because the user program kept running during concurrent marking. The virtual machine records the object changes of that period in the thread Remembered Set Logs, and this phase merges them into the Remembered Sets. It requires the threads to pause, but can be executed in parallel.
  4. Live data counting and evacuation: sorts the Regions by collection value and cost, and makes a collection plan according to the GC pause time the user expects.
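
A hedged example of enabling and tuning G1 with arbitrary values (G1 has been the default collector since JDK 9; the flags below are standard HotSpot options):

# Select G1, target 200 ms pauses, and fix the Region size at 4 MB:
java -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:G1HeapRegionSize=4m Main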