CPU cache

CPU cache structure

Locality principle

When the CPU accesses memory, whether for instructions or for data, the locations it touches tend to cluster in a small contiguous region.

  • Temporal locality: if an item of information is accessed, it is likely to be accessed again in the near future. Program loops, stacks, and the like are the cause of temporal locality.
  • Spatial locality: information that will be used in the near future is likely to be adjacent in address space to information being used now.
  • Sequential locality: in a typical program, most instructions execute sequentially, branch instructions aside; the ratio of sequential to non-sequential execution is roughly 5:1. In addition, access to large arrays is sequential. Sequential instruction execution and sequential array storage are the causes of sequential locality.

CPU cache application scenarios

  • Traverse arrays in low-to-high dimension order, so that accesses walk memory sequentially
  • Give a hot shared variable a CPU cache line of its own, avoiding false sharing (both ideas are sketched below)
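
Both points come down to cache-line behavior. Below is a minimal sketch in Java; the 4096×4096 array size, the assumption of 64-byte cache lines, and the class names are illustrative choices, not anything the JVM guarantees:

```java
public class CacheFriendly {
    // Row-major traversal: the inner loop walks a contiguous int[] row,
    // so every cache line loaded is fully used before eviction.
    static long rowMajor(int[][] a) {
        long sum = 0;
        for (int i = 0; i < a.length; i++)
            for (int j = 0; j < a[i].length; j++)
                sum += a[i][j];
        return sum;
    }

    // Column-major traversal: each step jumps to a different row object,
    // touching a new cache line on almost every access.
    static long colMajor(int[][] a) {
        long sum = 0;
        for (int j = 0; j < a[0].length; j++)
            for (int i = 0; i < a.length; i++)
                sum += a[i][j];
        return sum;
    }

    // Manual padding so two counters written by different threads do not
    // land on the same cache line (one way to avoid false sharing).
    static final class PaddedCounter {
        volatile long value;
        long p1, p2, p3, p4, p5, p6, p7; // fills out a 64-byte line
    }

    public static void main(String[] args) {
        int[][] a = new int[4096][4096];
        long t = System.nanoTime();
        rowMajor(a);
        System.out.println("row-major: " + (System.nanoTime() - t) / 1_000_000 + " ms");
        t = System.nanoTime();
        colMajor(a);
        System.out.println("col-major: " + (System.nanoTime() - t) / 1_000_000 + " ms");
    }
}
```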

Memory allocation

Every line of code we write takes up memory (in the method area); when the code runs, it uses the stack and the heap, and improper class-loading mechanisms or logic can cause problems with memory allocation and reclamation. Allocating and reclaiming memory gracefully is the key to writing efficient code. So what are the memory allocation strategies in Java?

Bump-the-pointer allocation

Bump-the-pointer allocation, also called linear allocation, uses memory sequentially: each new object is placed right after the last one by advancing a pointer. After memory has been in use for a while, though, garbage collection leaves fragments behind and the heap is no longer tidy. This is where the free list comes in.

FreeList

A free list records blocks of memory that have been used and then freed. On the next object allocation, the allocator first searches the free list for a block of suitable size; only if none fits does it fall back to bumping the pointer in main memory. Bump-the-pointer allocation is usually used together with a free list, which seems to solve the fragmentation problem but does not solve the efficiency problem of CAS spinning when threads request memory concurrently.
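
A minimal sketch of the contended case, using an invented BumpAllocator class (this illustrates the idea only; it is not how the JVM's allocator is actually written):

```java
import java.util.concurrent.atomic.AtomicLong;

// Bump-the-pointer allocation on a shared heap: every allocation
// CAS-advances one global pointer, so concurrent threads spin
// against each other under contention.
public class BumpAllocator {
    private final AtomicLong top = new AtomicLong(0); // next free offset
    private final long end;

    public BumpAllocator(long heapSize) { this.end = heapSize; }

    /** Returns the start offset of the allocated block, or -1 if full. */
    public long allocate(long size) {
        while (true) {
            long cur = top.get();
            long next = cur + size;
            if (next > end) return -1;          // out of memory
            if (top.compareAndSet(cur, next)) { // CAS may fail and retry
                return cur;                     //   under heavy contention
            }
        }
    }
}
```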

TLAB

To resolve contention when multiple threads request memory concurrently, the JVM adopted a memory pre-allocation strategy: each thread is handed a chunk of heap in advance, its Thread-Local Allocation Buffer (TLAB). When a thread allocates an object, it allocates from its TLAB first.
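
Continuing the sketch above, a hypothetical TlabAllocator shows why per-thread buffers remove the CAS from the hot path; the 1 MB refill size is an arbitrary illustrative value:

```java
// A minimal sketch of the TLAB idea (assuming the BumpAllocator above):
// each thread grabs a large chunk from the shared heap once, then
// bumps a private pointer, with no CAS, for each small object.
public class TlabAllocator {
    private static final long TLAB_SIZE = 1 << 20; // 1 MB per refill (illustrative)
    private final BumpAllocator heap;

    private final ThreadLocal<long[]> tlab =       // [0] = current, [1] = end
            ThreadLocal.withInitial(() -> new long[] {0, 0});

    public TlabAllocator(BumpAllocator heap) { this.heap = heap; }

    public long allocate(long size) {
        long[] t = tlab.get();
        if (t[0] + size > t[1]) {                  // TLAB exhausted: refill
            long start = heap.allocate(TLAB_SIZE); // the only contended step
            if (start < 0) return -1;
            t[0] = start;
            t[1] = start + TLAB_SIZE;
        }
        long addr = t[0];
        t[0] += size;                              // private bump, no CAS
        return addr;
    }
}
```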

Class allocation procedure

Object allocation process

When a new object is allocated, escape analysis runs first: if the object does not escape the method, the JVM attempts stack allocation via scalar replacement; otherwise allocation proceeds on the heap, trying the TLAB before the shared space.
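
A minimal example of an allocation that escape analysis can eliminate. The HotSpot flags -XX:+DoEscapeAnalysis and -XX:+EliminateAllocations are on by default in modern JDKs; whether the allocation is actually removed depends on the JIT, so treat this as a sketch:

```java
public class EscapeDemo {
    static final class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    static long distSq(int x, int y) {
        // p never escapes this method, so the JIT can replace it with
        // two scalar locals and skip the heap allocation entirely.
        Point p = new Point(x, y);
        return (long) p.x * p.x + (long) p.y * p.y;
    }

    public static void main(String[] args) {
        long sum = 0;
        for (int i = 0; i < 1_000_000; i++) sum += distSq(i, i); // hot loop
        System.out.println(sum);
    }
}
```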

How objects are referenced in memory

The structure of an object in memory

The mark word stores the identity hash code, GC generational age, lock state flags, and so on. Alignment padding ensures that the object's size is a multiple of 8 bytes.

Viewing an object's size in Java: ObjectSizeCalculator.getObjectSize(obj)
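
Alternatively, OpenJDK's JOL library (org.openjdk.jol:jol-core, an assumed external dependency) can print the full layout, header and padding included; a minimal sketch:

```java
import org.openjdk.jol.info.ClassLayout;

// Prints an object's header, fields, and alignment padding.
public class LayoutDemo {
    public static void main(String[] args) {
        Object obj = new Object();
        // Output shows the mark word, class pointer, and any padding bytes.
        System.out.println(ClassLayout.parseInstance(obj).toPrintable());
    }
}
```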

Application level cache

The hash table

An application-level cache typically stores results directly in a hash table: once a value has been computed and cached, later lookups need no recomputation and cost O(1) time.

Hash collision resolution: chaining (linked lists) and open addressing.
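
A minimal sketch of such a cache, built on ConcurrentHashMap (the class name and the stand-in "expensive" loader are invented for illustration):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Expensive results are stored in a hash table so repeat lookups
// cost O(1) instead of a recomputation.
public class ComputeCache<K, V> {
    private final Map<K, V> table = new ConcurrentHashMap<>();
    private final Function<K, V> loader;

    public ComputeCache(Function<K, V> loader) { this.loader = loader; }

    public V get(K key) {
        // computeIfAbsent runs the loader at most once per key;
        // collisions are resolved internally by the map (chaining).
        return table.computeIfAbsent(key, loader);
    }

    public static void main(String[] args) {
        ComputeCache<Integer, Long> squares =
                new ComputeCache<>(k -> (long) k * k); // pretend this is expensive
        System.out.println(squares.get(12)); // computed once
        System.out.println(squares.get(12)); // O(1) cache hit
    }
}
```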

Cache update strategy

  • If the amount of data is small, replace it in full on each update by swapping in a new reference.

  • If the amount of data is large, update it incrementally using the copy-on-write principle (see the sketch after this list).
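
A minimal sketch of both strategies in Java; the class is invented for illustration, and the incremental path shows the copy-on-write idea in its simplest form (copy, mutate, publish):

```java
import java.util.HashMap;
import java.util.Map;

public class CacheUpdater {
    private volatile Map<String, String> cache = new HashMap<>();

    // Full update + reference replacement (small data set): readers
    // see either the old map or the new one, never a half-updated mix.
    public void replaceAll(Map<String, String> fresh) {
        cache = new HashMap<>(fresh);
    }

    // Incremental copy-on-write update (large data set, rare writes):
    // writers copy, mutate the copy, then publish it atomically.
    public synchronized void put(String key, String value) {
        Map<String, String> copy = new HashMap<>(cache);
        copy.put(key, value);
        cache = copy;
    }

    public String get(String key) {
        return cache.get(key); // lock-free read
    }
}
```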

Zero copy

  • What we usually do: 4 copies, 4 context switches

If, after reading the file, the kernel copies the contents straight from the PageCache into the socket buffer, and only notifies the process once the NIC has sent the data, the trips through user space disappear (this is the idea behind Linux's sendfile):

  • What we can do instead: 2 switches, 3 copies

If the network adapter supports Scatter-Gather Direct Memory Access (SG-DMA), the copy into the socket buffer can be removed as well, leaving only two memory copies in total.
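
In Java, FileChannel.transferTo maps to sendfile on Linux, so the kernel can take the short path described above. A minimal sketch; the file name, host, and port are placeholders:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.FileChannel;
import java.nio.channels.SocketChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// File bytes flow PageCache -> socket without entering user space.
public class ZeroCopySend {
    public static void main(String[] args) throws IOException {
        try (FileChannel file = FileChannel.open(Path.of("data.bin"),
                                                 StandardOpenOption.READ);
             SocketChannel sock = SocketChannel.open(
                     new InetSocketAddress("example.com", 9000))) {
            long pos = 0, size = file.size();
            while (pos < size) {
                // transferTo may send fewer bytes than asked; loop until done.
                pos += file.transferTo(pos, size - pos, sock);
            }
        }
    }
}
```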

PageCache prefetch function

PageCache is a disk cache that exploits locality: recently read pages are kept in memory, and read-ahead pre-reads a further portion of the file from disk before it is requested. However, when large files are read straight through, every read refills the PageCache with pages that will never be hit again; with such a low hit ratio, read-ahead degrades performance instead of helping.

Asynchronous IO

Asynchronous IO and direct IO should be used instead of zero-copy techniques when processing large files in high-concurrency scenarios.

Asynchronous IO (it can handle both network IO and disk IO; we focus on disk IO here) solves the blocking problem by splitting a read into two halves. In the first half, the process issues a read request to the kernel and returns immediately without waiting for the data, so it can work on other tasks concurrently. In the second half, once the kernel has copied the data from disk into the process buffer, the process is notified and handles the data.

Note that asynchronous IO in this mode does not copy data into the PageCache at all; IO that does go through the PageCache is called buffered (cached) IO.
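
A minimal sketch of the two halves in Java using AsynchronousFileChannel (the file name is a placeholder; note that the JDK may simulate the asynchrony with an internal thread pool rather than kernel AIO):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.channels.CompletionHandler;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.CountDownLatch;

public class AsyncRead {
    public static void main(String[] args) throws IOException, InterruptedException {
        CountDownLatch done = new CountDownLatch(1);
        AsynchronousFileChannel ch = AsynchronousFileChannel.open(
                Path.of("data.bin"), StandardOpenOption.READ);
        ByteBuffer buf = ByteBuffer.allocate(4096);

        ch.read(buf, 0, null, new CompletionHandler<Integer, Void>() {
            @Override public void completed(Integer bytes, Void att) {
                System.out.println("read " + bytes + " bytes"); // second half
                done.countDown();
            }
            @Override public void failed(Throwable exc, Void att) {
                exc.printStackTrace();
                done.countDown();
            }
        });

        System.out.println("request submitted, doing other work..."); // first half
        done.await();
        ch.close();
    }
}
```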

IO summary: BIO, NIO, AIO

Synchronous and asynchronous

  • Synchronous: a synchronous call does not return until the callee has finished processing the request.
  • Asynchronous: an asynchronous call returns immediately with an acknowledgement that the request was received, but without the result; the caller can go on to handle other work, and the callee later delivers the result through events, callbacks, or similar mechanisms.

Blocking and non-blocking

  • Blocking: after issuing a request, the caller waits for its result; the current thread is suspended and cannot continue until the data is ready.
  • Non-blocking: after issuing a request, the caller does not wait for the result to come back and can do other things first.

In short, synchronous vs. asynchronous is about how the result is delivered to the caller: do you fetch the meal from the counter yourself, or does a waiter bring it to you? Blocking vs. non-blocking is about whether the caller is suspended while waiting: do you stand at the counter, or sit down and look at your phone?

BIO: blocks and waits until the data is ready.

NIO: the caller keeps polling to ask whether the data is ready, then performs a blocking read once it is. (Polling costs thread switches.)

IO multiplexing (NIO): no user-side polling; the kernel tells the caller when data is ready, and the caller then performs a blocking read. (One switch; the kernel does the polling.)

AIO: the kernel hands over the data directly once the operation completes.
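
A minimal sketch of IO multiplexing with Java NIO's Selector (port 9000 is an arbitrary choice): one thread blocks in select() while the kernel watches all registered channels, and reads are issued only on channels reported ready.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;

public class MultiplexServer {
    public static void main(String[] args) throws IOException {
        Selector selector = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress(9000));
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);

        ByteBuffer buf = ByteBuffer.allocate(1024);
        while (true) {
            selector.select();                       // the single blocking call
            Iterator<SelectionKey> it = selector.selectedKeys().iterator();
            while (it.hasNext()) {
                SelectionKey key = it.next();
                it.remove();
                if (key.isAcceptable()) {            // new connection ready
                    SocketChannel c = server.accept();
                    c.configureBlocking(false);
                    c.register(selector, SelectionKey.OP_READ);
                } else if (key.isReadable()) {       // data ready: read won't block
                    SocketChannel c = (SocketChannel) key.channel();
                    buf.clear();
                    if (c.read(buf) < 0) c.close();  // peer closed the connection
                }
            }
        }
    }
}
```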