This document has seen some wear, and parts of the content may not be accurate. Read it with a critical eye.

Notes from a small-company interview. Some of the answers may be wrong; corrections are welcome.

Go part

Map Low-level implementation

Go's map is implemented as a hash table

Read more about map: Go Expert Programming – Map

Difference between Slice and array

An array has a fixed length, which must be determined before the array is used

Array characteristics:

  • Go arrays are value types: assigning one array to another copies the entire array, requiring additional memory
  • When an array is passed to a function, the argument is a copy of the array, not a pointer to it
  • The length of an array is part of its type, meaning [10]int and [20]int are different types

Slice features:

  • A slice is a reference-like type: a dynamic view over a section of an array
  • A slice has no fixed size; it always points to an underlying array

The differences:

  • When declaring: an array must specify its length (or use ... to have the compiler count the elements); a slice needs no length
  • As a function argument: an array is passed as a full copy; a slice is passed as a copy of its header, which still points to the same underlying array

How does Go's struct compare with OOP?

First OOP features: inheritance, encapsulation, polymorphism

inheritance

Concept: The process by which an object acquires properties of another object

  • Java has single inheritance of classes and multiple implementation of interfaces
  • Go can simulate (multiple) inheritance through embedding:
    • A struct that embeds an anonymous struct gains direct access to the promoted methods of the embedded struct, simulating inheritance
    • When a struct embeds another named struct, the pattern is called composition
    • A struct that embeds multiple anonymous structs can directly access the methods of all of them, simulating multiple inheritance

encapsulation

Concept: a self-contained black box whose public parts can be accessed externally and whose private parts cannot

  • In Java, access control is expressed with the keywords public, protected, private, and default
  • Go controls access by convention: identifiers starting with an uppercase letter are exported (public), lowercase ones are private; private identifiers are accessible within the same package, roughly like Java's default. Since Go has no inheritance, there is no protected

polymorphism

Concept: the ability to invoke the same set of operations through a common interface

  • Polymorphism in Java is achieved via extends Class or implements Interface
  • Interfaces in Go are contracts satisfied implicitly: as long as a struct implements all the methods of an interface, it implements that interface, with no explicit declaration needed

Talk about your understanding of Channel

A channel is a communication mechanism that lets one goroutine send values to another goroutine. Each channel has a specific element type: the type of data the channel can carry.

The state of a channel

A channel has three states:

  1. nil: uninitialized — just declared, or explicitly set to nil
  2. Active: a normal channel that can be read from or written to
  3. Closed

A channel can perform three operations:

  1. read
  2. write
  3. close

These three operations and three states combine into nine cases:

operation       nil channel   normal channel       closed channel
<-ch (read)     blocks        succeeds or blocks   drains remaining values, then returns zero values
ch <- (write)   blocks        succeeds or blocks   panics
close(ch)       panics        succeeds             panics

How does MAP keep threads safe in concurrent state

Concurrent access to Go's built-in map is not safe; it can cause undefined behavior and crash the program.

Prior to Go 1.6, the built-in map was partially goroutine-safe: concurrent reads were fine, but concurrent writes were not. Since Go 1.6, the runtime detects concurrent read/write on a map and aborts the program with a fatal error.

Compare the implementation of Java's ConcurrentHashMap: when the map is very large, a single lock makes many concurrent clients compete for it. Java's solution is sharding — multiple locks internally, each guarding one segment of the data — which reduces the performance impact of sharing one lock.

Prior to Go 1.9, it was common to guard concurrent map access with a sync.RWMutex, or simply a sync.Mutex.

Since Go 1.9, the standard library provides sync.Map, similar in spirit to Java's ConcurrentHashMap.

The implementation of sync.Map includes several optimizations:

  1. Trade space for time: two redundant data structures (read and dirty) reduce the impact of locking on performance
  2. Use the read-only structure (read) to avoid read/write conflicts
  3. Dynamic adjustment: after too many misses, promote the dirty map to read
  4. Double-checking
  5. Delayed deletion: deleting a key only marks it; deleted entries are cleaned up when dirty is promoted
  6. Prefer read for load, update, and delete operations, because read requires no lock

Talk about your understanding of GC

Memory management

To put it simply, Go's memory management maintains one large global memory pool, while each P (processor in the Go scheduler) maintains a small private cache; when the private cache runs out, it requests more from the global pool.

  • At startup, Go reserves a large chunk of memory divided into three regions: spans, bitmap, and arena
  • The arena is divided into small blocks by page
  • A span manages one or more pages
  • mcentral manages multiple spans, from which threads request memory
  • mcache is a per-P private resource, replenished from mcentral

See Reference Note 1 for more instructions

The garbage collection

Common garbage collection algorithms:

  • Reference counting: maintain a reference count for each object; when an object that references it is destroyed, decrement the count; reclaim the object when the count reaches zero.
    • Advantages: objects can be reclaimed quickly, without waiting for memory to run out or a threshold to be reached.
    • Disadvantages: cannot handle circular references well, and maintaining counts in real time has a cost.
    • Representative languages: Python, PHP, Swift
  • Mark-and-sweep: starting from root variables, traverse all referenced objects and mark them; collect the unmarked objects.
    • Advantages: solves the shortcomings of reference counting.
    • Disadvantages: requires Stop The World (STW) — all goroutines are paused while garbage collection runs and resumed afterwards, causing a brief program pause.
    • Representative language: Go (tricolor marking)
  • Generational collection: memory is divided into generations by object lifetime; long-lived objects go to the old generation, short-lived objects to the young generation, and each generation has its own collection algorithm and frequency.
    • Advantages: good collection performance.
    • Disadvantages: complex collection algorithm.
    • Representative language: Java

Go's tricolor marking for garbage collection

Tricolor marking is just a descriptive device; objects do not actually have colors. The three colors correspond to the three states of an object during garbage collection:

  • Gray: the object is in the mark queue, waiting to be scanned
  • Black: the object is marked and its gcmarkBits bit is 1 (it will not be cleaned in this GC)
  • White: the object is unmarked and its gcmarkBits bit is 0 (it will be cleaned in this GC)

Garbage collection optimization [2]

Write Barriers

As mentioned earlier, the purpose of STW is to stop goroutines from changing memory during the GC scan. Write barriers are a means of letting goroutines keep running while the GC works: they do not eliminate STW completely, but they greatly reduce its duration.

A write barrier is a switch the GC turns on at specific moments. While it is on, any pointer written by the program is marked, so the object it points to will not be collected in this GC cycle; it becomes eligible again in the next one.

Memory newly allocated during a GC is marked immediately (this needs no write barrier), so memory allocated during a GC is not reclaimed in that same cycle.

Mutator Assist (GC)

To prevent memory from being allocated faster than it can be collected: if a goroutine needs to allocate memory while a GC is running, that goroutine is made to help with part of the GC's work. This mechanism is called Mutator Assist.

Garbage collection trigger conditions [3]

When the memory allocation reaches the threshold, GC starts

Every memory allocation checks whether the current heap size has reached the threshold, and GC starts immediately if it has.

Threshold = heap size after the last GC × (1 + memory growth rate)

The memory growth rate is controlled by the environment variable GOGC, which defaults to 100, starting GC whenever memory is doubled.

Trigger GC periodically

By default, a GC is forced at most every 2 minutes; the interval is declared by the forcegcperiod variable in src/runtime/proc.go:

// forcegcperiod is the maximum time in nanoseconds between garbage
// collections. If we go this long without a garbage collection, one
// is forced to run.
//
// This is a variable for testing purposes. It normally doesn't change.
var forcegcperiod int64 = 2 * 60 * 1e9
Manual trigger

runtime.GC() can also be called in program code to trigger a GC manually, primarily for GC performance testing and statistics.

GC Performance Optimization

The GC performance is negatively correlated with the number of objects. The more objects, the worse the GC performance and the greater the impact on the program.

So one approach to GC performance optimization is to reduce the number of allocated objects, for example by reusing objects or combining small objects into larger ones.

In addition, some implicit memory allocations can occur due to memory escape, which can also become a burden to the GC.

Memory escape: a variable can be allocated on the stack only if the compiler can prove its lifetime is confined to the function's scope; otherwise it is allocated on the heap. Dynamic allocation on the heap is much more expensive than static allocation on the stack.

Variable escape can be observed by running go build -gcflags=-m [4]

More escape scenarios: see the linked "Escape scenes" article

Functions of escape analysis:

  1. Reduced GC pressure: non-escaping objects are allocated on the stack and reclaimed when the function returns, with no GC marking and sweeping needed.
  2. Escape analysis determines which variables can be allocated on the stack; stack allocation is faster and performs better than heap allocation (escaping local variables go to the heap, non-escaping ones stay on the stack).
  3. Synchronization elision: if an object's method holds a lock but only one thread ever accesses the object at runtime, the generated code can drop the lock (this is a benefit of escape analysis in general, e.g. on the JVM).

Escape to summarize

  • Allocating memory on the stack is more efficient than allocating memory in the heap
  • Memory allocated on the stack does not require GC processing
  • Memory allocated on the heap is handed over to the GC when it is used
  • The purpose of escape analysis is to determine whether memory is allocated to the heap or stack
  • Escape analysis is completed at compile time

How does parameter passing in Go differ from Python or Java?

Reference documentation: Go parameter passing details

Function parameters in Go are always passed by value: the callee receives a copy. For slices, maps, and pointers, the copied value itself contains a reference to the underlying data.

gin

Talk about your understanding of gin

Gin is a Go micro-framework with elegant encapsulation and a friendly API: fast, flexible, fault-tolerant, and convenient.

In fact, Go depends on web frameworks far less than Python or Java do. net/http itself is simple enough and performs very well, and most frameworks are high-level encapsulations of net/http. So the Gin framework is more like a collection of common functions and tools; using it improves development efficiency and unifies the team's coding style.

Why is gin’s routing component high performance

The routing tree

Gin uses the high-performance routing library httprouter [5]

In the Gin framework, routing rules are organized into prefix (radix) trees, one per HTTP method (nine in total); tree nodes split the URL hierarchically on the / separator.

gin.RouterGroup

A RouterGroup is a wrapper around the routing trees and manages all routing rules. The Engine struct embeds RouterGroup, so Engine directly exposes all of RouterGroup's route-management functions.

Gin data binding

Gin provides a convenient data binding function that automatically binds the parameters sent by the user to the structure we define.

That’s why I chose gin.

Gin data verification

Gin also provides data validation on top of data binding. Gin's validation is integrated with binding: just add a validation rule to the binding tag of the struct member. This saves a lot of validation code and makes Gin a comfortable alternative for programmers used to ASP.NET Core MVC or Spring MVC.

Gin middleware

Gin middleware exploits the last-in-first-out nature of the function call stack: code placed after the next-handler call runs only after the downstream handlers complete, so middleware can do its post-processing once the custom handler finishes.

redis

Why is Redis high performance

  • Pure in-memory operation: memory reads and writes are very fast.
  • Single-threaded [6]: avoids unnecessary context switches and race conditions; no CPU cost from switching between processes or threads, no locks to acquire and release, and no possibility of deadlock.
  • Efficient, purpose-built data structures.
  • I/O multiplexing with non-blocking I/O.
  • A different underlying model: Redis builds its own VM mechanism instead of relying on general-purpose system calls, which would spend time moving data and servicing requests.

Why is Redis single-threaded

Official answer: because Redis operates in memory, the CPU is not its bottleneck; the bottleneck is more likely the machine's memory size or network bandwidth. Since single-threading is simple to implement and the CPU is not the bottleneck, the single-threaded design makes sense.

I/O multiplexing and non-blocking I/O

select, poll, and epoll on Linux do exactly this, epoll being the most advanced. The fds of user sockets are registered with epoll, which listens for incoming messages on those sockets, avoiding a lot of useless work. The sockets run in non-blocking mode, so the process blocks only inside the epoll call itself; sending and receiving client messages never block, and the thread is fully utilized. This is called event-driven.

The five common data structures

  • String: caches, counters, distributed locks, etc.
  • List: linked lists, queues, a Weibo follower timeline, etc.
  • Hash: user profiles, hash tables, etc.
  • Set: deduplication, likes, mutual friends, etc.
  • Zset: rankings by page views and clicks, etc.

How to ensure the reliability of Redis as a message queue

Reliability can be ensured by borrowing an acknowledgement (ACK) mechanism like RabbitMQ's — for example, moving each message to a backup list on consumption (RPOPLPUSH) and removing it only after processing succeeds.

RabbitMQ

How does RabbitMQ ensure message reliability

The production end

There are two scenarios: transactional messages and message acknowledgement

Transactional messages severely degrade RabbitMQ performance, so they are rarely used. Asynchronous message acknowledgement (publisher confirms) is normally used instead to ensure messages reach RabbitMQ.

The consumer end

Message acknowledgement (ACK): when a consumer subscribes with autoAck=true, messages may be lost due to network problems, exceptions while the consumer processes a message, or a server crash.

With autoAck=true, RabbitMQ marks sent messages as confirmed and deletes them from memory (or disk) regardless of whether the consumer actually processed them.

To avoid losing messages in these cases, RabbitMQ provides consumer acknowledgement for processed messages; set autoAck=false and acknowledge manually.

MQ itself

The above are application-level guarantees of message reliability. They greatly improve safety, but RabbitMQ node restarts or crashes can still lose messages, so queue and message persistence must be enabled. Persistence ensures messages can be recovered after a node goes down or restarts.

A single point of failure can still lose messages, so for critical messages you can set up mirrored queues and clusters to ensure high availability of the messaging service.

MongoDB

MongoDB is a general purpose, document-oriented distributed database

MongoDB index data structure

WiredTiger, MongoDB's default storage engine, uses B-trees as the underlying index data structure; LSM trees are also supported as an optional alternative.

A MongoDB index is a special data structure, stored in a form that is easy to traverse and read: it sorts the values of one or more fields in a collection.

Why does MongoDB choose B tree by default instead of MySQL’s default B+ tree

The first is the application scenario:

  • As a non-relational database, MongoDB has far weaker demand for traversing data than relational databases; it prioritizes the read/write performance of single records
  • Most databases face read-heavy, write-light workloads, where B-trees and LSM trees have the advantage

MySQL uses B+ trees because only their leaf nodes store data; linking the leaf nodes with pointers enables sequential traversal, and traversing data is very common in relational databases.

Both MongoDB and MySQL ultimately choose their data structures to reduce the number of random I/Os a query needs. MySQL assumes that traversal queries are very common, so it chose the B+ tree. MongoDB assumes that querying a single record is far more common than traversal; since non-leaf nodes of a B-tree also store data, the average number of random I/Os needed to find one record is lower than in a B+ tree, so MongoDB's point queries can be faster than MySQL's in similar scenarios. This does not mean MongoDB cannot traverse data — range queries can fetch a batch of matching records — but it takes longer than in MySQL.

As a non-relational database, MongoDB calls for a completely different approach to collection design. If we apply traditional relational table-design thinking to MongoDB collections, the resulting queries may perform relatively poorly.

The recommended design approach in MongoDB is to use embedded documents [7].

What are MongoDB indexes and what are the differences

MongoDB supports multiple index types — single-field, composite, multi-key, hash, geospatial, and text — each suited to different scenarios.

  • **Single-field index:** the most common index type; speeds up queries on a given field. The default _id index MongoDB creates is of this type.
  • **Composite index:** an upgraded single-field index over multiple fields: documents are sorted by the first field, ties are broken by the second field, and so on.
  • **Multi-key index:** created when the indexed field is an array; an index entry is created for each element of the array.
  • **Hash index:** built on the hash of a field's value; currently used mainly for hashed sharding in a MongoDB Sharded Cluster. It supports equality matches only, not range queries.
  • **Geospatial index:** well suited to O2O scenarios such as "find nearby restaurants" or "find stations in an area".
  • **Text index:** supports fast full-text search, e.g. indexing blog content to search a collection of articles.

Nginx

Documents: www.aosabook.org/en/nginx.ht…

Article [8]

Why is Nginx high performance

Nginx running process

  1. Multiple processes: one Master process and multiple Worker processes
  2. Master process: manages the Worker processes
  3. External interface: receives external operations (signals)
  4. Internal forwarding: manages Workers via signals according to the external operation received
  5. Monitoring: monitors the Worker processes' running state and automatically restarts a Worker that terminates abnormally
  6. Worker processes: all Workers are equal
  7. Actual processing: network requests are handled by the Worker processes
  8. Number of Worker processes: configured in nginx.conf, generally set to the number of CPU cores to fully utilize the CPU while avoiding excess processes competing for cores and adding context-switch overhead

HTTP connection establishment and request processing

  • When Nginx starts, the Master process loads the configuration file
  • The Master process initializes the listening socket
  • The Master process forks multiple Worker processes
  • Worker processes compete for new connections; the winner completes the three-way handshake, establishes the socket connection, and processes the request

TCP/UDP

TCP

TCP three-way handshake

This is from Illustrated HTTP

  • Client – sends a packet with the SYN flag – first handshake – Server
    • After the first handshake: the Client can confirm nothing; the Server confirms that the peer's sending and its own receiving are normal
  • Server – sends a packet with the SYN/ACK flags – second handshake – Client
    • After the second handshake: the Client confirms that its own sending and receiving, and the peer's sending and receiving, are all normal; the Server still only confirms that the peer's sending and its own receiving are normal
  • Client – sends a packet with the ACK flag – third handshake – Server
    • After the third handshake: the Server confirms that its own sending and receiving, and the peer's sending and receiving, are all normal

Therefore, you need to shake hands three times to confirm that the sending and receiving functions of both parties are normal.

TCP four-way wave (connection teardown)

Four waves are required to disconnect a TCP connection:

  • Client – sends a FIN to shut down client-to-server data transfer
  • Server – on receiving the FIN, sends back an ACK with the acknowledgement number set to the received sequence number plus 1 (like a SYN, a FIN consumes one sequence number)
  • Server – closes the connection to the client and sends a FIN to the client
  • Client – sends back an ACK with the acknowledgement number set to the received sequence number plus 1

Either party may issue a connection-release notice when it finishes sending data; after the other side acknowledges it, the connection enters the half-closed state.

When the other party has no data to send, it sends a connection release notification. After the other party confirms, the TCP connection is completely closed.

UDP

UDP does not establish a connection before transmitting data, and the remote host does not acknowledge received UDP packets. Although UDP provides no reliable delivery, it is the most efficient choice in certain situations, generally instant communication such as QQ voice, QQ video, and live streaming.

Long connections vs. short connections [9]

TCP itself has no distinction between long and short connections, depending on how we use it.

  • **Short connection:** a new socket is created for each communication, and socket.close() is called when the communication ends; this is a short connection in the usual sense.
    • Advantage: relatively easy to manage — every existing connection is usable, with no extra bookkeeping needed.
  • **Long connection:** the connection is not closed after a communication, so it can be reused.
    • Advantage: saves connection-setup time and gives higher performance.

  1. rainbowmango.gitbook.io/go/chapter0… ↩
  2. rainbowmango.gitbook.io/go/chapter0… ↩
  3. rainbowmango.gitbook.io/go/chapter0… ↩
  4. Go memory escape detailed analysis ↩
  5. Go memory escape analysis ↩
  6. httprouter ↩
  7. Redis single-threading means the network request module uses a single thread (one thread handles all network requests), while other modules still use multiple threads ↩
  8. Why does MongoDB use B-trees ↩
  9. segmentfault.com/a/119000002… ↩
  10. www.cnkirito.moe/tcp-talk/ ↩