Linear structure:

Arrays, queues, linked lists, stacks

Nonlinear structure:

Two-dimensional array, multidimensional array, generalized table, tree structure, graph structure

Data structure:

Arrays, linked lists, stacks, queues, hash tables, binary trees, heaps, skip lists, graphs, Trie trees

Arrays

An array is a linear data structure. It uses a contiguous block of memory to store a set of data of the same type.

Queues: they follow a first-in, first-out rule.

Added note: the stack is first in last out

Back to arrays

Query

When we randomly access an element of the array (O(1)):

a[i]_address = base_address + i * data_type_size

a[i][j]_address = base_address + ( i * n + j) * type_size

Because memory is contiguous, an element's position can be computed directly, so queries are fast, while inserts and deletes are inefficient.

insert

Suppose we insert a piece of data into the kth position of the array. To make room at the kth position for the new data, we need to move elements k through n back by one position.

(Average time complexity is O(n), best is O(1), worst is O(n))

Question: can inserting a value into an array at a specified position ever hit the best case O(1)?

Yes, if the array is unordered: move the element currently at position k to the end of the array, then write x into position k.
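The two insertion strategies above can be sketched as follows. This is a minimal illustration (class and method names are mine, not from the source): `insertShift` keeps order by shifting, `insertSwap` uses the O(1) trick for an unordered array.

```java
// Sketch of the two array-insertion strategies; assumes a has spare capacity.
public class ArrayInsert {
    // Ordered insert: shift elements k..n-1 back by one slot. O(n).
    static int insertShift(int[] a, int n, int k, int x) {
        for (int i = n; i > k; i--) a[i] = a[i - 1];
        a[k] = x;
        return n + 1; // new element count
    }

    // Unordered insert: move a[k] to the end, drop x into slot k. O(1).
    static int insertSwap(int[] a, int n, int k, int x) {
        a[n] = a[k];
        a[k] = x;
        return n + 1;
    }
}
```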

delete

When you actually delete an element from an array, data needs to be moved to keep memory contiguous.

An extended idea for deletion:

Array a[10] contains eight elements: a, b, c, d, e, f, g, h. Now we want to delete elements a, b, and c.

To avoid moving d, e, f, g, and h three times, we can first just record which data has been deleted.

Each deletion does not actually move any data; it only records that the data has been deleted. When there is no more free space in the array, we trigger one real deletion pass, which greatly reduces the amount of data moved.
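The lazy-deletion idea above can be sketched like this (a minimal illustration with names of my choosing; a real implementation would also handle shrinking and bounds):

```java
import java.util.ArrayList;
import java.util.List;

// Lazy deletion: mark slots as deleted in O(1), compact only when the array fills up.
public class LazyArray {
    private final Object[] data;
    private final boolean[] deleted; // marks slots whose data is logically gone
    private int size = 0;

    LazyArray(int capacity) {
        data = new Object[capacity];
        deleted = new boolean[capacity];
    }

    void add(Object x) {
        if (size == data.length) compact(); // trigger the one real deletion pass
        data[size] = x;
        deleted[size] = false;
        size++;
    }

    void remove(int i) {
        deleted[i] = true; // O(1): just record it, move nothing
    }

    // One pass that physically drops all marked slots.
    private void compact() {
        int j = 0;
        for (int i = 0; i < size; i++) {
            if (!deleted[i]) { data[j] = data[i]; deleted[j] = false; j++; }
        }
        size = j;
    }

    List<Object> live() {
        List<Object> out = new ArrayList<>();
        for (int i = 0; i < size; i++) if (!deleted[i]) out.add(data[i]);
        return out;
    }
}
```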

Linked list

Structure:

   

1) Linked lists are stored in the form of nodes; this is chain storage.

2) Each node contains a data field and a next field that points to the next node.

3) As the figure shows, the nodes of a linked list are not necessarily stored contiguously.

4) A linked list may or may not have a dummy head node; choose according to actual requirements.

Singly linked lists

The head node does not store an actual data element; it helps locate elements and makes insertion and deletion easier.

The successor pointer of the tail node points to null.

Insertion and deletion of singly linked lists:

Insert: first find the node (B) before the insertion point; point X's next to C, then point B's next to X.

Delete: find the node (A) before the node to be deleted (B), and point A's next directly to C. Nothing points to B anymore, so it will be reclaimed by the garbage collector (in Java, the JVM).

The advantage of a singly linked list over an array is that inserting and deleting elements is faster: you don't need to move lots of elements, just change pointers.

So inserting or deleting at the head or tail is O(1), while querying, or inserting and deleting at a specified position, is O(n).
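The pointer changes described above can be sketched with a minimal list (a dummy head node as described; class and method names are mine):

```java
// Minimal singly linked list with a dummy head node.
public class SinglyList {
    static class Node {
        int data; Node next;
        Node(int d) { data = d; }
    }
    final Node head = new Node(0); // dummy header, stores no real element

    // Insert x after node b: x.next -> b's old successor, then b.next -> x. O(1).
    void insertAfter(Node b, int x) {
        Node n = new Node(x);
        n.next = b.next;
        b.next = n;
    }

    // Delete the node after a by skipping over it; it becomes garbage. O(1).
    void deleteAfter(Node a) {
        if (a.next != null) a.next = a.next.next;
    }

    String toText() {
        StringBuilder sb = new StringBuilder();
        for (Node p = head.next; p != null; p = p.next) sb.append(p.data).append(' ');
        return sb.toString().trim();
    }
}
```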

Circular linked list

The tail pointer points to the head of the list. The data processed has the characteristic of ring structure.

Question: how do you tell whether a linked list is circular?

Set a fast pointer and a slow pointer; if the list is circular, the fast one will eventually catch up with the slow one.
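The fast/slow-pointer check above can be sketched as follows (a standard two-pointer idiom; the class name is mine):

```java
public class CycleCheck {
    static class Node { Node next; }

    // Fast/slow pointers: fast advances two steps per round, slow one step;
    // if there is a ring, fast eventually meets slow inside it.
    static boolean hasCycle(Node head) {
        Node slow = head, fast = head;
        while (fast != null && fast.next != null) {
            slow = slow.next;
            fast = fast.next.next;
            if (slow == fast) return true;
        }
        return false; // fast ran off the end: no cycle
    }
}
```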

Doubly linked list

• Each node points to the previous node's address (prev) and the next node's address (next)

Supports traversal in both directions, which is more flexible, and supports bidirectional lookup in ordered lists.

Insert:

Put x between A and B: point x's prev to A, and A's next to x.

(Just like holding hands, X’s left hand extends to A’s right hand, and A’s right hand also extends to X’s left hand to be considered holding hands successfully.)

Then point x's next to B, and B's prev to x.

Delete:

We want to delete the B element

Point the prev of B's successor node to B's predecessor node (A).

Point the next of B's predecessor node (A) to B's successor node.

And then we release the B element
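The four-pointer insert and the O(1) delete described above can be sketched like this (class and method names are mine):

```java
public class DoublyList {
    static class Node {
        int data; Node prev, next;
        Node(int d) { data = d; }
    }

    // Put x between a and a's successor: wire all four pointers.
    static void insertAfter(Node a, Node x) {
        x.prev = a;
        x.next = a.next;
        if (a.next != null) a.next.prev = x;
        a.next = x;
    }

    // Unlink b in O(1): its neighbours point past it, then b becomes garbage.
    static void delete(Node b) {
        if (b.prev != null) b.prev.next = b.next;
        if (b.next != null) b.next.prev = b.prev;
        b.prev = b.next = null;
    }
}
```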

Two cases of deleting a data from a linked list:

  1. Deleting a node with a value equal to XX requires traversal and comparison, and then pointer deletion. The time complexity is O(n).
  2. Delete the node to which the given pointer points:

When deleting from a singly linked list, you need to know the node before n, so you must traverse from the head until you find the node p with p.next == n.

Then point that node to n's next node: O(n). In a doubly linked list the node already knows its predecessor and successor, so deletion is O(1).

The stack

A stack is an "operation-constrained" linear table: it only allows insertion and deletion of data at one end (first in, last out). By comparison, an array or linked list exposes too many operation interfaces; the operations are indeed flexible, but less controlled and therefore more error-prone.

Application scenarios

1. Subroutine calls: before jumping to a subroutine, the address of the next instruction is pushed onto the stack; when the subroutine finishes, the address is popped to return to the original program.

2. Handling recursive calls: similar to subroutine calls, except that besides the address of the next instruction, parameters, local variables, and other data are also pushed onto the stack.

3. Expression conversion and evaluation

4. Binary tree traversal

5. Depth-first search method of graph
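As a small illustration of the expression-related uses above, here is bracket matching with a stack (my own example, not from the source): every opener is pushed, and each closer must match the most recently pushed opener, which is exactly the first-in-last-out constraint.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class Brackets {
    static boolean balanced(String s) {
        Deque<Character> stack = new ArrayDeque<>();
        for (char c : s.toCharArray()) {
            if (c == '(' || c == '[' || c == '{') {
                stack.push(c); // remember the opener
            } else if (c == ')' || c == ']' || c == '}') {
                if (stack.isEmpty()) return false; // closer with nothing open
                char open = stack.pop();           // most recent opener must match
                if ((c == ')' && open != '(') || (c == ']' && open != '[')
                        || (c == '}' && open != '{')) return false;
            }
        }
        return stack.isEmpty(); // anything left open is unbalanced
    }
}
```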

Skip list

For example, Redis's Zset sorted set. Such a structure, a linked list plus multiple levels of indexes, is a skip list, which improves query efficiency.

If we do not update the indexes while continually inserting data into the skip list, too much data may accumulate between two index nodes. In extreme cases, the skip list can degenerate into a singly linked list.

Dynamic update of index:

When we insert data into the skip list, we can choose to insert it into some of the index levels at the same time. The skip list uses a random function to decide which levels to index the node at: if the random function returns K, the node is added to the indexes from level 1 to level K.

Probabilistically, this keeps the skip list's index size in balance with the data size and prevents serious performance degradation. Because it uses a probabilistic balancing technique rather than a strict one, node insertion and deletion are simpler and more efficient than in traditional balanced tree algorithms.
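The random level choice described above can be sketched as follows. This assumes the common 1/2-probability scheme and a cap of 16 levels (both are illustrative choices, not values stated in the source):

```java
import java.util.Random;

// Randomized level choice: each extra index level is kept with probability 1/2.
public class SkipListLevel {
    static final int MAX_LEVEL = 16; // illustrative cap on index levels

    static int randomLevel(Random rnd) {
        int level = 1;
        while (rnd.nextBoolean() && level < MAX_LEVEL) level++;
        return level; // the new node joins indexes from level 1 up to this level
    }
}
```

On average half the nodes get only level 1, a quarter get level 2, and so on, which is what keeps the index size proportional to the data size.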

Of course, there are other reasons why Redis uses skip lists for sorted sets:

• Skip lists are easier to code. They are easier to understand and write than red-black trees, and simplicity means readability and fewer errors.

• Skip lists are more flexible and can trade off execution efficiency against memory consumption by changing the index-building strategy.

The queue

FIFO (First In, First Out)

1. The queue itself is an ordered list, backed by an array or a linked list. If an array is used to store the queue's data, maxSize is the maximum capacity of the queue.

2. Since the output and input of the queue are processed from the front and rear ends respectively, two variables, front and rear, are required to record the subscripts of the front and rear ends of the queue respectively. Front changes with the data output, while rear changes with the data input, as shown in the figure:

rear increases as data is enqueued; front advances as data is dequeued.
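The front/rear bookkeeping above can be sketched as an array-backed queue (my own sketch, since the original declaration is not reproduced here). Using modulo arithmetic makes the array circular so freed slots are reused; one slot is deliberately wasted so that "full" and "empty" can be told apart:

```java
// Array-backed circular queue: rear advances on enqueue, front on dequeue.
public class ArrayQueue {
    private final int[] items;
    private int front = 0, rear = 0; // rear points at the next free slot

    ArrayQueue(int maxSize) {
        items = new int[maxSize + 1]; // one spare slot distinguishes full from empty
    }

    boolean enqueue(int x) {
        if ((rear + 1) % items.length == front) return false; // queue full
        items[rear] = x;
        rear = (rear + 1) % items.length;
        return true;
    }

    Integer dequeue() {
        if (front == rear) return null; // queue empty
        int x = items[front];
        front = (front + 1) % items.length;
        return x;
    }
}
```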

Hash table

Looking back at arrays and linked lists: arrays are easy to query but hard to insert into and delete from; linked lists are hard to query but easy to insert into and delete from.

Hash table: querying, inserting, and deleting are all easy.

Meaning:

A hash table builds on the array's support for random access by subscript, accessing data directly by key. Records are accessed by mapping the key to a location in the table, which speeds up lookups.

This mapping function is called the hash function

The array of records is called a hash table

Hash function:

  • hash(key): key is the element's key value; hash(key) is the hash value computed by the hash function.

  • Storing: the hash function maps an element's key to a subscript, and the data is stored at that subscript in the array.

  • Querying: when we look up an element by key, we use the same hash function to convert the key into an array index and fetch the data from that position.

Basic requirements for hash function design:

1. The hash function must evaluate to a non-negative integer, because array subscripts start at 0;

2. If key1 == key2, then hash(key1) == hash(key2);

3. If key1 ≠ key2, then hash(key1) ≠ hash(key2).

In real life, it is almost impossible to find a hash function that has a different hash value for different keys. Even the well-known hash algorithms such as MD5, SHA, and CRC cannot completely avoid such hash conflicts. Moreover, the limited storage space of arrays also increases the probability of hash collisions.

Common solutions to hash conflicts:

Open addressing and linked list

  1. Open addressing

When a hash conflict occurs, re-probe for a free position and insert there.

① Linear probing

Hash table inserts data:

The hash table has size 10, and six elements have been inserted before element x. x hashes to position 7, but there is already data at that position, so there is a conflict.

So we walk forward one slot at a time looking for a free position. If we reach the end of the table without finding one, we wrap around to the start, continue until we find free position 2, and insert x there.

Hash table query data:

Lookup: similar to the insert process. We use the hash function to compute the hash value of the key we are searching for, then compare the element at that subscript with the target. If they are equal, we have found the element; otherwise we continue probing sequentially. If we reach a free slot without finding it, the element is not in the hash table.

Hash table delete data:

The delete operation is a little special. We cannot simply set the deleted element's slot to empty, because that would break the search algorithm above: data that exists would appear not to. To address this, deleted elements are specially marked as "deleted". When linear probing encounters a slot marked deleted, it does not stop but continues probing.
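The insert/lookup/delete behaviour above can be sketched with "deleted" markers (tombstones). This is a minimal illustration (names are mine; it assumes the table never fills completely and omits resizing):

```java
// Linear probing with tombstones; assumes the table is never completely full.
public class ProbingTable {
    private static final Object TOMBSTONE = new Object(); // "deleted" marker
    private final Object[] keys;
    private final Object[] vals;

    ProbingTable(int capacity) {
        keys = new Object[capacity];
        vals = new Object[capacity];
    }

    private int hash(Object key) { return Math.abs(key.hashCode()) % keys.length; }

    void put(Object key, Object val) {
        int i = hash(key);
        // Probe one slot at a time, wrapping around; tombstones are reusable.
        while (keys[i] != null && keys[i] != TOMBSTONE && !keys[i].equals(key))
            i = (i + 1) % keys.length;
        keys[i] = key;
        vals[i] = val;
    }

    Object get(Object key) {
        int i = hash(key);
        while (keys[i] != null) { // a tombstone does NOT stop the probe
            if (keys[i] != TOMBSTONE && keys[i].equals(key)) return vals[i];
            i = (i + 1) % keys.length;
        }
        return null; // hit a truly free slot: not present
    }

    void remove(Object key) {
        int i = hash(key);
        while (keys[i] != null) {
            if (keys[i] != TOMBSTONE && keys[i].equals(key)) {
                keys[i] = TOMBSTONE; // mark, don't empty, so later probes continue
                vals[i] = null;
                return;
            }
            i = (i + 1) % keys.length;
        }
    }
}
```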

② Quadratic probing

Quadratic probing works much like linear probing; only the step changes. Linear probing uses a step of 1, probing the subscript sequence hash(key)+0, hash(key)+1, hash(key)+2, …

Quadratic probing squares the step, probing the sequence hash(key)+0², hash(key)+1², hash(key)+2², …

③ Double hashing

Instead of a single hash function, use a set of hash functions hash1(key), hash2(key), hash3(key), …

The first hash function is used first; if the computed storage location is already occupied, the second hash function is used, and so on, until a free storage location is found.

That is open addressing. Open addressing has a problem:

As more data is inserted into the hash table, the likelihood of hash collisions increases, the number of free places decreases, and probes take longer and longer.

In extreme cases, we may need to probe the entire hash table, so the worst-case time complexity is O(n).

Similarly, when deleting and searching, it is possible to probe the entire hash table to find the data to look up or delete.

To keep hash table operations efficient, we generally try to ensure a certain proportion of free slots. We use the load factor to measure how full the table is:

Load factor of the hash table = number of elements filled in / length of the hash table

The larger the load factor, the fewer free positions, the more conflicts, and the worse hash table performance. When the loading factor is too large, dynamic expansion is required, and the storage location of each data needs to be recalculated through the hash function to move data.

  2. The linked list (chaining) method

• In a hash table, each bucket or slot has a linked list,

• All elements with the same hash value are placed in a linked list corresponding to the same slot.

• When inserting, we only need to calculate the corresponding hash slot through the hash function and insert it into the corresponding linked list, so the insertion time complexity is O(1).

• When finding or deleting an element, we likewise compute the slot with the hash function, then traverse that slot's list to find or delete it. The time complexity of both operations is proportional to the list length k, i.e. O(k). For a uniform hash function, in theory k = n/m, where n is the number of items in the table and m is the number of slots.

• The linked list in the chaining method can be upgraded to other efficient dynamic data structures, such as skip lists or red-black trees. Then even if, in the extreme case, all the data hashes into the same bucket, lookup time only degrades to O(log n). This effectively defends against hash-collision attacks.
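The plain-linked-list version of chaining described above can be sketched like this (a minimal illustration, names mine; resizing is omitted):

```java
import java.util.LinkedList;

// Chaining: each bucket holds a linked list of (key, value) entries.
public class ChainedTable {
    static class Entry {
        final Object key; Object val;
        Entry(Object k, Object v) { key = k; val = v; }
    }
    private final LinkedList<Entry>[] buckets;

    @SuppressWarnings("unchecked")
    ChainedTable(int m) {
        buckets = new LinkedList[m];
        for (int i = 0; i < m; i++) buckets[i] = new LinkedList<>();
    }

    private LinkedList<Entry> bucket(Object key) {
        return buckets[Math.abs(key.hashCode()) % buckets.length];
    }

    void put(Object key, Object val) {
        for (Entry e : bucket(key)) {
            if (e.key.equals(key)) { e.val = val; return; } // update in place
        }
        bucket(key).addFirst(new Entry(key, val)); // O(1) insert at list head
    }

    Object get(Object key) {
        for (Entry e : bucket(key)) // O(k) walk, k = chain length ≈ n/m
            if (e.key.equals(key)) return e.val;
        return null;
    }
}
```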

Comparison of methods for resolving hash conflicts

Open addressing vs linked list method

The data in an open-addressing hash table is stored in one array, which makes efficient use of the CPU cache to speed up queries and makes serialization easier. But deleted data must be marked, and because everything lives in one array, resolving conflicts is costly, so the upper limit on the load factor cannot be too large; this method wastes more memory than chaining.

Therefore, when the data volume is small and the loading factor is small, the open addressing method is suitable.

The hash collision handling method based on linked list is more suitable for storing large objects and large amounts of data. Moreover, compared with open addressing method, it is more flexible and supports more optimization strategies, such as replacing linked list with red-black tree.

Binary tree

• Height: the longest path from node to leaf node (number of edges)

• Depth: the number of edges from the root to this node

• Level: the node's depth + 1

• Tree height: The height of the root node

Full binary tree:

Apart from the leaf nodes, every node has both a left and a right child.

Complete binary tree:

Leaf nodes appear only in the bottom two levels; the leaves in the last level are packed to the left, and every level except the last has the maximum number of nodes.

Binary tree traversal:

• Pre-order, in-order, and post-order traversal

• Pre-order traversal: for any node, print the node first, then its left subtree, and finally its right subtree.

• In-order traversal: for any node, print the left subtree first, then the node itself, and finally the right subtree.

• Post-order traversal: for any node, print the left subtree first, then the right subtree, and finally the node itself.

Binary Search Tree

At any node in the tree, every node in the left subtree has a value less than this node, and every node in the right subtree has a value greater than this node. (Small on the left, large on the right)

Query

public class BinarySearchTree {
        private Node tree;
        public Node find(int data) {
            Node p = tree;
            while (p != null) {
                if (data < p.data) p = p.left;
                else if (data > p.data) p = p.right;
                else return p;
            }
            return null;
        }
        public static class Node {
            private int data;
            private Node left;
            private Node right;
        }
  }

insert

If, during insertion, we encounter a node with the same value as the new data, we treat the new data as greater, i.e. place it in the right subtree.

    public void insert(int data) {
        if (tree == null) {
            tree = new Node(data);
            return;
        }
        Node p = tree;
        while (p != null) {
            if (data >= p.data) { // equal values go to the right subtree
                if (p.right == null) {
                    p.right = new Node(data);
                    return;
                }
                p = p.right;
            } else { // data < p.data
                if (p.left == null) {
                    p.left = new Node(data);
                    return;
                }
                p = p.left;
            }
        }
    }

delete

• Three cases of deletion: no child node, only one child node, and two child nodes

If the deleted node has two children, the smallest node in its right subtree is moved into its place.

    public void delete(int data) {
        Node p = tree;   // p points to the node to be deleted; start at the root
        Node pp = null;  // pp is the parent of p
        while (p != null && p.data != data) {
            pp = p;
            if (data > p.data) p = p.right;
            else p = p.left;
        }
        if (p == null) return; // not found

        // The node to be removed has two children:
        // find the smallest node in its right subtree
        if (p.left != null && p.right != null) {
            Node minP = p.right;
            Node minPP = p;
            while (minP.left != null) {
                minPP = minP;
                minP = minP.left;
            }
            p.data = minP.data; // copy minP's data into p
            p = minP;           // then delete minP instead
            pp = minPP;
        }

        // Now the node to delete is a leaf or has only one child
        Node child; // the child node of p
        if (p.left != null) child = p.left;
        else if (p.right != null) child = p.right;
        else child = null;

        if (pp == null) tree = child;        // deleting the root node
        else if (pp.left == p) pp.left = child;
        else pp.right = child;
    }

Mark deletion:

The node to be deleted is marked "deleted" but not actually removed from the tree. Deleted nodes therefore keep occupying memory, which wastes space, but deletion becomes much simpler, and this approach does not complicate the insertion and search code.

Hash table VS binary tree

Hash table insert, delete, and find operations can all run in constant time, O(1).

In a balanced binary search tree, the time complexity of insert, delete and search operations is O(logn).

First, the data in the hash table is stored unordered, so if you want to output ordered data, you need to sort it first. For binary search trees, we only need middle-order traversal to output ordered data sequences within O(n) time complexity.

Second, hashtable expansion takes a lot of time, and when hash conflicts occur, the performance is unstable. The performance of balanced binary search tree is very stable, and the time complexity is stable at O(logn).

Third, although the time complexity of operations such as the hash table lookup is constant, the constant is not necessarily less than logn because of hash conflicts, so the actual lookup may not be faster than O(logn). Plus the time consuming of the hash function, it is not necessarily more efficient than balancing binary search trees.

Fourth, the construction of hash tables is more complex than binary search trees, and there are many things to consider. Such as the design of hash functions, conflict resolution, capacity expansion, capacity reduction, etc. Balanced binary search tree only needs to consider the problem of balance, and the solution of this problem is relatively mature and fixed.

Finally, in order to avoid too many hash collisions, the hash table load factor should not be too large, especially if the hash table is based on the open addressing method, otherwise some storage space will be wasted.

Taken together, balanced binary search trees are superior to hash tables in some ways, so the existence of the two is not in conflict. In the actual development process, we need to choose which one to use according to specific requirements.

Balanced binary trees

AVL tree: The absolute value of the height difference between the left and right subtrees of each node is no more than 1, and the left and right subtrees are a balanced binary tree.

Why balance binary trees?

In order to solve the problem of time complexity degradation after frequent inserts, deletes and other dynamic updates.

Complete and full binary trees are both balanced binary trees

Rotate left and rotate right

So how do we decide whether to rotate left or right? An important factor comes into play:

Balance factor

• The difference between the height (depth) of a node's left subtree and that of its right subtree is the node's Balance Factor (BF).

The balance factor of every node in a balanced binary tree can only be -1, 0, or 1.

When the balance factor (BF) of the root of the minimal unbalanced subtree is greater than 1, rotate right.

When the BF of the root of the minimal unbalanced subtree is less than -1, rotate left.

After inserting a node, if the BF of the minimal unbalanced subtree and the BF of its child subtree have opposite signs, first rotate the child subtree once to make the signs match, then rotate in the opposite direction to complete the rebalancing.
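The single rotations above can be sketched as pointer surgery on a subtree root (a minimal illustration, names mine; a full AVL implementation would also update heights and handle the double-rotation cases just described):

```java
public class Rotations {
    static class Node {
        int data; Node left, right;
        Node(int d) { data = d; }
    }

    // Used when the subtree root is left-heavy (BF > 1):
    // the left child is lifted to become the new subtree root.
    static Node rotateRight(Node root) {
        Node pivot = root.left;
        root.left = pivot.right; // pivot's right subtree moves under root
        pivot.right = root;
        return pivot; // new subtree root
    }

    // Used when the subtree root is right-heavy (BF < -1).
    static Node rotateLeft(Node root) {
        Node pivot = root.right;
        root.right = pivot.left;
        pivot.left = root;
        return pivot;
    }
}
```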

Red-black tree

Red-black tree is a kind of balanced binary search tree which is not strict and approximately balanced.

Summary:

• The root node is black

• Every leaf node is a black empty node (NIL); leaf nodes store no data (a convention that simplifies the code)

• No two adjacent nodes can both be red; red nodes are separated by black nodes

• For each node, all paths from that node to its reachable leaf nodes contain the same number of black nodes

Red black versus AVL

As shown in the figure, removing the red nodes from a red-black tree leaves a tree of only black nodes, some of which temporarily have four children (a "quadtree"). Since every path from a node to its leaves contains the same number of black nodes, pushing such nodes down yields a complete binary tree.

The height of a complete binary tree is approximately log2 n, so the height of the black-only tree is no more than log2 n. Since red nodes are separated by black nodes, adding the red nodes back at most doubles any path, so the height of the red-black tree is no more than 2 log2 n.

Conclusion: the height of a red-black tree is at most twice that of an AVL tree, but the cost of maintaining balance is lower than for AVL trees. Therefore the search, insert, and delete performance of a red-black tree is relatively stable.


Application of red black tree

• 1. In the C++ STL, map and set are implemented with red-black trees;

• 2. The Linux Completely Fair Scheduler uses a red-black tree to manage process control blocks, and a process's virtual memory areas are stored in a red-black tree: each virtual address area corresponds to a node, with the left pointer pointing to the adjacent lower-address area and the right pointer to the adjacent higher-address area;

• 3. The epoll implementation of I/O multiplexing uses a red-black tree to organize and manage socket fds, supporting fast insert, delete, update, and search;

• 4. Nginx manages timers with a red-black tree; because it is ordered, the timer closest to expiry can be fetched quickly;

• 5. The implementation of TreeMap in Java.

B tree

A B-tree is a balanced multiway search tree: each node can have up to m children (m >= 2). We call it a B-tree of order m.

M-order B-trees meet the following conditions:

1. The root node has at least two children.

2. Each node in the tree contains at most m children. Apart from the root and leaf nodes, every node has at least ceil(m / 2) children; if the root is not a leaf, it has at least two children (unless the root has no children at all).

3. If a node has n-1 keywords, then the node has n branches. The n-1 keywords are distinct and arranged in ascending order. The number of keywords j in each non-root node satisfies ceil(m / 2) - 1 <= j <= m - 1;

4. All leaf nodes appear in the same layer and do not contain any keyword information;

Structure of an m-order B-tree

• In each internal node: n is the number of keywords in the node; ki is a keyword of the node, with ki < ki+1; pi is a pointer to a child of the node, such that the keywords in the subtree pi points to are greater than ki and less than ki+1, the keywords under p0 are less than k1, and the keywords under pn are greater than kn.

A B-tree of order m = 5:

B+ tree

Data is stored only in the leaf nodes; internal nodes hold only keywords and child pointers, which simplifies internal nodes and raises the branching factor. B+ tree traversal is also more efficient: all the leaf nodes are strung together into a list, so the tree can be traversed from beginning to end. Internal nodes store no records; they store the minimum keyword of each leaf-node subtree as an index.

1. A node with n subtrees contains n keywords (under some definitions, n-1 keywords);

2. All leaf nodes contain all the keywords, along with pointers to the records for those keywords, and the leaf nodes themselves are linked in ascending order of their keywords;

3. The non-leaf nodes can be regarded as the index part; a node contains only the largest (or smallest) keyword of each of its subtrees.

Differences between B trees and B+ trees

• Every node of a B tree stores data, and all nodes together make up the tree. In a B+ tree only the leaf nodes store data (a B+ tree has two head pointers: one to the root node and one to the leaf node with the smallest keyword). The leaf nodes contain all the tree's data and are linked together by a linked list, which makes range search and traversal convenient; all non-leaf nodes serve as an index.

• The keywords in a B tree's internal nodes do not reappear in other nodes, whereas the index entries of a B+ tree contain only the subtree's maximum keyword and a pointer to the subtree, not the storage address of the keyword's record.

• A B+ tree search, whether it succeeds or not, always follows a path from the root to a leaf.

Respective advantages

Advantages of B-tree:

Every node of a B tree contains both keys and values, so frequently accessed elements may sit closer to the root and thus be reached more quickly.

Advantages of B+ trees:

• Because the internal nodes of a B+ tree carry no data, more keys fit into each memory page; data is stored more densely and has better spatial locality.

• The leaf nodes of a B+ tree are all linked together, so traversing the whole tree only requires a linear scan of the leaves, whereas a B tree requires recursively traversing each level. And because the data is arranged in order and linked, range search is convenient, and accessing adjacent data in the leaves gets a better cache hit ratio.

Why use B+ trees for database indexing?

• In MySQL, data is usually stored on disk, so reading data means accessing the disk. A disk has two mechanical movements: platter rotation and arm movement. The platter spins at the drive's rated speed, and once it rotates to the right position the arm moves in to read or write the data. Locating a block is the time-consuming part of disk access; after all, mechanical movement takes far longer than electronic movement. When large-scale data is stored on disk, locating it is obviously very time-consuming, but B-class trees can optimize this and improve locating efficiency during disk reads.

• Why can B-class trees optimize this? Exploiting their structure, we build a multiway tree and store as much information as possible in each node, keeping the number of levels as small as possible, so data is found faster with fewer disk I/O operations. B-class trees are also balanced: every leaf is at the same depth, which keeps every query's cost stable.

In general, the B/B+ tree is a balanced multiway search tree designed for disks and other storage devices (compared with a binary tree, each node has many branches). With the same number of nodes, the height of a B/B+ tree is far less than that of a red-black tree. The time of an operation on a B/B+ tree consists of disk access time plus CPU computation time; since the CPU is very fast, the efficiency of a B-tree operation depends mainly on the number of disk accesses.

The heap

• The heap is a complete binary tree (every level is full except possibly the last, where the nodes are packed to the left)

• The value of each node in the heap must be greater than or equal to (or less than or equal to) the value of each node in its child tree

Big top heap: Each node has a value greater than or equal to the value of each node in the subtree (1, 2)

Small top heap: Each node is less than or equal to the value of each node in the subtree (3, 4)

How do you implement a heap?

Array implementation, for the node at array subscript i (1-based):

• left child: subscript i * 2

• right child: subscript i * 2 + 1

• parent: subscript i / 2

Advantages of the array implementation: it saves space, since there is no need to store pointers to the left and right children; nodes are located by subscript.

(That’s why we have a special binary tree called a complete binary tree.)

Heap insertion

1. Insert at the end of the array

2. Heapify: Reposition data to meet the characteristics of the heap

Heapify: compare and swap along a path of nodes, either upward or downward.

Bottom-up adjustment: repeatedly compare with the parent node and swap.

Top-down adjustment: repeatedly compare with the child nodes and swap.

For example, code implementation:
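A minimal sketch of insertion with bottom-up heapify, using the 1-indexed array layout above (big-top heap; class name and fixed capacity are my own choices):

```java
// Big-top heap in a 1-indexed array: a[1..count] holds the heap.
public class MaxHeap {
    private final int[] a;
    private int count = 0;

    MaxHeap(int capacity) { a = new int[capacity + 1]; }

    void insert(int x) {
        a[++count] = x; // 1. append at the end of the array
        int i = count;
        // 2. heapify bottom-up: swap with the parent while it is smaller
        while (i / 2 >= 1 && a[i] > a[i / 2]) {
            int t = a[i]; a[i] = a[i / 2]; a[i / 2] = t;
            i = i / 2;
        }
    }

    int peek() { return a[1]; } // the heap top
}
```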

Heap deletion

Code implementation
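A minimal sketch of deleting the heap top: move the last element into the root's slot, then heapify top-down until the big-top property is restored (the helper names are mine; the array is 1-indexed as above):

```java
public class HeapDelete {
    // a[1..n] is a big-top heap; removes and returns the maximum element.
    static int removeMax(int[] a, int n) {
        int max = a[1];
        a[1] = a[n];           // the last element fills the hole at the top
        siftDown(a, n - 1, 1); // then heapify top-down
        return max;
    }

    static void siftDown(int[] a, int n, int i) {
        while (true) {
            int largest = i; // pick the largest of node i and its children
            if (i * 2 <= n && a[i * 2] > a[largest]) largest = i * 2;
            if (i * 2 + 1 <= n && a[i * 2 + 1] > a[largest]) largest = i * 2 + 1;
            if (largest == i) break; // heap property restored
            int t = a[i]; a[i] = a[largest]; a[largest] = t;
            i = largest;
        }
    }
}
```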

Time complexity of heapification

A complete binary tree with n nodes has height no more than log2 n. Heap insertion and deletion compare and swap along a node path, so the time complexity is proportional to the tree height: O(log n).

Heap sort

• Time complexity: O(n logn)

• In place sorting algorithm (no or very little extra auxiliary space required)

Two steps of heap sort:

1. Build the heap

2. Sort

Heap sort – Build heap

There are two ways to build a heap:

• 1. Create a new heap array and insert the elements of the original array one by one, heapifying from bottom to top;

• 2. Work in place on the original array, heapifying from top to bottom, starting from the last non-leaf element.

The application of the heap

• Queue: First in first out

• Priority queue: The queue with the highest priority is first out

• Priority queue for heap implementation: top of heap element comes out first

• Uses: Huffman coding, graph shortest path, minimum spanning tree algorithm

• Task scheduler in operating system

• Prioritizing requests from high-priority users (messages, jobs, …)

•RabbitMQ priority queue

• Java PriorityQueue

• C++ priority_queue

• Top K problems

• Maintain a small top heap of size K; iterate over the elements, comparing each with the heap top. If an element is larger than the top, delete the top and insert the current element.

• Traversal costs O(n) and each heapify O(log k), for a total time complexity of O(n log k).

Dynamic data ranking:

• Maintain a small top heap of size K. When an element is added, compare it with the heap top; if it is larger, delete the top and insert the new element.

• Time complexity: O(log k)
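The Top-K procedure above can be sketched with Java's `PriorityQueue`, which is a small top heap by default (the class and method names around it are mine):

```java
import java.util.PriorityQueue;

public class TopK {
    // Returns the K largest elements, in ascending order.
    static int[] topK(int[] data, int k) {
        PriorityQueue<Integer> heap = new PriorityQueue<>(); // smallest at the top
        for (int x : data) {
            if (heap.size() < k) {
                heap.offer(x);
            } else if (x > heap.peek()) { // larger than the smallest of the top K
                heap.poll();              // delete the heap top
                heap.offer(x);            // insert the current element
            }
        }
        int[] out = new int[heap.size()];
        for (int i = 0; i < out.length; i++) out[i] = heap.poll();
        return out;
    }
}
```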

Heap applications – find the median

• Sort the array from smallest to largest. If n is odd, the median is the element at position n/2 + 1.

• If n is even, there are two middle elements, at positions n/2 and n/2 + 1; by convention we take the one at n/2.

• Static data: sort first and take the middle element directly.

• Dynamic data: re-sorting on every insert is inefficient.

Dynamic data is implemented with heap:

Once the current data is in sorted order, maintain two heaps:

A big top heap stores the first half of the data (the first n/2 elements, or the first n/2 + 1 if n is odd).

A small top heap stores the second half of the data (the last n/2 elements).

The top element of the big top heap is the median.
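The two-heap scheme above can be sketched like this (a minimal illustration, names mine): the big top heap holds the smaller half, the small top heap the larger half, and rebalancing keeps the big top heap's root at the median position.

```java
import java.util.Collections;
import java.util.PriorityQueue;

public class MedianFinder {
    // Big top heap: the smaller half of the data, largest at the top.
    private final PriorityQueue<Integer> lower =
            new PriorityQueue<>(Collections.reverseOrder());
    // Small top heap: the larger half of the data, smallest at the top.
    private final PriorityQueue<Integer> upper = new PriorityQueue<>();

    void add(int x) {
        if (lower.isEmpty() || x <= lower.peek()) lower.offer(x);
        else upper.offer(x);
        // Rebalance so lower holds n/2 (or n/2 + 1 when n is odd) elements.
        if (lower.size() > upper.size() + 1) upper.offer(lower.poll());
        else if (upper.size() > lower.size()) lower.offer(upper.poll());
    }

    int median() { return lower.peek(); } // top of the big top heap
}
```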

Public number: There are fish in north Beijing