This is the 15th day of my participation in the August More Text Challenge. For details, see:August is more challenging

For frequently used lookup tables, you want the ASL = 0 record to have a defined relationship between the position in the table and its keywords

HASH

define

According to the set hash function H(key) and the selected method of handling conflicts, a set of keywords is mapped to a finite address set (interval) with continuous address, and the “image” of keywords in the address set is used as the corresponding storage location in the table. The lookup table constructed in this way is called “hash table”.

Construction of the HASH function

Structure principle
- The function itself is easy to compute
- The calculated addresses are evenly distributed, that is, the probability of f(k) corresponding to different addresses is equal for any keyword K, in order to reduce the conflict as much as possible

Direct addressing

The hash function is a linear function of the keyword
- H(key) = key
- H(key) = a * key + b
This method is only suitable for: address set size = = keyword set size
Advantage: The hash address is the value of a linear function of the key
Disadvantages: To occupy continuous address space, low space efficiency

Numerical analysis

Assume that each keyword in the keyword set is composed of s-bit numbers (U1, U2,… , us), analyze the whole of the keyword set and extract a number of evenly distributed bits or their combination as addresses
This method is only suitable for preestimating the frequency of various numbers on each digit of the whole keyword

Middle square

The middle bits of the square value of the keyword are used as the storage address. The purpose of “square the keyword” is to “widen the difference,” while the middle of the square can be influenced by the entire keyword
This method is suitable for the phenomenon that each digit in the keyword has a high frequency of repeating some numbers

Folding method

Divide the keyword into parts and take the sum of them as the hash address. There are two methods of stacking: shift stacking and interbound stacking
This method is suitable for:

Keywords have a lot of digits

Except for the remainder method

Hash(key)=key mod p (p is an integer)
- P ≤ M (table length)
- P should be the largest prime number less than or equal to m

Why do you want to put a limit on P?

Given a set of keywords: 12, 39, 18, 24, 33, 21 if p=9, their corresponding hash function value will be:

Three, three, zero, six, six, three

It can be seen that if p contains the prime factor 3, all keywords containing the prime factor 3 are mapped to the address of “multiples of 3”, thus increasing the possibility of “conflict”

Random number method

H(key) = Random(key) (Random is pseudo-random function)
This method is used to construct hash functions for keywords of unequal length

consideration

Execution speed (that is, the time required to calculate the hash function)
The length of the keyword
Size of the hash table
Keyword distribution
To find the frequency

The way you construct a hash function depends on the set of keywords you’re building the table and the principle is to keep the possibility of collisions as small as possible

How to handle conflicts

1. Open addressing

The basic idea

If there is a conflict, look for the next empty hash address. As long as the hash table is large enough, the empty hash address will always be found and the data elements will be stored

Linear detection method

Hi=(Hash(key)+di) mod m (1≤ I < m) M – 1, and di = I

In case of conflict, find the next empty address to save

Advantage: as long as the hash table is not filled, it is guaranteed to find an empty address cell for the conflicting elements
Disadvantages: can store the synonym of the ith hash address in the I +1 address, so that the element that should be stored in the I +1 hash address becomes the synonym of the I +2 hash address… , resulting in the “aggregation” phenomenon, reducing the search efficiency

Secondary detection method

Di = 12, -12, 22, -22… Plus or minus k2

Pseudorandom detection

Hi=(Hash(key)+di) mod m (1≤ I < m)

Where: m is the hash table length and di is a random number

Open addressing method to create a hash table steps

Take the key of the data element and calculate its hash function value (address). If the storage space corresponding to the address is not occupied, the element is stored. Otherwise, perform step2 to resolve the conflict
Computes the next storage address for the keyword key based on the selected conflict handling method. If the next storage address is still in use, continue with step2 until a usable storage address is found

Open addressed hash table storage structure

/* ------------- Open address hash table storage structure ------------- */

int hashsize[] = {997. };typedef struct{
	ElemType* elem;
	int count;  // The number of current data elements
	int sizeindex;  // HashSize [sizeIndex] indicates the current capacity
} HashTable;

#define SUCCESS 1
#define UNSUCCESS 0
#define DUPLICATE -1

Status SearchHash(HashTable H, KeyType K, int &p, int &c){
	// Look for records with key K in the open addressing hash table H
	p = Hash(K);  // Get the hash address
	while(H.elem[p].key ! = NULLKEY && !EQ(K, H.elem[p].key))
		collisiion(p, ++c);  // Obtain the next probe address p
	if(EQ(K, H.elem[p].key)) return SUCCESS;  // Return the location p of the data element to be checked
	else return UNSUCCESS;  // The search was not successful
}
Copy the code

2. Then the HASH method

H2(key) is another hash function, whose value should be mutually prime with m

3. Chain address method

The basic idea

The records of the same hash address are linked into a single list. M hash addresses are set as m single lists, and then m single list header Pointers are stored in an array to form a dynamic structure

Advantages:

Non – synonyms do not conflict, no “aggregation” phenomenon
Dynamic application of node space on linked list is more suitable for uncertain table length

Hash table lookup

For a given value K, calculate the hash address I = H(K)

If r[I] = NULL, the search fails
If r[I].key = K, the search is successful; otherwise, “find the next address Hi” until r[Hi] = NULL or r[Hi].key = K

Case v01

Linear detection resolves conflicts

Case v02

The chain address method handles conflicts

Analysis of hash table lookups

From the lookup process, we know that the average lookup length of a hash table lookup is not actually equal to zero

Factors that determine the ASL of a hash table lookup

Hash function of choice
The chosen method of conflict management
Degree of hash table saturation, loading factor α= size of n/m values (n — number of records, m — length of table)

The larger α is, the more records there are in the table, indicating that the table is full, the greater the possibility of conflict, and the more times of comparison in search

Hash table technology has a good average performance, better than some traditional technology
The chain address method is superior to the open address method
The remainder method is superior to other types of functions for hashing

Examples of hash table applications

The compiler uses hash table to manage identifiers

How to construct a hash function
- Converts each character in the identifier to a non-negative integer
- Combine the resulting integers into one integer (you can add the first, middle, and last character values together, or you can add all character values together)
- To adjust the number of results within the range of 0~ m-1, the method of modulus can be used, Ki%M (M is a prime number)

mo4tech.com (Moment For Technology) is a global community with thousands techies from across the global hang out!Passionate technologists, be it gadget freaks, tech enthusiasts, coders, technopreneurs, or CIOs, you would find them all here.

Search – HASH

HASH

define

Construction of the HASH function

Direct addressing

Numerical analysis

Middle square

Folding method

Except for the remainder method

Random number method

How to handle conflicts

1. Open addressing

The basic idea

Linear detection method

Secondary detection method

Pseudorandom detection

Open addressing method to create a hash table steps

Open addressed hash table storage structure

2. Then the HASH method

3. Chain address method

The basic idea

Hash table lookup

Analysis of hash table lookups

Examples of hash table applications

Search – HASH

HASH

define

Construction of the HASH function

Direct addressing

Numerical analysis

Middle square

Folding method

Except for the remainder method

Random number method

How to handle conflicts

1. Open addressing

The basic idea

Linear detection method

Secondary detection method

Pseudorandom detection

Open addressing method to create a hash table steps

Open addressed hash table storage structure

2. Then the HASH method

3. Chain address method

The basic idea

Hash table lookup

Analysis of hash table lookups

Examples of hash table applications

Related Posts

Python Django builds personal blogging systems

Description of ByteBuf in Netty

Redis Interview Questions (2021 Latest edition)