Discuss hash table and construct hash table with C language

Hash table: A data structure accessed directly by key-value without comparison of key values, thus saving a lot of processing time.

Hash functions: The most important consideration in choosing – avoid collisions as much as possible

The principle of constructing hash function is: ① the function itself is easy to calculate; ② The calculated addresses are evenly distributed, that is, the probability of corresponding different addresses for any keyword k, f(k) is equal, so as to reduce conflicts as much as possible.

1. Direct addressing method:

If we now want to enumerate the population between 0 and 100 years old, then we can address the keyword age directly with the number of years. So f of key is equal to key.

F (0) = 0, f(1) = 1… F of 20 is equal to 20. This is based on the direct address that we set ourselves. We don’t care about the number of people, we care about how to find the address by keyword. If we now want to count the number of people born in the 1980s, we can address the keyword year of birth by subtracting 1980 from the year. So f of key is equal to key minus 1980.

If this year is 2000, then the person born in 1980 is 20 years old, and f(2000) = 2000-1980, you can find address 20, where the data “5 million people” is stored. In other words, we can take a linear function value of the keyword as the hash address, that is:

F (key) = a × key + b

The advantage of such a hash function is that it is simple, uniform, and does not cause conflicts, but the problem is that it requires knowing the distribution of keywords in advance, which is suitable for small and continuous lookup tables. Due to this limitation, direct addressing is not commonly used in practical applications although it is simple.

2. Numerical analysis:

Analysis of a set of data, such as a set of the employee’s date of birth, then we found the same date of birth of the first few digits, in this case, the chances of conflict will be very big, but we found that date back several month and date number difference is very big, if use the back of the Numbers to form a hash address, will significantly reduce their risk of conflict. Therefore, numerical analysis method is to find out the rule of numbers and use these data to construct the hash address with low probability of conflict as far as possible.

3. Folding method

Divide the keyword into parts with the same number of digits, the last number of digits can be different, and then take the sum of these parts (minus the carry) as the hash address.

4. Division and remainder method:

(Common: take the remainder of the keyword divided by a number p that is not larger than m of the hash table as the hash address. H(key) = key MOD p, p<=m. Not only can the key directly modular, but also after folding, square take medium operations modular. The choice of P is very important, generally take prime number or m, if the choice of P is not good, it is easy to produce synonyms.

5. Square the middle

When it is impossible to determine which bits of the keyword are evenly distributed, the square value of the keyword can be calculated first, and then the middle bits of the square value can be taken as the hash address. This is because the middle bits after the square are related to each bit in the keyword, so different keywords will produce different hash addresses with a high probability.

6. Pseudo-random number method:

Use a pseudo-random function to do the hash function, that is, H (key)=random(key).

Conflict resolution methods

1. Open addressing method:

When an address conflict occurs, probe the other storage cells in the hash table in some way until an empty location is found. This process can be described as follows: H I (key) = (H (key)+ d I) mod m (I = 1,2… , k (k ≤ m – 1)) H (key) is the direct hash address of the key word, m is the length of the hash table, and di is the address increment during each probe. In this method, the direct hash address H (key) of the element is calculated first. If the storage unit is occupied by another element, the storage unit with the address H (key) + d 2 is searched. The process is repeated until a storage unit is empty. Stores the data element with key into the cell. (1) d I = 1, 2, 3… Linear detection and hashing; (2) d I = 1^2, -1 ^2, 2^2, -2 ^2, k^2, -k^2…… Secondary detection and hash; (3) d I = pseudo random sequence pseudo random rehash;

2. Chain address method:

If the hash table space is 0 ~ m-1, set a one-dimensional array ST[m] composed of M pointer components, and insert all data elements with hash address I into the linked list with head pointer ST[I]. This method is similar to the basic idea of adjacency list, and it is suitable for serious conflicts.

3. Public Overflow Zone Method:

Hash table is divided into two parts: basic table and overflow table. All elements that conflict with basic table are filled into overflow table

C language implementation

Define macros and structures

#define HashMaxSize 1000 #define HashMaxSize 1000
#define LoadFactor 0.8 // LoadFactortypedef int KeyType; typedef int ValueType; Typedef size_t(*HashFunc)(KeyType key) Typedef enum Stat{// represents the state of each element Empty, // Empty, and currently has no value Valid, // The current value is Valid. Invalid // Is not Empty but Invalid, indicating that the current node is deleted. Typedef struct HashElem {/* data */ KeyType key; ValueType value; Statstat; }HashElem; Typedef struct HashTable {HashElem data[HashMaxSize]; size_t size; HashFunc HashFunc; }HashTable;Copy the code

Function declaration

void HashTableInit(HashTable *ht,HashFunc hashfunc); Int HashTableInsert(HashTable *ht,KeyType key,ValueType value); int HashTableFind(HashTable *ht,KeyType key,ValueType *value,size_t *cur); Void HashRemove(HashTable *ht,KeyType key); void HashRemove(HashTable *ht,KeyType key); Int HashEmpty(HashTable *ht); Size_t HashSize(HashTable *ht); Void HashTableDestroy(HashTable *ht); // Destroy the hash tableCopy the code

function

size_t HashFuncDefault(KeyType key){
    returnkey%HashMaxSize; } void HashTableInit(HashTable *ht){if(HT == NULL) // Invalid inputreturn; ht->size = 0; ht->hashfunc = HashFuncDefault; // Error assignmentfor(size_t i =0; i<HashMaxSize; i++){ ht->data[i].key = 0; ht->dota[i].stat = Empty; ht->data[i].value = 0; } } int HashTableInsert(HashTable *ht,KeyType key,ValueType value){if (ht == NULL)
        return 0;
    if(ht->size >= HashMaxSize*LoadFactor) // When the hash table size exceeds the loadreturn0; size_t cur = ht->hashfunc(key); // Get the index of key in the hash tablewhile(1){// Determine whether the current subscript is occupiedif(ht->data[cur]. Key ==key) // Ensure that no duplicate numbers are stored in the hash tablereturn 0;
        if(ht->data[cur].stat ! = Valid){ ht->data[cur].key =key; ht->data[cur].value = value; ht->data[cur].stat = Valid; ht->size++;return1; } cur++; }} int HashTableFind(HashTable *ht,KeyType key,ValueType *value)if(ht == NULL)
        return0; size_t offset=ht->hashfunc(key); // Find the subscript of the key using the hash functionif(ht->data[offset]. Key == key && HT ->data[offset]. Stat == Valid){// *value = is returned only when the current subscript value is key and the current status must be Valid ht->data[offset].value;return 1;
    } else{// The value is not key and is searched according to the conflict resolution method (the current resolution method is backward) until it is foundstatIs equal to the emptywhile(ht->data[offset].stat ! =Empty){if(ht->data[offset].key ! =key){ offset++;if(offset >= HashMaxSize// determine whether the subscript exceeds the maximum value) offset = 0; }else{
                if(ht->data[offset].stat == Valid){
                    *value =ht->data[offset].value;
                    return 1;
                } elseoffset++; }}return0; }} int HashTableFindCur(HashTable *ht,KeyType key,size_t *cur){// Delete nodeif(ht == NULL)
        return 0;
    for(size_t i = 0; i < HashMaxSize; i++){if(ht->data[i].key == key && ht->data[i].stat == Valid){ *cur = i; / /?return1; }}return 0;
}

void HashRemove(HashTable *ht,KeyType key){
    if(HT == NULL) // Invalid inputreturn; ValueType value = 0; size_t cur =0; int ret = HashTableFindCur(ht,key,&cur); // Use the find function to find if the key exists in the hash tableif(ret == 0)
        return;
    else{
        ht->data[cur].stat = Invalid;
        ht->size--;
    }
}
int HashEmpty(HashTable *ht){
    if(ht == NULL)
        return 0;
    else
        returnht->size >0? 1:0} size_t HashSize(HashTable *ht)if(ht == NULL) {
        ht->data->stat = Empty;
        return ht->size;

    } else
    {
        return ht->size;
    }

}
void HashPrint(HashTable *ht, const char *msg){
    if(ht ==NULL || ht->size = 0)
        return;
    printf("%s/n",msg);

    for(size_t i =0; i< HashMaxSize; i++){if(ht->data[i],stat! = Empty)printf("[%d] key=%d value=%d stat =%d\n",i,ht->data[i].key,ht->data[i].value,ht->data[i].stat); }}Copy the code

Refer to the blog: blog.csdn.net/wangpengqi/… Blog.csdn.net/mark555/art… www.cnblogs.com/youngerchin… Blog.csdn.net/weixin_4033… Blog.csdn.net/u011109881/…