Hash table of summary of basic algorithms

What is a hash table?

A hash table, also known as a hash table, is a data structure that can be accessed directly based on key values. Simply put, the key value is mapped to a location in the array to access the record, which can speed up the response.

The method that evaluates the mapping is called a hash function, and the array of records is called a hash table. Is a typical space for time strategy.

In this way, when there is a data to query, first calculate the corresponding position through the hash function, and then search through the computed position, which is faster than directly querying all the data blocks. But it’s also important to note that the hash function here requires high efficiency.

Two. Hash function

2.1. Introduction

A hash function, also called a hash function, yields a fixed – length message digest for different outputs. The ideal hash function should have different structures for different inputs, and the hash results should be identical (the output values should be as uniform as possible) and have an avalanche effect (small changes in the input values cause large changes in the output values). 【 reference articles: hit – alibaba. Making. IO/interview/b…

To put it simply: the hash function is to calculate the location of the hash table according to the key.

2.2. Common methods

Hash function constructor: hash function constructor

Direct addressing: Take a keyword or a linear function value of the keyword as a hash address. An example is a hash table with age as the key
Random number method: select a random function, the keyword of the random function value as its hash value. This method is usually used to construct hash functions when keywords are of unequal length.
Fold: Divide the keyword into parts with the same number of digits, and then take the sum of these parts as the hash address.
Square middle method: first calculate the square of the keyword value, and then take the middle bits of the square value as the hash address.
Remainder method (most commonly used) : take the keyword is not greater than the length of the hash table number P mod, obtained as the hash address.
Digit analysis: If the number of bits of the keyword is larger than that of the address, the hash address is selected based on the analysis of the distribution of the bits of the keyword. This parameter is applicable only when all keywords are known. Select the part to be selected based on the actual application to avoid conflicts.

2.3. Data types

Different hash functions should be used for different data types.

2.3.1. Positive integer

The most common method for positive integers is: division remainder method.

Key = k \div M

Here, we select an array of prime numbers of size M, and compute the remainder of k divided by M for any positive integer k. This will effectively spread the bonds between 0 and M minus 1.

If M is not used as a prime, the hash values will not be evenly distributed and will not contain all information.

2.3.2. Floating point number

If the key is a real number between 0 and 1, we can multiply it by M and round it to get an index value between 0 and m-1. However, this method is flawed, that is, the high level plays a decisive role, status has no effect on the index value. The solution is to represent floating-point numbers as binary and then use division and remainder.

Getting hashCode for floating-point numbers in Java works like this. Let’s take a quick look at Float’s hashCode method:

public static int hashCode(float value) {
    return floatToIntBits(value);
}

@HotSpotIntrinsicCandidate
public static int floatToIntBits(float value) {
    if(! isNaN(value)) {return floatToRawIntBits(value);
    }
    return 0x7fc00000;
}

@HotSpotIntrinsicCandidate
public static native int floatToRawIntBits(float value);

Copy the code

The Jdk source code:

/* * Find the bit pattern corresponding to a given float, NOT collapsing NaNs */
JNIEXPORT jint JNICALL
Java_java_lang_Float_floatToRawIntBits(JNIEnv *env, jclass unused, jfloat v)
{
    union {
        int i;
        float f;
    } u;
    u.f = (float)v;
    return (jint)u.i;
}
Copy the code

The above conversion converts a floating point number to an integer, which is appropriate when using the division and remainder method.

2.3.3. String

Strings can also be treated with a divide-remainder method, where the string can be treated as a large integer.

Let’s look at the hashCode method of String in the Jdk:

public int hashCode(a) {
    int h = hash;
    if (h == 0 && value.length > 0) {
        hash = h = isLatin1() ? StringLatin1.hashCode(value)
                              : StringUTF16.hashCode(value);
    }
    return h;
}

// StringLatin1.hashCode
public static int hashCode(byte[] value) {
    int h = 0;
    for (byte v : value) {
        // what is the cause of int and 0xff?
        h = 31 * h + (v & 0xff);
    }
    return h;
} 

Copy the code

Java iterates through the byte[] array, executing each character’s code value: h = 31 * h + (v & 0xff).

As for why 31* H, check out the StackOverflow post.

Note that String’s hashCode function produces a negative value, and the array is out of bounds when modulo or mod is used as an array subscript.

The question arises: Why is hashCode negative? See extension!

2.3.4. Key combination

For a key that has multiple integer variables, you can mix them up just like String. If you are looking for a key of type Date, there are multiple integer fields: day, mouth, and year. So the hash function:

int hash = (((day * R + mouth ) % M) R +year) % M
Copy the code

This still yields a hash value in the range 0 to m-1

2.3.5. Java的hashCode

In Java, the Object class is the parent of all classes, which means that all data types in Java can inherit and return a hashCode() method of type init. And hashCode() and equals() are consistent for each data type.

a.equals(b)   // true
a.hashCode() == b.hashCode()  // true
Copy the code

To define a hash function using a custom data type, you need to override both hashCode() and equals(). (hashCode is the same, but equals is not necessarily the same, see the extension.)

Instead of a 32-bit integer, we can use hashCode() and dust-mod to produce an integer between 0 and m-1:

public int hash(Key x) {
  // and 0x7fffffff are used to mask the sign bit. Converts a 32-bit integer to a non-negative integer
  return (x.hashCode() & 0x7fffffff) % M;
}
Copy the code

Here’s a nice example of rewriting hashCode for a custom datatype (overwriting hashCode normally also requires overwriting equals) :

public class TestCode {

    private Integer age;

    private String name;

    private Boolean sex;

    @Override
    public boolean equals(Object o) {
        / /... omit
    }

    @Override
    public int hashCode(a) {
        returnObjects.hash(age, name, sex); }}Copy the code

Using the objects. hash utility class, we can rewrite the hashCode method.

public static int hash(Object... values) {
        // The actual method to calculate the hashCode value
    return Arrays.hashCode(values);
}

public static int hashCode(Object a[]) {
    if (a == null)
        return 0;

    int result = 1;

    for (Object element : a)
        // Get the hashCode and 31 of each value to do the operations
        result = 31 * result + (element == null ? 0 : element.hashCode());

    return result;
}
Copy the code

2.4. What is a good hash function?

To implement a corresponding hash function for a data type, there are three evaluation criteria for this hash function:

Consistency: Equivalent keys must yield the same hash value
High efficiency: easy to calculate
Uniformity: Uniformly hashes all keys

In Java, there are utility classes and hashCode functions.

Conflict management

In the previous section, we looked at how to convert a key into an array index (hash value) using a hash function. However, if you encounter multiple keys corresponding to the same hash value, you need to resolve the collision (collision). There are three common ways to solve the collision problem:

Chain address method
Open address method
Linear detection method

3.1. Chain address method

The basic idea of chained address method is to create a single linked list for each Hash value, into which records are inserted when conflicts occur. The following code is used to implement:

First define the related entity class:

// DataNode chain
type DataNode struct {
  key string    // Data key - used to generate hash values
  value string  / / data value
  next *DataNode // A pointer to the next data
}

// HashNode index data node
type HashNode struct {
  index int       / / index value
  head  *DataNode // Link header pointer
  end   *DataNode // List end pointer
  size  int       // The number of elements in the list
}

/ / HashTable hash tables
type HashTable struct {
  hash []HashNode / / a hash table
  size int        / / table
}

Copy the code

Add get and put methods:

func (h *HashNode) Get(key string) string {
  p := h.head
  forp ! =nil {
    if p.key == key {
      return p.value
    }
    p = p.next
  }
  return ""
}

func (h *HashNode) Put(key, value string) {
  node := NewDataNode(key, value)
  if h.head == nil {
    h.head = node
    h.end = node
    h.size++
  } else {
    p, flag := h.head, false
    forp.next ! =nil {
      if p.key == key {
        p.value = value
        flag = true
        break
      }
      p = p.next
    }
    if! flag { p.next = node h.end = node h.size++ } } }Copy the code

Increase the hash function

func (h *HashTable) hashCode(key string) int {
  bytes := []byte(key)
  res := 0
  for i := 0; i < len(bytes); i++ {

    res = 31 * res + int(bytes[i] & 0xff)}return res % h.size
}
Copy the code

Add Insert and Search methods for external use:

func (h *HashTable) Insert(key, val string) {
  idx := h.hashCode(key)
  idxNode := h.hash[idx]
  idxNode.Put(key, val)
  h.hash[idx] = idxNode
}

func (h *HashTable) Search(key string) string {
  idx := h.hashCode(key)
  idxNode := h.hash[idx]
  return idxNode.Get(key)
}
Copy the code

Such a simple hash table is implemented, but some functions such as loading factors and rehashing are not implemented. The most commonly used HashMap in Java is a good example of a hash table (explained in more detail in a later article).

3.2. Open address method

The open address method will store all the input elements in the hash table. In this way, hash table implementation does not need any linked list to achieve.

Basic principle: when inserting an element, first through the hash function to judge, if hash conflict occurs, based on the current address, according to the method of addressing, to find the next address, if conflict occurs, to find until a blank address.

There are several common readdressing methods:

Linear detection: This creates the first clustering problem
Secondary detection: This creates a secondary aggregation problem
In the hash method: the hash method is actually very simple, is to use the hash function to hash an input, the output is the same position hash again, until there is no conflict position, but each conflict has to hash again, the calculation time increases.

Here focus on the specific implementation of linear detection method, mainly focusing on insert, delete, search and expansion (shrink) operations.

3.2.1. Basic information

Define node data, implement ToString and constructor:

// Node data
type Node struct {
  data int
  status bool // Indicates whether to delete
}

// ToString outputs node information
func (l *Node) ToString(a) {
  fmt.Printf("[key:%d]\n", l.data)
}

// NewNode creates a node
func NewNode(key int) *Node {
  return &Node{
    data: key,
    status: true,}}Copy the code

Hash table definition:

// HashTable HashTable
type HashTable struct {
  node []*Node
  size int
  num int
}

NewHashTable creates a hash table
func NewHashTable(size int) *HashTable {
  return &HashTable{
    node: make([]*Node, size),
    size: size,
    num: 0,}}// Println Outputs the hash table
func (h *HashTable) Println(a) {
  fmt.Print("Hash table contents: [")
  for _, item := range h.node {
    ifitem ! =nil && item.status {
      fmt.Printf("%d\t", item.data)
    } else {
      fmt.Printf("%s\t"."*")
    }
  }
  fmt.Print("]\n")}Copy the code

Insert 3.2.2.

Insert process:

Insert element 3, calculate the insert subscript: 3%8=3, insert element 3 at the position with subscript 3;
Insert element 6, calculate insert subscript: 6%8=6, insert element 6 at subscript 6;
Insert element 14, calculate the insert subscript: 14%8=6, you can get the position of insert element 14 to 6, but find the position of insert element 6 already exists, then search the next position, find the position of insert element 7 is empty, then insert element 7;
Insert element 30, calculate the subscript: 30%8=6, it can be obtained that the element of 30 will be inserted to position 6, but found that there is an element at position 6, search for the next position to find that there is an element at position 7, consider starting from 0, found that there is no element at position 0, then insert 30 elements to position 0.

The insertion process is shown below:

Code implementation:

// Insert Inserts data
func (h *HashTable) Insert(data int) {
  // Determine whether capacity expansion is required
  if h.num >= h.size / 2 {
    h.resize(2 * h.size)
  }
  // Computes the hash value
  code := data % h.size
  forh.node[code] ! =nil && h.node[code].status {
    // If there is the same direct end
    if h.node[code].data == data {
      return
    }
    code = (code + 1) % h.size
  }
  // Insert data
  h.node[code] = NewNode(data)
  h.num++
}

Copy the code

Attention needs to be paid to capacity expansion. More on that later!

Test code:

func TestHashTable_Insert(t *testing.T) {

  hashTable := NewHashTable(8)

  hashTable.Insert(3)
  hashTable.Insert(6)
  hashTable.Insert(14)
  hashTable.Insert(30)

  hashTable.Println() // hash table contents: [30 * * 3 * * 6 14]
}
Copy the code

3.2.3. Search

The data search process is as follows:

Query data 3, calculate the hash value of 3 first, traverse HashTable from subscript 3, find the element at position 3 is the same as query data and not deleted state, indicating find and return;

Query data 30, calculate hash value 6, traverse HashTable from subscript 6, find that the element at position 6 is different from the queried element; The element 14 in the next location is also different from the element 30; At this point, you have reached the end of the array, and need to retrace the array header. If you find that the element at position 0 is the same as the query element and the state is not deleted, you find and return the element.

If the element at position 0 is not the same, then compare the element at the next position and find that the element at position 0 is nil or deleted, so there is no element to look for and return nil.

// Find Queries data
func (h *HashTable) Find(data int) *Node {
  code := data % h.size
  forh.node[code] ! =nil && h.node[code].status {
    if h.node[code].data == data {
      return h.node[code]
    }
    code = (code + 1) % h.size
  }
  return nil
}
Copy the code

Test code:

func TestHashTable_Find(t *testing.T) {
  hashTable := NewHashTable(8)

  hashTable.Insert(3)
  hashTable.Insert(6)
  hashTable.Insert(14)
  hashTable.Insert(30)

  hashTable.Println() // hash table contents: [30 * * 3 * * 6 14]

  hashTable.Find(30).ToString()  [key:30]
  hashTable.Find(3).ToString()   // node: [key:3]
  hashTable.Find(14).ToString()  // node: [key:14]
}
Copy the code

3.2.4. Delete

Delete elements from the hash table. If there are no hash conflicting elements, simply set the node to nil or set status to false to delete the element, as shown in the following figure:

Another method is to delete elements with hash conflicts. When deleting the elements again, it is necessary to reset the positions of the remaining elements to prevent them from being found after deletion, as shown in the following figure: the elements with subscript 6 are: 6,14, and 30. After 14 is deleted, if 30 is not reset, the element 30 cannot be found during the search.

The reset method is also simple, where the same hash value (data % 8) after the element is deleted is moved to fill the void created after the element is deleted.

Code implementation:

// Delete Deletes data
func (h *HashTable) Delete(data int) bool {
  code, delFlag := data % h.size, false
  // Find the data and delete it
  forh.node[code] ! =nil && h.node[code].status {
    if h.node[code].data == data {
      h.node[code].status = false
      h.num--
      delFlag = true
      break
    }
    code = (code + 1) % h.size
  }

  // If no data is found, the data is not found
  if! delFlag {return delFlag
  } else {
    // Data was found and deleted

    // Determine whether the index needs to be shrunk. If the index needs to be shrunk, it does not need to be reindexed
    if h.num > 0 && h.num == h.size / 8 {
      h.resize(h.size / 2)
      return delFlag
    }

    // If the capacity does not shrink, you need to adjust the position
    // Record the location of the deleted element and the location of the next element
    curr, next := code % h.size, (code + 1) % h.size
    // The element traversing the next position is not empty
    forh.node[next] ! =nil && h.node[next].status {
      // If the index of next is the same as the hash value of the deleted element
      if h.node[next].data % h.size == data % h.size {
        // Move the value and set the next position element to delete state as well
        h.node[curr].data = h.node[next].data
        h.node[curr].status = true
        h.node[next].status = false
        // Next points to the next hash position, curr points to the next position
        curr, next = next, (next + 1) % h.size
      } else {
        // If not, just move the pointer to next
        next = (next + 1) % h.size
      }
    }
    return delFlag
  }
}
Copy the code

Test code:

func TestHashTable_Delete(t *testing.T) {

  hashTable := NewHashTable(8)

  hashTable.Insert(3)
  hashTable.Insert(6)
  hashTable.Insert(14)
  hashTable.Insert(30)
  hashTable.Insert(11)
  hashTable.Insert(46)
  hashTable.Insert(1)
  hashTable.Println() // hash table contents: [46 1 * 3 * * 6 * * * * 11 * * 30 14]
  hashTable.Delete(14)
  hashTable.Println()  // hash table contents: [* 1 * 3 * * 6 * * * * 11 * * 30 46]
  hashTable.Find(46).ToString()  [key:46]
  hashTable.Find(1).ToString()  [key:1]

  hashTable.Delete(1)
  hashTable.Delete(3)
  hashTable.Delete(30)
  hashTable.Println() // Hash table contents: [* * * * * * 6 * * * * 11 * * 46 *]
  hashTable.Find(46).ToString()  [key:46]
  hashTable.Delete(11)
  hashTable.Println()  // Hash table contents: [* * * * * * 6 46]
  hashTable.Find(6).ToString()  // node: [key:6]

}
Copy the code

3.2.5. Capacity Expansion (Reduction)

Determine whether to expand elements before inserting them and determine whether to shrink elements after deleting them. To expand or shrink elements is to rehash the positions of elements.

The specific code is as follows:

// resize Capacity expansion/reduction
func (h *HashTable) resize(newSize int) {
  newNodes := make([]*Node, newSize)
  for _, item := range h.node {
    ifitem ! =nil && item.status {
      idx := item.data % newSize
      fornewNodes[idx] ! =nil && newNodes[idx].status {
        if newNodes[idx].data == item.data {
          break
        }
        idx = (idx + 1) % newSize
      }
      newNodes[idx] = item
    }
  }
  h.node = newNodes
  h.size = newSize
}
Copy the code

Testing:

func TestHashTable_Resize(t *testing.T) {
  hashTable := NewHashTable(8)

  hashTable.Insert(3)
  hashTable.Insert(6)
  hashTable.Insert(14)
  hashTable.Insert(30)

  hashTable.Println() // hash table contents: [30 * * 3 * * 6 14]

  hashTable.Insert(11)

  hashTable.Println() // Hash table contents: [* * * 3 * * 6 * * * * 11 * * 30 14]
  hashTable.Find(14).ToString() // node: [key:14]
  hashTable.Find(30).ToString() [key:30]
}
Copy the code

Extension one: Binary representation of floating point numbers

Java is run on the Jvm, which is written in C/C++. The CURRENT C/C++ standard is based on IEEE floating-point numbers for float and double computation. The presentation format is as follows:

Before we talk about floating point numbers, let’s talk about doing computer code: source code, inverse code, and complement.

You know that computers only recognize zeros and ones; The binary representation of a number in a computer is called the number of machines for that number. The number of machines is signed and the highest bit is signed (positive is 0, negative is 1).

For example: +5 to binary: 0000 0101; -5 converts to binary: 1000 0101

The first digit of the machine number is the sign bit, which is formally not equal to the real value, so the +5 corresponding to 0000 0101 is called the truth value of 0000 0101.

2. A binary number with a sign bit added. Truth + sign bit = source code)

+5   ==>   0000 0101
-5   ==>   1000 0101 
Copy the code

Inverse code: the positive inverse code is itself; The inverse of a negative number is based on its original code, the sign bit is unchanged, the rest bits are reversed

+5   ==>   0000 0101
-5   ==>   1111 1010
Copy the code

Complement: The complement of a positive number is itself; The complement of a negative number is based on its original code, the sign bits remain unchanged, and the remaining bits are reversed, plus 1 (equivalent to +1 on the basis of the inverse code).

+5   ==>   0000 0101
-5   ==>   1111 1011
Copy the code

So why inverse and complement? If we need to calculate the value of 6-3, we can think of 6-3 as 6 + (-3).

The decimal system	The original code	Radix-minus-one complement	complement
1	0000, 0001,	0000, 0001,	0000, 0001,
– 1	1000, 0001,	1111, 1110,	1111, 1111,
0	1000, 0010,	1111, 1111,	0000, 0000,

1111 is 1000 0000 when converted to the original code, which is equivalent to -0. Although +0 (0000 0000) is the same as -0 (1000 0000), the symbol bit is meaningless. Now, when you do the complement calculation, you get 0000, 0000, which is perfect.

Early computers CDC 6000, LINC, PDP-1, etc., all used inverse code. However, the following problems occur when using inverse code:

Two encodings of 0:0000 000 and 1000 0000;
The calculation rules of inverse code subtraction are complicated, and it is necessary to add the internal logic components of the computer to judge the overflow, which will affect the calculation efficiency.

After the addition and subtraction operation of the inverse code is slowly replaced by the complement. You can also see from the following why 1000 0000 means -128!

Just to remember some of the basics of computing, and then go back to the representation of floating-point numbers!

Decimal decimals to binary decimals by multiplying by 2, the sequential sorting method, as shown in the following figure to calculate 0.8125 binary:

Look at the binary representation of type 38414.4 double:

The data is now divided into integer parts (38414) and decimal parts (0.4), and these two parts are converted into binary numbers

38414[decimal] = 1001011000001110[binary]

0.4[decimal] = 0.01100110…… [binary], where the binary of 0.4 is never finished.

Here we just need to add the number of binary digits of the integer plus the number of binary digits of the decimal to 53 digits.

= 38414.4 1001011000001110.0110101010101010101010101010101010101, and then converted to binary scientific notation:

1.0010110000011100110101010101010101010101010101010101 x 2 ^ 15, you can determine the index of 15; Drop the 1 in the whole number and replace the trailing decimal with the mantissa;

Then we can determine the order code. For the order code of double, there are 11 signed bits, so we need 15 + 1023 (1023 is the maximum number of 10 bits in 11) = 1038 = 10000001110. Finally 38414.4 is the positive sign bit 0; The binary of 38414.4 is 01000000 11100010 11000001 11001101 01010101 01010101 01010101 01010101. See the following schematic diagram for details:

Extension 2:HotSpotIntrinsicCandidateannotations

Many in the JDK source will be @ HotSpotIntrinsicCandidate annotations, the annotations were introduced in JDK 9.

Be @ HotSpotIntrinsicCandidate labeling method, a set of effective implementation in HotSpot, the efficient implementation based on CPU instructions and runtime, HotSpot to maintain efficient implementation will replace the JDK source code to achieve, to gain higher efficiency.

How to find the corresponding native place in the JDK source code to view this article: gorden5566.com/post/1027.h…

Extension 3: String compression

Jdk9 changes the underlying data store of String from char[] to byte[], and strings support two encoding formats after Jdk9: LATIN1 (one byte) and UTF16 (two fourths of a byte).

static final boolean COMPACT_STRINGS;

static {
    COMPACT_STRINGS = true;
}

Copy the code

And COMPACT_STRINGS controls whether to enable the compact mode of String. This is enabled by default, and is disabled by the -xx: -compactstrings parameter.

public int hashCode(a) {
    int h = hash;
    if (h == 0 && value.length > 0) {
        hash = h = isLatin1() ? StringLatin1.hashCode(value)
                              : StringUTF16.hashCode(value);
    }
    return h;
}

// StringLatin1.hashCode
public static int hashCode(byte[] value) {
    int h = 0;
    for (byte v : value) {
        h = 31 * h + (v & 0xff);
    }
    return h;
} 

// StringLatin1.hashCode
public static int hashCode(byte[] value) {
    int h = 0;
    int length = value.length >> 1;
    for (int i = 0; i < length; i++) {
        h = 31 * h + getChar(value, i);
    }
    return h;
}

@HotSpotIntrinsicCandidate
// intrinsic performs no bounds checks
static char getChar(byte[] val, int index) {
    assert index >= 0 && index < length(val) : "Trusted caller missed bounds check";
    index <<= 1;
    return (char)(((val[index++] & 0xff) << HI_BYTE_SHIFT) |
                  ((val[index]   & 0xff) << LO_BYTE_SHIFT));
}

public static int length(byte[] value) {
    return value.length >> 1;
}
Copy the code

Extension 4: String hashCode negative value

String. HashCode: String. HashCode: String.

public static int hashCode(byte[] value) {
    int h = 0;
    for (byte v : value) {
        h = 31 * h + (v & 0xff);
    }
    return h;
}
Copy the code

This code is simple, but does it raise two questions:

If the string is long enough, does it cause a data overflow?
Why 0xFF?

Let’s explain them one by one:

Int overflow ();

public static void main(String[] args) {
    System.out.println("abcde".hashCode());   / / 92599395
    System.out.println("abcdef".hashCode());  / / - 1424385949
}
Copy the code

Why is this value negative -1424385949? This problem is covered in the Java Virtual Machine specification: narrowing conversions of data types.

Is roughly like this: when the Java settlement result exceeds the maximum range of the basic data type int, will be the default type, intermediate results as long storage, return the target data type is int cannot accommodate the results, so the original long converted to type int, beyond the part of all discarded, only to the results of 32 bits. As for why it’s negative, it has to do with the sign bit being 1 when the type gets narrower. Here’s a diagram to illustrate:

And then why do we want to go to 0xFF?

Take a look at the following code:

byte a = -127;
int b = a & 0xff; = 129

Copy the code

The complement of -127 is 1000 0001, but when byte is converted to int, all the 24 bits are filled with 1, so in line 2: b = a & 0xff; The Jvm will fill the missing 24 bits with 1, so now -127’s complement is 11111111 11111111 11111111 10000001; When the value is 0xFF, the complement becomes: 00000000 00000000 00000000 10000001, which will be the same as the original complement of -127. At the bottom of the computer, all you care about is the complement.

Extension 5: hashCode and equals

These two methods are often used to sum up one bar!

Equals equals equals equals equals equals equals equals equals

public boolean equals(Object obj) {
    return (this == obj);
}
Copy the code

Override equals() to determine whether the contents of two objects are equal:

public class TestCode {

    private Integer age;

    private String name;

    private Boolean sex;

    @Override
    public boolean equals(Object o) {
        // Check that the address is the same
        if (this == o) {
            return true;
        }
        // Class is the same
        if(o == null || getClass() ! = o.getClass()) {return false;
        }
        // The object attributes are the same
        TestCode testCode = (TestCode) o;
        returnObjects.equals(age, testCode.age) && Objects.equals(name, testCode.name) && Objects.equals(sex, testCode.sex); }}Copy the code

HashCode () gets the hashCode; Return an int integer; This hashCode is only useful when creating a HashMap, HashTable, and HashSet of a class, where hashCode() is needed to retrieve the hash of the object, thereby confirming the object’s position in the hash.

We can use the Objects utility class to implement the hashCode method:

public class TestCode {

    private Integer age;

    private String name;

    private Boolean sex;

    @Override
    public int hashCode() {
        returnObjects.hash(age, name, sex); }}Copy the code

Then go to this method to see how it works:

public static int hashCode(Object a[]) {
    if (a == null)
        return 0;

    int result = 1;

    for (Object element : a)
        result = 31 * result + (element == null ? 0 : element.hashCode());

    return result;
}
Copy the code

You can see that you get the hashCode for each attribute and compute the value multiplied by 31.

Notice that if two objects are equal then they must have the same hashCode; The hashCode of two objects is the same but the two objects are not necessarily the same (this is a hash conflict).

Extension 6: HashMap, HashTable, and HashSet

These are all collection classes based on hash tables, and they can be analyzed from the following points:

Thread-safe: HashTable is thread-safe, HashMap and HashSet are not;
Implementation: HashMap is based on the hash table of the zipper method. If the chain is too long, it will automatically turn into a red-black tree. The bottom layer of HashSet is implemented by HashMap.
Initial size: The initial size of HashTable is 11, and the initial size of HashMap is 16
Null values: A HashMap can use an empty value as a key or value. A HashTable does not allow null values (neither key nor value).
Capacity expansion mode: (oldCapacity << 1) + 1 is used for HashTable and oldCap << 1 is used for HashMap
Hash: A HashTable uses the object’s hashCode directly, while a HashTable takes changes that are also processed on the object’s hashCode;

Hash table of summary of basic algorithms

What is a hash table?

Two. Hash function

2.1. Introduction

2.2. Common methods

2.3. Data types

2.3.1. Positive integer

2.3.2. Floating point number

2.3.3. String

2.3.4. Key combination

2.3.5. Java的hashCode

2.4. What is a good hash function?

Conflict management

3.1. Chain address method

3.2. Open address method

3.2.1. Basic information

Insert 3.2.2.

3.2.3. Search

3.2.4. Delete

3.2.5. Capacity Expansion (Reduction)

Extension one: Binary representation of floating point numbers

Extension 2:HotSpotIntrinsicCandidateannotations

Extension 3: String compression

Extension 4: String hashCode negative value

Extension 5: hashCode and equals

Extension 6: HashMap, HashTable, and HashSet

Related Posts

ELK Centralized logging Platform ii – Deployment

Principle analysis and implementation of Huffman tree (C++)

Concurrency +JVM+ framework + distributed + database