How to override hashCode() correctly?


Preparing for the 2022 spring recruitment season or a summer internship? Wishing you a little progress every day! Day 4
Previous article: How to properly override the equals method of the JDK
For Redis from getting started to mastery and for concurrent programming, please refer to my previous blog posts
Believe in yourself: the longer you live, the stronger you get, and when you meet water, build a bridge! Life, you give me pressure, I give you back a miracle!

1. Introduction

I don’t know if you’ve ever overridden the hashCode method in development, or been asked about it in an interview. For example, interviews for basic Java positions might ask: have you ever used an object as a HashMap key?

As the source of the HashMap put method shows, when an element is added to the container, the key object's hashCode method is called to compute the hash value.
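To make the call visible without reading the JDK source, here is a minimal sketch. `CountingKey` and its counter are hypothetical names invented for this illustration; the only claim is that `HashMap.put` invokes the key's `hashCode()` at least once.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical key type used only for this illustration.
class CountingKey {
    static int hashCodeCalls = 0;   // how many times hashCode() has been invoked
    private final String name;

    CountingKey(String name) {
        this.name = name;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof CountingKey && ((CountingKey) o).name.equals(name);
    }

    @Override
    public int hashCode() {
        hashCodeCalls++;            // record the call, then delegate to String
        return name.hashCode();
    }
}

class PutCallsHashCodeDemo {
    public static void main(String[] args) {
        Map<CountingKey, Integer> map = new HashMap<>();
        map.put(new CountingKey("a"), 1);
        // The counter is greater than zero: put() asked the key for its hash.
        System.out.println("hashCode() calls after put: " + CountingKey.hashCodeCalls);
    }
}
```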

How do I properly override the hashCode method?

This is a very common and seemingly very simple problem, but few programmers actually write it well. (Often, the more attractive something is, the more dangerous it is; the simpler it looks, the more complex it is!)

Read on and see whether you are one of the programmers who write it well!

2. Main text

2.1 When to override

Before delving into how to override hashCode, it’s important to know when to override it.

Every class that overrides equals must also override hashCode!
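The rule above can be sketched as follows. `BrokenPoint` and `GoodPoint` are hypothetical classes made up for this example: the first overrides only equals, the second overrides both methods. With the first, a lookup using an equal key normally fails, because the two keys have unrelated identity hashes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical class: overrides equals() but NOT hashCode().
class BrokenPoint {
    final int x, y;

    BrokenPoint(int x, int y) { this.x = x; this.y = y; }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof BrokenPoint)) return false;
        BrokenPoint p = (BrokenPoint) o;
        return x == p.x && y == p.y;
    }
}

// Hypothetical class: overrides both equals() and hashCode().
class GoodPoint {
    final int x, y;

    GoodPoint(int x, int y) { this.x = x; this.y = y; }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof GoodPoint)) return false;
        GoodPoint p = (GoodPoint) o;
        return x == p.x && y == p.y;
    }

    @Override
    public int hashCode() {
        return Objects.hash(x, y);   // equal points now produce equal hashes
    }
}

class EqualsWithoutHashCodeDemo {
    public static void main(String[] args) {
        Map<BrokenPoint, String> broken = new HashMap<>();
        broken.put(new BrokenPoint(1, 2), "found");
        // Typically null: the equal key hashes to a different bucket.
        System.out.println(broken.get(new BrokenPoint(1, 2)));

        Map<GoodPoint, String> good = new HashMap<>();
        good.put(new GoodPoint(1, 2), "found");
        System.out.println(good.get(new GoodPoint(1, 2)));   // prints found
    }
}
```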

So at this point you’re asking: when do I need to override equals?

I already covered this question in the previous article. If you need it, have a look at my column “Java small knowledge 100 cases”, and subscribe while you are there so you don't get lost on the way to learning Java!

2.2 How to override it

The hashCode method is a native method declared in java.lang.Object and implemented inside the JVM. It returns an identity-based hash for the current object. This value is often described as the object's memory address, but the exact value depends on the JVM implementation.

public native int hashCode();

So if our class doesn’t override hashCode, and none of its superclasses does either, calling hashCode always returns this identity-based value, which has nothing to do with the contents of the object. That’s probably not what you want, so how do we override it?
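A small sketch of the default behaviour, using only standard JDK calls: for an object whose class does not override hashCode, the result is the same value that `System.identityHashCode` reports, and it is unrelated to any field values.

```java
class DefaultHashCodeDemo {
    public static void main(String[] args) {
        Object a = new Object();
        Object b = new Object();

        // Without an override, hashCode() is the identity hash of the object.
        System.out.println(a.hashCode() == System.identityHashCode(a));   // prints true

        // The value is stable for the lifetime of the object.
        System.out.println(a.hashCode() == a.hashCode());                 // prints true

        // Two distinct objects usually (not necessarily) get different values.
        System.out.println(a.hashCode() + " vs " + b.hashCode());
    }
}
```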

Approach

First of all, the hash should be calculated from the fields of the object. An object can have many fields of different kinds, such as arrays, reference types, and primitive types, and we cannot simply pick the hash of one arbitrary field as the return value of the object’s hashCode method. So we combine the hash values of all the relevant fields into one result!

Primitive types: use the hashCode method of the corresponding wrapper type, for example Integer.hashCode(int)
Reference types: call hashCode() directly

Array types: iterate over the array and call hashCode() on each element in turn
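The three rules above might look like this in a class. `Order` and its fields are hypothetical, chosen only to have one field of each kind, and the fields are combined with the multiply-by-31 pattern that the "General implementation" section explains. Treat it as a sketch, not as the only correct way to combine fields.

```java
import java.util.Arrays;
import java.util.Objects;

// Hypothetical class with a primitive, a reference and an array field.
class Order {
    private final long id;          // primitive type
    private final String customer;  // reference type, may be null
    private final int[] itemIds;    // array type

    Order(long id, String customer, int[] itemIds) {
        this.id = id;
        this.customer = customer;
        this.itemIds = itemIds.clone();
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Order)) return false;
        Order other = (Order) o;
        return id == other.id
                && Objects.equals(customer, other.customer)
                && Arrays.equals(itemIds, other.itemIds);
    }

    @Override
    public int hashCode() {
        int result = Long.hashCode(id);                      // wrapper type's static hashCode
        result = 31 * result + Objects.hashCode(customer);   // null-safe call to hashCode()
        result = 31 * result + Arrays.hashCode(itemIds);     // hashes each element in turn
        return result;
    }
}

class OrderHashDemo {
    public static void main(String[] args) {
        Order a = new Order(7L, "alice", new int[] {1, 2, 3});
        Order b = new Order(7L, "alice", new int[] {1, 2, 3});
        System.out.println(a.equals(b));                   // prints true
        System.out.println(a.hashCode() == b.hashCode());  // prints true
    }
}
```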

General implementation

This is the hash method that java.util.Objects provides for computing a hash code; it simply delegates to Arrays.hashCode. While this is not a silver bullet for calculating hash codes, we can use this implementation as a reference, and many hashCode methods in the JDK source code follow a similar pattern!

public static int hash(Object... values) {
    return Arrays.hashCode(values);
}

public static int hashCode(Object a[]) {
    if (a == null)
        return 0;

    int result = 1;

    for (Object element : a)
        result = 31 * result + (element == null ? 0 : element.hashCode());

    return result;
}

This method can be roughly divided into two steps:

If a == null, return 0 as the hash code
If a != null, start from result = 1 and traverse every element. For each element, multiply result by 31 and add the element's hashCode(), or 0 if the element is null
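As a worked example of the two steps, the sketch below traces the loop by hand for the values 1 and "a" and compares the result with `Objects.hash`. It relies only on the facts that `Integer.valueOf(1).hashCode()` is 1 and `"a".hashCode()` is 97.

```java
import java.util.Arrays;
import java.util.Objects;

class ObjectsHashWalkthrough {
    public static void main(String[] args) {
        // Manual trace of the loop for the values (1, "a"):
        int result = 1;
        result = 31 * result + Integer.valueOf(1).hashCode();   // 31 * 1 + 1 = 32
        result = 31 * result + "a".hashCode();                  // 31 * 32 + 97 = 1089
        System.out.println(result);                             // prints 1089

        // The library call produces the same value.
        System.out.println(Objects.hash(1, "a"));               // prints 1089

        // A null array hashes to 0; a null element contributes 0.
        System.out.println(Arrays.hashCode((Object[]) null));       // prints 0
        System.out.println(Arrays.hashCode(new Object[] {null}));   // prints 31
    }
}
```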

There is a very conspicuous number 31: on every iteration of the loop, the current result is multiplied by 31. Why is that?

Multiplying result by 31 on every iteration reduces hash collisions. Without a multiplier, the element hashes are simply summed, so the result stays in a relatively small range and does not depend on the order of the elements. Different inputs then end up with the same hash value very easily, which is not what we want to see!
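A short sketch of the difference. The helper `hash(String, int)` is hypothetical and mirrors the loop `hash = multiplier * hash + c`; a multiplier of 1 degenerates to a plain sum, so "ab" and "ba" collide, while a multiplier of 31 tells them apart.

```java
class MultiplierOrderDemo {
    // Hypothetical helper: hash = multiplier * hash + c for each character.
    static int hash(String s, int multiplier) {
        int h = 0;
        for (char c : s.toCharArray()) {
            h = multiplier * h + c;
        }
        return h;
    }

    public static void main(String[] args) {
        // Multiplier 1 is a plain sum: 97 + 98 in both cases.
        System.out.println(hash("ab", 1));    // prints 195
        System.out.println(hash("ba", 1));    // prints 195

        // Multiplier 31 makes the position of each character matter.
        System.out.println(hash("ab", 31));   // prints 3105 (97 * 31 + 98)
        System.out.println(hash("ba", 31));   // prints 3135 (98 * 31 + 97)
    }
}
```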

So why 31? Why did the JDK developers pick 31 and not 2? And why not 1001?

In fact, there are good reasons for using 31 as the multiplier:

31 is not so small that the computed hash codes collide easily. And since the return value is an int, it is not so large that the result overflows quickly and loses information.
31 is an odd number, and multiplying by an odd number does not easily lose low-order information. Multiplying by 2, by contrast, is equivalent to shifting left by one bit: the low-order bit is filled with zero and, on overflow, high-order bits are discarded, which makes the computed hash codes collide much more often.

31 is very friendly to virtual machines. Because 31 = 2^5 - 1, the multiplication 31 * i can be optimized into the shift and subtraction (i << 5) - i, so the multiplication performs better.
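The identity behind the last point can be checked directly. This is only a sketch of the arithmetic; whether a particular JVM actually applies this rewrite is up to its JIT compiler. The helper name `times31` is made up for the example.

```java
class ShiftMultiplyDemo {
    // 31 * i written as a shift and a subtraction, since 31 = 2^5 - 1.
    static int times31(int i) {
        return (i << 5) - i;
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, 7, -3, 123_456, Integer.MAX_VALUE};
        for (int i : samples) {
            // Both sides wrap around identically on int overflow, so this prints true each time.
            System.out.println(i + ": " + (31 * i == times31(i)));
        }
    }
}
```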

Here I test the multipliers 2 and 31 respectively:

package com.liziba.part2;

import org.apache.commons.lang3.RandomStringUtils;

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Objects;

/**
 * Compares the hash collision rate of different multipliers.
 *
 * @author: Liziba
 * @date: 2021/10/24 11:54
 */
public class HashCodeMethodDemo {

    /**
     * Calculates a hash code for a string.
     *
     * @param value    the string to hash
     * @param capacity the multiplier applied on every step
     */
    public static int hashCode(String value, int capacity) {
        int hash = 0;
        if (Objects.nonNull(value) && value.length() > 0) {
            char[] chars = value.toCharArray();
            for (int i = 0; i < chars.length; i++) {
                hash = capacity * hash + chars[i];
            }
        }
        return hash;
    }

    /**
     * Prints the collision statistics for one multiplier.
     *
     * @param capacity   the multiplier that produced the hash values
     * @param hashValues the computed hash values
     */
    public static void conflictCompare(int capacity, List<Integer> hashValues) {
        // Returns 1, -1 or 0 for greater than, less than and equal
        Comparator<Integer> comparator = (x, y) -> (x > y) ? 1 : ((x < y) ? -1 : 0);
        Integer max = hashValues.stream().max(comparator).get();
        Integer min = hashValues.stream().min(comparator).get();
        // Every value beyond the distinct ones is a collision
        long conflictNum = hashValues.size() - hashValues.stream().distinct().count();
        double conflictRate = conflictNum * 1.0 / hashValues.size();
        System.out.println(String.format(
                "capacity=%d  collisions=%d  collision rate: %.4f%%  max hashCode: %d  min hashCode: %d",
                capacity, conflictNum, conflictRate * 100, max, min));
    }

    public static void main(String[] args) {
        int num = 100000;
        int capacity2 = 2;
        int capacity31 = 31;
        List<Integer> hashValues2 = new ArrayList<>(num);
        List<Integer> hashValues31 = new ArrayList<>(num);
        for (int i = 0; i < num; i++) {
            // Generate a random string with org.apache.commons.lang3.RandomStringUtils
            String value = RandomStringUtils.randomAlphabetic(15);
            hashValues2.add(hashCode(value, capacity2));
            hashValues31.add(hashCode(value, capacity31));
        }
        conflictCompare(capacity2, hashValues2);
        conflictCompare(capacity31, hashValues31);
    }
}

A total of 100,000 random 15-character strings were tested:

With a multiplier of 2, the collision rate is close to 4%
With a multiplier of 31, the collision rate is only 0.0010%

Does that mean that every time you override hashCode, you have to multiply by 31?

Certainly not! The multiplier 31 is just one way to reduce hash collisions, so you don’t need a multiplier when the situation doesn’t call for one.
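One case where no multiplier is needed, as a sketch: a class whose equality depends on a single field can return that field's hash directly. `UserId` is a hypothetical class invented for this illustration.

```java
// Hypothetical class whose equality depends on a single int field.
class UserId {
    private final int value;

    UserId(int value) {
        this.value = value;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof UserId && ((UserId) o).value == value;
    }

    @Override
    public int hashCode() {
        // One field only: there is nothing to combine, so no multiplier is involved.
        return Integer.hashCode(value);
    }
}

class UserIdDemo {
    public static void main(String[] args) {
        System.out.println(new UserId(42).hashCode());   // prints 42
    }
}
```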
