Come on, guys, let’s meet each other.

I am a worldly wanderer, a Java programming ape who has been wandering for many years

preface

In the previous two articles we talked about the List collection in the collection container, which contains two subclasses:

  • ArrayList
  • LinkedList

If you haven’t seen it yet, go back and check it out

I’ve left a few thoughts on my LinkedList. I don’t know if I’ve thought of any, but we’ll explore them in the comments

In this article we are going to talk about maps, and we are going to recall that in the introduction, we said that one big difference between a Map and a List is:

  • A List stores data as a single element, whereas a Map stores data in k-V format

Why do we have K,V to store data?

In traditional systems, the storage of about 10W or 100W of our data is enough, but in some special applications or big data platforms, tens of millions or more of data are involved.

In the advertising system developed by my last company, I can generate at least 2,000W data every day

At this time, if we want to find a certain data from it is very troublesome, involving performance problems, and through K,V storage, we are equivalent to adding an index to a value, through this index we can quickly locate the data, improve the performance of the system.

As for K and V forms of storage, we also use them in our work, such as:

  • Session
  • Redis
  • HBase
  • .

Ok, so with that in mind, let’s move on.

As mentioned above, all the parent classes of a Collection are collections, but Map is a separate set of interfaces, which cannot be mixed together. Let’s look at the implementation subclasses of Map:

  • HashMap
  • LinkedHashMap
  • TreeMap

I’ve also given you a mind map for understanding ArrayList

Let’s just take it one by one

Play the most important point: the interview rate the thief up the interview in September for a month, 80% of companies have asked (then don’t know -_ – | |).

Before we talk about today’s topic: HashMap, let’s take a quick look at what a hash table is

HashMap is a long, poorly written HashMap, so be patient

Hash table

What is a hash table?

A hash table is a data structure that is accessed directly based on a Key value. That is, it accesses records by mapping key values to a location in the table to speed up lookups

Source: Baidu Baike

PS: Concepts are not human words, do not remember him, directly look at the structure

Hash tables are divided into various types. Here is the structure implemented in the underlying HashMap:

Above is an array, which is a contiguous memory space in memory. The Key Value is taken to hashCode and then the length of the array is modulo to obtain the corresponding subscript position. Then the Value is placed at the corresponding subscript position. If an element exists at the corresponding subscript position, it is appended as a linked list

This method is called division hashing in the hash function

The value is also the same: after the key is hashed to the corresponding index, if there is only one data in the corresponding position, it is directly fetched; otherwise, it is compared in the linked list and then fetched

Just a quick introduction to hash tables, let’s move on to HashMaps

HashMap

Background: JDK1.8

As usual, let’s take a look at a class distribution of a HashMap

Operation method

Introduction to Attribute Values

We all construct a HashMap like this:

Map<String, String> hashMap = new HashMap<>();
Map<String, String> hashMap = new HashMap<>(20);
Copy the code

So this is a nonparametric construct, so what do we do?

/**MUST be a power of two.*/
// Default initialization length
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4;
// Load factor, which is responsible for determining how far data is stored when to expand capacity
static final float DEFAULT_LOAD_FACTOR = 0.75 f;

public HashMap(a) {
    this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
}
Copy the code

In our previous experience, storing data requires clearing space in memory, but this is not done in the constructors of the HashMap, including other constructors that take arguments:

public HashMap(int initialCapacity) {
	this(initialCapacity, DEFAULT_LOAD_FACTOR);
}

public HashMap(int initialCapacity, float loadFactor) {
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " +
                                           initialCapacity);
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " +
                                           loadFactor);
    this.loadFactor = loadFactor;
    this.threshold = tableSizeFor(initialCapacity);
}

// With the exception of transfer parameters,
public HashMap(Map<? extends K, ? extends V> m) {
    this.loadFactor = DEFAULT_LOAD_FACTOR;
    putMapEntries(m, false);
}
Copy the code

The lesson here is that if you create space in memory when you define a constructor, if you don’t store data, that memory is wasted

If you do not specify an array length, the default value is DEFAULT_INITIAL_CAPACITY, which is 16.

MUST be a power of two WHY?

In Hashtable, we follow the specification of division and hash: primes that are not close to an integer power of 2, but why not use this specification in HashMap instead of using an NTH power of 2? We’ll talk more about that later

Again, we have another point to focus on:

static final int tableSizeFor(int cap) {
    int n = cap - 1;
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    return (n < 0)?1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}
Copy the code

By calculating, we get the nearest power of 2 to the NTH power of the passed parameter, so even if the passed parameter does not meet the requirements, we will adjust it in the code

There is also a loadFactor attribute in the HashMap that determines when the HashMap container space should be expanded. The default is 0.75

For example, if the initial length is 16 and the load factor is 0.75, the capacity of the container is larger than (16 x 0.75 = 12)

Now let’s look at two more properties

static final int TREEIFY_THRESHOLD = 8;
static final int UNTREEIFY_THRESHOLD = 6;
Copy the code

These two attributes were added after JDK1.8. In simple terms, when the list length is greater than 8, the list will be stored as a red-black tree. When the red-black tree length is less than 6, the list will be stored as a linked list

  • In other words, in JDK1.8, the underlying structure of a HashMap takes the form of a hash table plus a red-black tree

This is the following structure:

Why is it an 8 turn red black tree?

A comment above the attribute value DEFAULT_INITIAL_CAPACITY gives the analysis:

/* * the frequency of * nodes in bins follows a Poisson distribution * (http://en.wikipedia.org/wiki/Poisson_distribution) with a * parameter of about 0.5 on average for the default resizing * Threshold of 0.75, although with a large variance because of * resizing Granularity. Ignoring variance, The expected * occurrences of list size k are (exp(-0.5) * POw (0.5, k) / * factorial(k)). The first values are: * * 0: 0.60653066 * 1: 0.30326533 * 2: 0.07581633 * 3: 0.01263606 * 4: 0.00157952 * 5: 0.00015795 * 6: 0.00001316 * 7: 0.00000094 * 8: 0.00000006 * More: less than 1 in ten million * */
Copy the code

In other words, by calculation, k=8 is close to 0, so it is defined as 8 to improve the efficiency of retrieval. This involves a poisson distribution: this is a discrete probability distribution commonly seen in statistics and probability

How to understand the Poisson distribution popularly

Method statement

Now that we know the basic property values, let’s look at the specific methods, again, but before that let’s look at two classes:

  • Node
static class Node<K.V> implements Map.Entry<K.V> {
    final int hash;
    final K key;
    V value;
    Node<K,V> next;

    Node(int hash, K key, V value, Node<K,V> next) {
        this.hash = hash;
        this.key = key;
        this.value = value;
        this.next = next;
    }
	// ...
}
Copy the code

This class is no stranger to you, as you’ve seen before in LinkedList, but it’s a one-way LinkedList

  • TreeNode
static final class TreeNode<K.V> extends LinkedHashMap.Entry<K.V> {
    TreeNode<K,V> parent;  // red-black tree links
    TreeNode<K,V> left;
    TreeNode<K,V> right;
    TreeNode<K,V> prev;    // needed to unlink next upon deletion
    boolean red;
    TreeNode(int hash, K key, V val, Node<K,V> next) {
        super(hash, key, val, next);
    }
    / /...
}
Copy the code

This is a concrete class of red-black trees

So let’s move on,

put

We call it like this:

hashMap.put("key1"."value1");

public V put(K key, V value) {
    return putVal(hash(key), key, value, false.true);
}
Copy the code

Hash (Key); hash (Key); hash (Key);

static final int hash(Object key) {
    int h;
    return (key == null)?0 : (h = key.hashCode()) ^ (h >>> 16);   
}
Copy the code

In order to make the calculated index position more scattered, so first (h >>> 16), again, through the ^ operation to hash the high and low bits can participate in the operation, so as to reduce the probability of hash collision

The disturbance function

Let’s look at the implementation:

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;							/ / comment 1
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);				/ / comment 2
    else {
        Node<K,V> e; K k;
        if(p.hash == hash && ((k = p.key) == key || (key ! =null && key.equals(k))))		/ / comment 4
            e = p;
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);/ / comment 5.2
        else {
            for (int binCount = 0; ; ++binCount) {				// Insert the list
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    if (binCount >= TREEIFY_THRESHOLD - 1) 				/ / comment 5
                        treeifyBin(tab, hash);
                    break;
                }
                if(e.hash == hash && ((k = e.key) == key || (key ! =null && key.equals(k))))
                    break; p = e; }}if(e ! =null) { 									/ / comment 4.2
            V oldValue = e.value;
            if(! onlyIfAbsent || oldValue ==null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    if (++size > threshold)									/ / comment 3
        resize();	
    afterNodeInsertion(evict);
    return null;
}
Copy the code
Open up space and expand

As mentioned earlier, nothing is done in the constructor, only space is created when the element is actually added. Note 1 above is the process of opening up space, and note 3 above, this method is also the process of expanding our capacity, calling the same method resize() :

Step by step, take a look at the putVal() method, and then resize().

The first time we put(), we just create an array of nodes, nothing else is done, and then we move on

Calculates the index subscript

The first time an element is put(), it is sure to enter comment 2, and the subscript position of the current element is evaluated and assigned by &

At the same time, if you have looked at the JDK1.7 source code before, you will notice that there is a way in 1.7:

static int indexFor(int h, int length) {
    return h & (length - 1);
}
Copy the code

In JDK1.8:

i = (n - 1) & hash
Copy the code

Here’s an example:

Binary after (n-1) : 01111

Hash = 18, converts to binary: 10010

& after calculation: 00010

So I is equal to 2

After setting it to the NTH power of 2, the last bits of (n-1) can be guaranteed to be 1 when calculating the subscript position, which is convenient for ampersand operation, and the efficiency of ampersand is higher than that of % operation.

Use bitwise operators, that’s not a decoration. O O (studying studying)

If there is a key, then e will still get the key from the array index position in the HashMap. Value will be assigned at comment 4.2. That is, overwrite the old value and return the old value.

In other words: in a HashMap, there are no duplicate elements, and if multiple elements are stored on the same key, only the latest element is stored

hashMap.put("key1"."value1");
hashMap.put("key1"."value2");
System.out.println(hashMap);

// {key1=value2}
Copy the code
Switch to red-black tree to store data

In JDK1.8, HashMap is converted to a red-black tree when the list length is greater than 8. This is verified in comment 5.

final void treeifyBin(Node<K,V>[] tab, int hash) {
    int n, index; Node<K,V> e;
    if (tab == null || (n = tab.length) < MIN_TREEIFY_CAPACITY)
        resize();
    else if ((e = tab[index = (n - 1) & hash]) ! =null) {
        TreeNode<K,V> hd = null, tl = null;
        do {
            TreeNode<K,V> p = replacementTreeNode(e, null);
            if (tl == null)
                hd = p;
            else {
                p.prev = tl;
                tl.next = p;
            }
            tl = p;
        } while((e = e.next) ! =null);
        if((tab[index] = hd) ! =null) hd.treeify(tab); }}Copy the code

We’ve already looked at the TreeNode class, which corresponds to the underlying structure diagram above

Also, we can see in comment 5.2 that if the storage has been converted to a red-black tree, then the red-black tree is inserted

Behind the red-black tree

resize()

Resize (), we know that when we initialize, we create an array, so what do we do when we look at the value

final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;									
    int oldCap = (oldTab == null)?0 : oldTab.length;				// oldCap = 16
    int oldThr = threshold;											// oldThr = 12
    int newCap, newThr = 0;
    if (oldCap > 0) {
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&		// newCap = 32
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double threshold // newThr = 24
    }
    else if (oldThr > 0) // initial capacity was placed in threshold
        newCap = oldThr;
    else {               // zero initial threshold signifies using defaults
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    @SuppressWarnings({"rawtypes","unchecked"})
    Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    if(oldTab ! =null) {
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if((e = oldTab[j]) ! =null) {
                oldTab[j] = null;
                if (e.next == null)
                    newTab[e.hash & (newCap - 1)] = e;
                else if (e instanceof TreeNode)
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else { // preserve order
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        next = e.next;
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            elsehiTail.next = e; hiTail = e; }}while((e = next) ! =null);
                    if(loTail ! =null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    if(hiTail ! =null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}
Copy the code

There’s a little chestnut in the notes. And we can see that

  • The container length increases exponentially
  • Judging the basis for container expansion, that is to say, the number of capacity expansion basis obtained by (container length * load factor) at the beginning will also increase exponentially, so we should pay attention here
Data migration

In JDK1.8 data migration is the result of judging: the current hash and the old container length:

  • If it’s 0, it’s still in the old place in the new array
  • If the result is not zero, the position in the new array is: the position in the original array + the container length of the original array

The specific logic is here:

// e.hash = 65
if ((e.hash & oldCap) == 0) {
    if (loTail == null)
        loHead = e;
    else
        loTail.next = e;
    loTail = e;
}
// e.hash = 6366 by calculating not 0, else
else {
    if (hiTail == null)
        hiHead = e;
    else
        hiTail.next = e;
    hiTail = e;
}

// The positions in the original array are the same as those in the new array
if(loTail ! =null) {
    loTail.next = null;
    newTab[j] = loHead;
}
// The position in the new array changes
if(hiTail ! =null) {
    hiTail.next = null;
    newTab[j + oldCap] = hiHead;
}
Copy the code

So much for the key HashMap points, let’s summarize them

conclusion

  • The bottom layer of HashMap uses hash table + red-black tree structure to store, disordered storage. When the length of the hash list is greater than 8, the list will be transformed into a red-black tree, and the judgment benchmark is the conclusion obtained after verification by using poisson distribution. Then switch the structure of the red-black tree back to the linked list when the node of the red-black tree is less than 6.
  • The HashMap container is 16 by default, and if it needs to be expanded, it will be twice the size of the current container. In addition, when the capacity expansion condition is reached, the current container length * load factor, and in the subsequent capacity expansion process, the capacity expansion condition is doubled
  • We can also pass in a fixed parameter, but note that if we need to define the length of the container, it is best to define the length as 2 to the NTH power:
    • Because it’s used when you look up an array index through the hash of a key&Calculation, performance comparison%higher
  • At the same time, in the process of data migration, it is determined whether the hash value of the key is 0 after operation with the length of the original array. If the hash value is 0, the subscript position in the new array will not change; otherwise, it is equal to the current subscript position + the length of the original array, namely the index position in the new array
  • HashMapThread-unsafe class if you want to keep thread-safe:
    • Your lock
    • Collections.synchronizedMap()
    • The ConcurrentHashMap

The document

For more information on how to use HashMap, see the documentation:

HashMapAPI document