JDK1.7HashMap source analysis and summary

preface

HashMap is a data structure commonly used in daily learning or simple coding. It is also a knowledge point often asked in job interviews, and its importance is self-evident. From JDK1.7 to JDK1.8, Java developers have made a number of optimizations for HashMap (which will be covered later). Even thread-safe ConcurrentHashMap in the J.U.C package has some common ground with HashMap, so we should start with HashMap in JDK1.7. Learn more about ConcurrentHashMap and JDK1.8 HashMap.

Through this article, I will read JDK1.7 in the process of HashMap source code principles and knowledge points are summarized, as a response to the accumulation of autumn recruitment interview, but also hope to provide help for everyone like me is constantly learning xiaobai. If there is any mistake or unclear place, please comment and leave a message. I will modify or delete it as soon as possible.

Basic knowledge of

The essence of a HashMap can be viewed as a container for storing key-value pairs and their mapping relationships. The data structure underlying HashMap in jdk1.7 is array + linked list. When creating a HashMap, we can optionally pass in an initial capacity, but the number of elements a HashMap can store is not affected by this initial capacity. The HashMap keeps expanding its array, increasing its capacity. == The underlying data structure == is shown in the figure.

The HashMap== class diagram and its implementation interface relationship == are shown in the figure.

Next comes the data structure in the HashMap where == stores the element ==, as shown in the code snippet below.

static class Entry<K.V> implements Map.Entry<K.V> {
        final K key;
        V value;
        Entry<K,V> next;
        int hash;

        /** * Creates new entry. */
        Entry(inth, K k, V v, Entry<K,V> n) { value = v; next = n; key = k; hash = h; }}Copy the code

After understanding the above basic knowledge, the following will start from the source code, while looking at the source code while summarizing the knowledge points.

Constants and member variables

	// Default initial capacity, 16
    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

    // Maximum initial capacity, 2^30
    static final int MAXIMUM_CAPACITY = 1 << 30;
	
	// Default load factor, 0.75
    static final float DEFAULT_LOAD_FACTOR = 0.75 f;
	
	// Initialize an empty array of tables by default
	//HashMap uses a lazy loading mechanism: when a HashMap is created, the table is initialized as an empty array, and only when it is used
	//, a non-empty array is created.
    static finalEntry<? ,? >[] EMPTY_TABLE = {};/** * The table, resized as necessary.length MUST Always be a power of two. But the length of the array must be an integer power of 2 * * *. * /
    transient Entry<K,V>[] table = (Entry<K,V>[]) EMPTY_TABLE;

	// The number of key-value pairs stored in a HashMap, i.e. the number of entries stored
    transient int size;
	
	// The capacity expansion threshold, which also represents the initialCapacity when the HashMap was created
    int threshold;

    // Capacity expansion factor, related to HashMap capacity expansion,threshold = loadFactor * capacity
    final float loadFactor;

    /** * The number of times this HashMap has been structurally modified * Structural modifications are those that change the number of mappings in * the HashMap or otherwise modify its internal structure (e.g., * rehash). This field is used to make iterators on Collection-views of * the HashMap fail-fast. (See ConcurrentModificationException). */
	/** * modCount is used to count the number of structural changes to a HashMap, such as adding elements, deleting elements, etc. This variable is the fail-fast strategy for iterators in the HashMap collection view. When the iterator is created, modCount is assigned to a variable called expectedModCount. As the iterator is being used by the current thread *, the modCount and expectedModCount are constantly being checked for equality. If the two values are not equal, according to fail - fast strategy, will be * throw ConcurrentModificationException immediately, so as to realize not to let other threads to HashMap for structural modification. * Refer to the code for the inner class HashIterator. The * * Fail-fast policy is an error detection strategy, but it does not avoid errors. It is a mistake in Java collection mechanism, when more than one thread to modify * a collection at the same time, ConcurrentModificationException occurs. So in a concurrent environment, it is still recommended to use components under the * J.U.C package. * /
    transient int modCount;
Copy the code

Once you understand constants and member variables in HashMap, two questions remain:

Q1: Why must the length of a table be an integer power of 2?

Q2: What is a structural change to a HashMap?

To solve these two problems, we started reading the source code for the HashMap method.

methods

In this part, we not only explain one method, but also lead other methods related to one method and explain it.

A constructor

	// no argument constructor
	public HashMap(a) {
        this(DEFAULT_INITIAL_CAPACITY, DEFAULT_LOAD_FACTOR);
    }
	// A constructor with one argument
	public HashMap(int initialCapacity) {
        this(initialCapacity, DEFAULT_LOAD_FACTOR);
    }
	// Pass in the initial capacity and the expansion factor method
	public HashMap(int initialCapacity, float loadFactor) {
        // Check parameters
        if (initialCapacity < 0)
            throw new IllegalArgumentException("Illegal initial capacity: " +
                                               initialCapacity);
        if (initialCapacity > MAXIMUM_CAPACITY)
            initialCapacity = MAXIMUM_CAPACITY;
        if (loadFactor <= 0 || Float.isNaN(loadFactor))
            throw new IllegalArgumentException("Illegal load factor: " +
                                               loadFactor);
		
        this.loadFactor = loadFactor;
        // The initial capacity is recorded in threshold when the HashMap is created
        threshold = initialCapacity;
        // Empty method, used in LinkedHashMap
        init();
    }
	public HashMap(Map<? extends K, ? extends V> m) {
        this(Math.max((int) (m.size() / DEFAULT_LOAD_FACTOR) + 1,
                      DEFAULT_INITIAL_CAPACITY), DEFAULT_LOAD_FACTOR);
        inflateTable(threshold);

        putAllForCreate(m);
    }
Copy the code

Put method

The PUT method is one of the most complex and important methods in HashMap. The PUT method is used to add key-value pairs to a HashMap. If the Key to be added already exists in the HashMap, override the oldValue with the value passed in and return the oldValue; If it does not exist, it is added and null is returned.

	/** * Associates the specified value with the specified key in this map. * If the map previously contained a mapping for  the key, the old * value is replaced. */
	// A method to add elements to the HashMap
	public V put(K key, V value) {
        // The table is initialized when the put method is first called
        if (table == EMPTY_TABLE) {
            / / create the table
            inflateTable(threshold);
        }
        // HashMap in jdk1.7 supports key-value pairs with null keys
        // If the key of the element to be put is null, the element is directly stored in the table[0] list
        if (key == null)
            return putForNullKey(value);
        // Lists the hash values based on the key hash
        int hash = hash(key);
        // Calculate the index I of the table to which the element should be inserted based on the hash value and the length of the table
        int i = indexFor(hash, table.length);
        // Find the same element in table[I] as the insert element key
        for(Entry<K,V> e = table[i]; e ! =null; e = e.next) {
            Object k;
            /** * Note how HashMap determines key equality: * e.h ash = = hash && ((k = e.k ey) = = key | | key. Equals (k)) * also use when computing the hash in the key of hashcode methods * * so, when the type of the key is the custom types, If you override equals, then you also override hashCode * to prevent the case where the keys are the same, but hashCode hashes differently */
            if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                returnoldValue; }}// Create an Entry and insert the key into the list
        // When adding an element to a HashMap, modCount needs to increment
        modCount++;
        / / add Entry
        addEntry(hash, key, value, i);
        return null;
    }
	// The putForNullKey method is used to insert a null key
	private V putForNullKey(V value) {
        // Select * from table[0] to see if there is an element with the same Key
        for (Entry<K,V> e = table[0]; e ! =null; e = e.next) {
            if (e.key == null) {
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue;
            }
        }
        modCount++;
        addEntry(0.null, value, 0);
        return null;
    }
Copy the code

InflateTable method

The inflateTable method creates an array and initializes the table based on the initial capacity passed in when creating a HashMap, or the default initial capacity.

	// The method argument toSize is the initial capacity of the HashMap
	private void inflateTable(int toSize) {
        // roundUpToPowerOf2: Calculates a value capacity according to the initial capacity. Capacity is used as the length of the table
        // The value can be: capacity >= toSize and capacity is an integer power of 2
        int capacity = roundUpToPowerOf2(toSize);
	    // Recalculate the capacity expansion threshold: threshold = capacity * loadFactor
        threshold = (int) Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1);
        // Create an array
        table = new Entry[capacity];
        initHashSeedAsNeeded(capacity);
    }
	private static int roundUpToPowerOf2(int number) {
        // assert number >= 0 : "number must be non-negative";
        return number >= MAXIMUM_CAPACITY
                ? MAXIMUM_CAPACITY
                : (number > 1)? Integer.highestOneBit((number -1) < <1) : 1;
    }
	// This method is used to find the highest bit of I, for example, 9 corresponds to the base 2:1001, after the operation, find 1000
	public static int highestOneBit(int i) {
        // This method changes all the low values of I to 1 through multiple or operations. Finally, it moves right and subtracts, so that only the highest 1 is retained
        // For example, 1001, after five or operations, becomes 1111, the last step is 1111-0111 = 1000
        i |= (i >>  1);
        i |= (i >>  2);
        i |= (i >>  4);
        i |= (i >>  8);
        i |= (i >> 16);
        return i - (i >>> 1);
    }
Copy the code

The hash method

The hash method computes the hash value based on the key, which will be used to locate the indexFor inserting the linked list into the table.

	// The hash algorithm in a HashMap requires as high a hash as possible
	final int hash(Object k) {
        int h = hashSeed;
        if (0! = h && kinstanceof String) {
            return sun.misc.Hashing.stringHash32((String) k);
        }

        h ^= k.hashCode();

        // Improve the hashing performance of the algorithm through multiple bit operations
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }
Copy the code

IndexFor method

IndexFor calculates the index I inserted into the table based on the hash value and table length.

	static int indexFor(int h, int length) {
        /** * to calculate the subscript I, you can use the modulo % operation, also can use bitwise & operation, but the underlying computer operation is actually binary bitwise operation *, so the bitwise & operation will be more efficient. Q1: Why does the length of a table have to be an integer power of 2? * Because we use the bit & operation here to calculate the subindex I, if a bit of length - 1 is 0, * then the bit & operation on the bit must be 0, for example: length is 1011 * length - 1: 1010, *, when bitwise and operation, some positions in the array will never be accessed, resulting in a waste of space, and also increases the possibility of * hash conflicts. If length is a power of 2, then the element to be inserted by put can be hashed to the first position in the array. * /
        return h & (length-1);
    }
Copy the code

And by doing that, we solved Q1. Because when length is an integer power of 2, the element to be inserted has the same probability of being hashed to any position in the array, that is, it has the chance to be hashed to any position in the table, which can effectively utilize the array space and reduce the possibility of hash conflicts.

AddEntry method

When the addEntry method is executed, you need to check whether the array needs to be expanded before adding elements.

	void addEntry(int hash, K key, V value, int bucketIndex) {
        //jdk1.7 HashMap expansion conditions :(size >= threshold) && (null! = table[bucketIndex])
        The number of entries in the current HashMap >= threshold 2. The linked list of the position to be inserted is not empty
        // There are some differences between HashMap expansion conditions in JDk1.7 and 1.8.
        if ((size >= threshold) && (null! = table[bucketIndex])) {// The new array is twice as long as the original array
            resize(2 * table.length);
            hash = (null! = key) ? hash(key) :0;
            // Index needs to be recalculated after capacity expansion
            bucketIndex = indexFor(hash, table.length);
        }
		
        createEntry(hash, key, value, bucketIndex);
    }
	
Copy the code

Using this method, we can know the two conditions of HashMap expansion in JDK1.7, and the length of HashMap expanded array is twice of the original array.

The resize method

The resize method expands the HashMap and moves elements from the original table to the new table.

	void resize(int newCapacity) {
        Entry[] oldTable = table;
        int oldCapacity = oldTable.length;
        if (oldCapacity == MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return;
        }
		// Create a new array
        Entry[] newTable = new Entry[newCapacity];
        // Move elements from the old table to the new table
        transfer(newTable, initHashSeedAsNeeded(newCapacity));
        table = newTable;
        // Recalculate the capacity expansion threshold
        threshold = (int)Math.min(newCapacity * loadFactor, MAXIMUM_CAPACITY + 1);
    }
	// Transfer the element
	void transfer(Entry[] newTable, boolean rehash) {
        int newCapacity = newTable.length;
        for (Entry<K,V> e : table) {
            while(null! = e) { Entry<K,V> next = e.next;if (rehash) {
                    e.hash = null == e.key ? 0 : hash(e.key);
                }
                inti = indexFor(e.hash, newCapacity); e.next = newTable[i]; newTable[i] = e; e = next; }}}Copy the code

Note that in a multithreaded environment, the transfer method can lead to an infinite loop of linked lists. The formation of the dead-loop linked list is illustrated at the end of this article.

CreateEntry method

The createEntry method is the one that actually creates the Entry and inserts the list. This method inserts the newly created Entry into the linked list by header interpolation. In jdk1.7, HashMap inserts are header inserts. Java developers assume that the newly inserted Entry will be accessed more often, so to facilitate future access, they insert the newly added element into the head of the list. However, this insertion strategy may cause the dead-loop linked list in the transfer method after expansion, so tail insertion method is changed in JDK1.8.

	void createEntry(int hash, K key, V value, int bucketIndex) {
        Entry<K,V> e = table[bucketIndex];
        table[bucketIndex] = new Entry<>(hash, key, value, e);
        size++;
    }
Copy the code

The other method behind is relatively simple, will not make too much elaboration, posted to facilitate future reference.

Get related methods

The get-related methods internally delegate the specific GET operations to the getEntry method.

	public V get(Object key) {
        if (key == null)
            return getForNullKey();
        Entry<K,V> entry = getEntry(key);

        return null == entry ? null : entry.getValue();
    }
	private V getForNullKey(a) {
        if (size == 0) {
            return null;
        }
        for (Entry<K,V> e = table[0]; e ! =null; e = e.next) {
            if (e.key == null)
                return e.value;
        }
        return null;
    }
	public boolean containsKey(Object key) {
        returngetEntry(key) ! =null;
    }
	final Entry<K,V> getEntry(Object key) {
        if (size == 0) {
            return null;
        }

        int hash = (key == null)?0 : hash(key);
        for(Entry<K,V> e = table[indexFor(hash, table.length)]; e ! =null;
             e = e.next) {
            Object k;
            if(e.hash == hash && ((k = e.key) == key || (key ! =null && key.equals(k))))
                return e;
        }
        return null;
    }
Copy the code

The size method

The size method returns the number of entries stored in the HashMap. After looking at the method code, you can easily distinguish between the length() method of a string, the length of an array, and the size() method of a collection.

	public int size(a) {
        return size;
    }
Copy the code

Remove related methods

Operations related to remove and clear may cause structural changes to the HashMap, and the modCount value will increase automatically.

	public V remove(Object key) {
        Entry<K,V> e = removeEntryForKey(key);
        return (e == null ? null : e.value);
    }
	final Entry<K,V> removeEntryForKey(Object key) {
        if (size == 0) {
            return null;
        }
        int hash = (key == null)?0 : hash(key);
        int i = indexFor(hash, table.length);
        Entry<K,V> prev = table[i];
        Entry<K,V> e = prev;

        while(e ! =null) {
            Entry<K,V> next = e.next;
            Object k;
            if(e.hash == hash && ((k = e.key) == key || (key ! =null && key.equals(k)))) {
                modCount++;
                size--;
                if (prev == e)
                    table[i] = next;
                else
                    prev.next = next;
                e.recordRemoval(this);
                return e;
            }
            prev = e;
            e = next;
        }

        return e;
    }
	final Entry<K,V> removeMapping(Object o) {
        if (size == 0| |! (oinstanceof Map.Entry))
            return null;

        Map.Entry<K,V> entry = (Map.Entry<K,V>) o;
        Object key = entry.getKey();
        int hash = (key == null)?0 : hash(key);
        int i = indexFor(hash, table.length);
        Entry<K,V> prev = table[i];
        Entry<K,V> e = prev;

        while(e ! =null) {
            Entry<K,V> next = e.next;
            if (e.hash == hash && e.equals(entry)) {
                modCount++;
                size--;
                if (prev == e)
                    table[i] = next;
                else
                    prev.next = next;
                e.recordRemoval(this);
                return e;
            }
            prev = e;
            e = next;
        }

        return e;
    }
	// You can learn this trick from everyday coding when you quickly populate Arrays
	public void clear(a) {
        modCount++;
        Arrays.fill(table, null);
        size = 0;
    }
Copy the code

To sum up, the put and remove methods may cause structural changes to the HashMap, resulting in modCount increment.

conclusion

Adding elements to a HashMap

1. Determine whether you need to initialize the array and create it if you do

2. Check whether the key is empty. If the key is empty, the table[0] list will be traversed to find an Entry with a null key, and oldValue will be overwritten with newValue and returned. Otherwise, insert directly into the head of the list

2. Invoke the hash method to calculate the hash value using the passed key as the method parameter

3. Perform bitwise and operation on hash value and (length-1) to calculate the subscript I of the inserted linked list in table

Select * from table[I] where oldValue = newValue and oldValue = newValue

5. Determine whether expansion is required. If expansion is required, expand the array and transfer the elements from the original array to the new array

6, according to the hash and (new array length -1) bit and find the newIndex newIndex

7. Create an Entry object and insert it into the head of the table[newIndex] list

Illustration of dead-loop linked lists:

At a certain point, the table of the HashMap in main memory and the nodes A, B, and C under a linked list are shown in the figure.

Assume that two threads ThreadA and ThreadB need to insert nodes to positions A, B, and C, and capacity expansion conditions are met.

On the first CPU scheduling, ThreadA gets the CPU time slice, and ThreadA copies a copy of the table in main memory into ThreadA’s working memory. When e points to table[I] in the Transfer method for loop, then the while loop is executed until “Entry<K,V> next = E.ext;” Next points to B. If ThreadA runs out of CPU time slices, any changes made to the shared variable in ThreadA’s working memory are synchronized to the main memory. However, no changes were made to the table in this task, so the main memory shared variable remains unchanged.

On the second CPU scheduling, ThreadB gets the CPU time slice, and ThreadB copies a copy of the table in primary memory into ThreadB’s working memory. After ThreadB has finished transferring nodes from table[I] to newTable[I], ThreadB’s working memory table and newTable are shown in the figure.

If the CPU time slice obtained by ThreadB is used up before newTable is assigned to table, the changes to table and a, B, and C in the working memory are synchronized to the main memory. At this point, the pointing relationship between table and A, B, and C in the working memory is the same as that in ThreadB’s working memory.

On the third CPU schedule, ThreadA gets the CPU time slice. The ThreadA working memory at this point looks like the figure below. At this point, E points to node A and next points to node B.

First insert the section nod pointed to by e into newTable[I] and assign next to e. As shown in the figure.

When the loop inside the while is repeated,next = e.ext, e points to B,next points to A, insert the e header into newTable[I], and assign next to E, as shown in the figure.

The next time next = e.next, next points to null, inserts e into newTable[I], and assigns next to e. The next time the loop enters, e==null, and the table[I] transition is complete. Table and newTable are shown in the figure.

When ThreadA runs out of CPU time slices, the table in working memory is synchronized to main memory. In this case, a ring is formed between node A and node B, and node C cannot be accessed, resulting in data loss of node C. When a HashMap method traverses the table[I] list, it creates an infinite loop due to the existence of a loop.