HashMap

The class declaration first

public class HashMap<K,V> extends AbstractMap<K,V>
    implements Map<K,V>, Cloneable, Serializable {

Structure

An array of buckets, where each bucket holds a linked list or a red-black tree

  • Bucket index: hash & (table.length - 1)

Quick notes & shorthand

Default load factor: 0.75 (see stackoverflow.com/questions/1…)

(n - 1) & hash, with n = table.length: computes the bucket index

hash & oldCap: used during resize to decide whether a node stays at index j or moves to j + oldCap (see the sketch below)
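
A minimal sketch (not HashMap source; the class name and values are invented here) showing both bit tricks, assuming a power-of-two capacity:

public class IndexTricks {
    public static void main(String[] args) {
        int oldCap = 16;
        int newCap = oldCap << 1;            // doubling, as resize does
        int hash = "someKey".hashCode();

        int oldIndex = (oldCap - 1) & hash;  // bucket index in the old table
        int newIndex = (newCap - 1) & hash;  // bucket index in the new table

        // For a power-of-two n, (n - 1) & hash equals hash % n (for
        // non-negative hashes), and the new index differs from the old
        // one only by the oldCap bit, which is what hash & oldCap tests:
        boolean moves = (hash & oldCap) != 0;
        System.out.println(newIndex == (moves ? oldIndex + oldCap : oldIndex)); // true
    }
}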

Conclusions

The red-black tree part was added in 1.8
  1. Tree nodes still keep prev/next pointers, so a tree bin can easily be converted back into a list structure
  2. Operations on the red-black tree happen in two steps:
    1. Perform the operation as on an ordinary BST
    2. Afterwards, carry out the relevant rebalancing
The maximum capacity is 2^30
Resize conditions:
  1. Under normal conditions, a resize simply doubles the capacity
  2. If treeification is triggered while the table is still small (capacity < 64, i.e. MIN_TREEIFY_CAPACITY), a resize is performed instead of treeifying
  3. A resize is triggered when size exceeds threshold
    1. threshold: capacity * loadFactor
    2. loadFactor: the load factor, 0.75 by default
    3. size: the number of key-value mappings in the Map, not the number of non-empty slots in the table array
Fail-fast:
  1. Methods that increment modCount:
    1. putVal, removeNode, clear, etc.
  2. Methods that check it:
    1. forEach on keySet, forEach on values, forEach on entrySet, HashMap.forEach itself, and replaceAll
  3. So one conclusion can be drawn:
    1. HashMap's own fail-fast checking is limited to these forEach-style traversals (and the iterators of its views); used as a black box through plain put/get, there is no fail-fast check. The checking forEach implementations were added in 1.8, as the sketch below shows.
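
A minimal sketch of that behavior (JDK 8+; the map contents are invented for illustration):

import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

public class FailFastDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);

        try {
            // forEach records modCount before the loop and re-checks it
            // afterwards; the structural modification inside makes it throw.
            map.forEach((k, v) -> map.remove("b"));
        } catch (ConcurrentModificationException e) {
            System.out.println("fail-fast triggered: " + e);
        }

        // A plain put/get sequence never performs this check.
        map.put("c", 3);
        System.out.println(map.get("c"));
    }
}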
Thread safety
  1. During a concurrent resize, a circular linked list can form (1.7)

    1. In 1.7, the looped list arises in the transfer method (see blog.csdn.net/sinat_40482…):

      void transfer(Entry[] newTable, boolean rehash) {
          int newCapacity = newTable.length;
          for (Entry<K,V> e : table) {
              while (null != e) {
                  Entry<K,V> next = e.next;
                  if (rehash) {
                      e.hash = null == e.key ? 0 : hash(e.key);
                  }
                  int i = indexFor(e.hash, newCapacity);
                  // The main problem is here: head insertion
                  e.next = newTable[i];
                  // 1. A suspends here, before the next line runs
                  newTable[i] = e;
                  e = next;
                  // 2. B runs to completion here, then A resumes
              }
          }
      }
      • Hints:
      1. For simplicity, pretend the hash is a plain % (mod) of the capacity

      2. Assume the bucket array has length 2 and the keys 3, 5, and 7 all hash to table[1]; with 1.7's head insertion the chain reads 3 -> 7 -> 5

      graph LR T1[table 1] --> A[3] --> B[7] --> C[5]
      1. After doubling, the new table has 4 buckets; rehashing sends 5 (5 % 4 = 1) to the second bucket, while 3 and 7 (3 % 4 = 7 % 4 = 3) both belong in the fourth bucket:

        graph LR A[table 0] B[table 1] A1[table 2] A2[table 3]
      2. The nodes bound for the fourth bucket, in the order transfer processes them, are:

        graph LR
        B[3] --> D[7]
        

        At this point, thread A executes up to:

        e.next = newTable[i];

        graph LR
        A[3]-->A1[newTable 3: null]

        It suspends before attaching the node to the head (before newTable[i] = e). Thread B takes the time slice, executes the entire transfer, and the buckets become:

        graph LR T1[table 1] --> N5[5] T3[table 3] --> N7[7] --> N3[3]

        Then A resumes. It head-inserts 3 (newTable[3] = 3) and continues with e = 7, the next it captured before suspending. B has already set 7.next = 3, so A reads next = 3 again, head-inserts 7, and loops one final time with e = 3:

        graph LR
        A2[7]-->A21[3 - as left by thread B]

        In the fourth bucket, the 3 just processed and the 3 at the tail are the same node object.

        So when A head-inserts 3 one more time, 3.next is set to the current head, 7:

        graph LR

        A2[3]-->A21[7]-->A2

        At this point there is a circular linked list (3.next = 7 and 7.next = 3); any later operation that walks this bucket falls into an endless loop.

    2. In 1.8, this cannot happen, due to:

    if (loTail != null) {
        loTail.next = null;
        newTab[j] = loHead;
    }
    // the "hi" list is attached at index j + oldCap
    if (hiTail != null) {
        hiTail.next = null;
        newTab[j + oldCap] = hiHead;
    }

    In resize, each split list keeps its node order and is attached to the table whole, so the circular-list problem from 1.7 cannot occur. The surrounding split loop is sketched below.
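
    For context, the loop that builds the lo/hi lists, lightly abridged from the 1.8 resize source (quoted from memory, so treat it as a sketch rather than the verbatim method):

    Node<K,V> loHead = null, loTail = null;  // nodes staying at index j
    Node<K,V> hiHead = null, hiTail = null;  // nodes moving to index j + oldCap
    Node<K,V> next;
    do {
        next = e.next;
        if ((e.hash & oldCap) == 0) {        // the oldCap bit picks the target
            if (loTail == null)
                loHead = e;
            else
                loTail.next = e;             // append at the tail, keeping order
            loTail = e;
        } else {
            if (hiTail == null)
                hiHead = e;
            else
                hiTail.next = e;
            hiTail = e;
        }
    } while ((e = next) != null);

    Because nodes are appended at the tail instead of head-inserted, relative order is preserved and no pointer can end up aimed back up the chain.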

  2. But whether 1.7 or 1.8, there is a data-loss problem:

    1. When two threads race on the same bin, the direct assignment used in 1.8 may overwrite a node just written by the other thread

    2. In putVal:

      tab[i] = newNode(hash, key, value, null); // line 631 (1.8)

      If A is suspended just before this line and B inserts into the same (empty) hash bin, B's node will be overwritten when A resumes. A minimal way to observe such lost updates is sketched below.
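
A minimal, non-deterministic sketch (class name and loop bounds invented for illustration) that usually exhibits lost updates on a plain HashMap:

import java.util.HashMap;
import java.util.Map;

public class LostUpdateDemo {
    public static void main(String[] args) throws InterruptedException {
        Map<Integer, Integer> map = new HashMap<>();
        Runnable writer = () -> {
            for (int i = 0; i < 10_000; i++) {
                map.put(i, i);  // unsynchronized concurrent puts
            }
        };
        Thread a = new Thread(writer);
        Thread b = new Thread(writer);
        a.start();
        b.start();
        a.join();
        b.join();
        // Both threads wrote the same 10_000 keys, so size should be 10_000;
        // with racing puts it often comes out smaller (and on 1.7 the resize
        // race could even hang here).
        System.out.println(map.size());
    }
}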


Note – the implementation notes (TreeNode and more)

In the HashMap source, a long block of implementation notes sits just before the private fields. Let's go through it paragraph by paragraph:

Paragraph 1 – The underlying structure may change
This map usually acts as a binned (bucketed) hash table, but
* when bins get too large, they are transformed into bins of
* TreeNodes, each structured similarly to those in
* java.util.TreeMap. Most methods try to use normal bins, but
* relay to TreeNode methods when applicable (simply by checking
* instanceof a node).  Bins of TreeNodes may be traversed and
* used like any others, but additionally support faster lookup
* when overpopulated. However, since the vast majority of bins in
* normal use are not overpopulated, checking for existence of
* tree bins may be delayed in the course of table methods.

The first paragraph says:

  • A HashMap starts out as a hash table of bins (buckets).
  • When a bin gets too large, it is transformed into a bin of TreeNodes, structured much like TreeMap.
  • Most methods use normal bins, but relay to TreeNode methods when applicable.
    • The distinction is made with an instanceof check
  • Bins of TreeNodes can be traversed and used like any others, but additionally support faster lookup when overpopulated.
    • However, in normal use the vast majority of bins are not overpopulated, so checking for the existence of tree bins may be delayed in the course of table methods.

The bottom line: bins change form depending on their size, and changing is worthwhile.

Paragraph 2 – What does it become?
Tree bins (i.e., bins whose elements are all TreeNodes) are ordered primarily by hashCode, but in the case of ties, if two elements are of the same "class C implements Comparable<C>", type then their compareTo method is used for ordering. (We conservatively check generic types via reflection to validate this -- see method comparableClassFor). The added complexity of tree bins is worthwhile in providing worst-case O(log n) operations when keys either have distinct hashes or are orderable. Thus, performance degrades gracefully under accidental or malicious usages in which hashCode() methods return values that are poorly distributed, as well as those in which many keys share a hashCode, so long as they are also Comparable. (If neither of these apply, we may waste about a factor of two in time and space compared to taking no precautions. But the only known cases stem from poor user programming practices that are already so slow that this makes little difference.)

hashCode and ordering after the tree-bin transformation

  • Ordering is mostly by hashCode. On ties, if the stored keys implement Comparable, compareTo is called to order them.
  • This pays off: operations get worst-case O(log n) behavior whenever keys either have distinct hashes or are orderable.
  • The hash algorithm of the stored keys is therefore critical to the container's performance.
    • Even with a hashCode that is poorly distributed and mostly identical, performance degrades gracefully, as long as the keys also implement Comparable.
    • Neither Comparable nor a usable hashCode:
      • Roughly a factor of two in time and space may be wasted compared to taking no precautions.
      • But in the note author's view, code written that way is already so slow that this makes little difference.

A hypothetical worst-case key is sketched after this list.
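
A hypothetical key class (invented here, not from the JDK) whose constant hashCode forces every entry into one bin; because it is Comparable, a treeified bin can still serve lookups in O(log n):

// Worst-case key: every instance collides on hashCode, so all entries
// land in a single bin. Since it implements Comparable, HashMap can
// still keep the treeified bin ordered instead of degrading to O(n).
final class BadKey implements Comparable<BadKey> {
    final int id;

    BadKey(int id) { this.id = id; }

    @Override public int hashCode() { return 42; }  // deliberately terrible

    @Override public boolean equals(Object o) {
        return o instanceof BadKey && ((BadKey) o).id == id;
    }

    @Override public int compareTo(BadKey other) {
        return Integer.compare(id, other.id);
    }
}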
Paragraph 3 – When does it change?
Because TreeNodes are about twice the size of regular nodes, we use them only when bins contain enough nodes to warrant use (see TREEIFY_THRESHOLD). And when they become too small (due to removal or resizing) they are converted back to plain bins. In usages with well-distributed user hashCodes, tree bins are rarely used. Ideally, under random hashCodes, the frequency of nodes in bins follows a Poisson distribution (http://en.wikipedia.org/wiki/Poisson_distribution) with a parameter of about 0.5 on average for the default resizing threshold of 0.75, although with a large variance because of resizing granularity. Ignoring variance, the expected occurrences of list size k are (exp(-0.5) * pow(0.5, k) / factorial(k)). The first values are: 0: 0.60653066, 1: 0.30326533, 2: 0.07581633, 3: 0.01263606, 4: 0.00157952, 5: 0.00015795, 6: 0.00001316, 7: 0.00000094, 8: 0.00000006. More: less than 1 in ten million

This paragraph covers the conversion between TreeNode bins and plain bins.

  • A TreeNode is about twice the size of a normal node, so a bin is only converted when it contains enough nodes.

    • Controlled through TREEIFY_THRESHOLD

    • If the bin becomes small again (due to removal or resizing), it reverts to a plain bin

  • With a well-distributed hashCode, tree bins are rarely used at all

  • Here's the ideal case:

    • With the default resizing threshold (loadFactor = 0.75), the number of nodes per bin follows a Poisson distribution with a parameter of about 0.5. (It is not that only 0.75 yields a Poisson distribution; other load factors simply give a different parameter. See blog.csdn.net/reliveIT/ar… for more.)
    • With the container filled up to its resize threshold (size = 0.75 * table.length), the probability that a bin's list has size k is:

$(e^{-0.5} \times 0.5^{k}) \div k!$

For k of 8 or more, the probability is already below 1 in ten million; the quoted table can be reproduced with the short program below.
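
A small sanity check (invented here) that recomputes the quoted probabilities:

// Recomputes P(k) = e^(-0.5) * 0.5^k / k! for k = 0..8, matching the
// values quoted in the implementation notes.
public class PoissonBins {
    public static void main(String[] args) {
        double p = Math.exp(-0.5);        // the k = 0 term
        for (int k = 0; k <= 8; k++) {
            System.out.printf("%d: %.8f%n", k, p);
            p *= 0.5 / (k + 1);           // next term: multiply by lambda / (k + 1)
        }
    }
}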

As for why 0.75F is used by default, the header comment says:

As a general rule, the default load factor (.75) offers a good tradeoff between time and space costs. Higher values decrease the space overhead but increase the lookup cost (reflected in most of the operations of the HashMap class, including get and put). The expected number of entries in the map and its load factor should be taken into account when setting its initial capacity, so as to minimize the number of rehash operations. If the initial capacity is greater than the maximum number of entries divided by the load factor, no rehash operations will ever occur.

It's a compromise between time and space: a larger load factor saves space but makes lookups slower. And if the initial capacity is greater than max(count(entry)) / loadFactor, no rehash operation will ever occur, which leads to the sizing idiom below.
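
A common sizing idiom based on that rule (the expected count here is invented for illustration):

import java.util.HashMap;
import java.util.Map;

public class PresizedMap {
    public static void main(String[] args) {
        int expectedEntries = 1_000;  // hypothetical figure

        // Capacity chosen so expectedEntries / 0.75 fits without a resize;
        // HashMap rounds the argument up to the next power of two internally.
        Map<String, Integer> map =
                new HashMap<>((int) (expectedEntries / 0.75f) + 1);

        for (int i = 0; i < expectedEntries; i++) {
            map.put("key-" + i, i);   // no rehash happens during this loop
        }
        System.out.println(map.size());
    }
}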

Paragraph 4 – The root of a tree bin
The root of a tree bin is normally its first node.  However,
sometimes (currently only upon Iterator.remove), the root might
be elsewhere, but can be recovered following parent links
(method TreeNode.root()).

The root of a tree bin is usually its first node, but sometimes (currently only after Iterator.remove) it can be elsewhere

  • It can be recovered by following the parent links (method TreeNode.root()).
Paragraph 5 – Internal method conventions
All applicable internal methods accept a hash code as an argument (as normally supplied from a public method), allowing them to call each other without recomputing user hashCodes. Most internal methods also accept a "tab" argument, that is normally the current table, but may be a new or old one when resizing or converting.

Internal methods take the hash code as an argument so they can call each other without recomputing it, and most also take a "tab" argument: normally the current table, but possibly a new or old one during resizing or converting.

Paragraph 6 – Order preservation
When bin lists are treeified, split, or untreeified, we keep
them in the same relative access/traversal order (i.e., field
Node.next) to better preserve locality, and to slightly
simplify handling of splits and traversals that invoke
iterator.remove. When using comparators on insertion, to keep a
total ordering (or as close as is required here) across
rebalancings, we compare classes and identityHashCodes as
tie-breakers.
  • When bins are treeified, split, or untreeified, the relative access/traversal order (the Node.next field) is kept, to better preserve locality and to slightly simplify the handling of splits and of traversals that invoke iterator.remove.

  • When comparators are used at insertion time, classes and identityHashCodes serve as tie-breakers, so that a total ordering is maintained across rebalancings; the tie-breaker is sketched below.
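
    The tie-breaker the comment refers to is TreeNode's tieBreakOrder; quoted from memory of the 1.8 source, so treat it as a sketch:

    // Orders two keys that hash equal and are not mutually Comparable:
    // first by class name, then by identityHashCode (never returning 0,
    // so the tree always has a total order).
    static int tieBreakOrder(Object a, Object b) {
        int d;
        if (a == null || b == null ||
            (d = a.getClass().getName()
                  .compareTo(b.getClass().getName())) == 0)
            d = (System.identityHashCode(a) <= System.identityHashCode(b) ?
                 -1 : 1);
        return d;
    }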

Paragraphs 7 and 8 – Remaining small things
The use and transitions among plain vs tree modes is
complicated by the existence of subclass LinkedHashMap. See
below for hook methods defined to be invoked upon insertion,
removal and access that allow LinkedHashMap internals to
otherwise remain independent of these mechanics. (This also
requires that a map instance be passed to some utility methods
that may create new nodes.)
The concurrent-programming-like SSA-based coding style helps
avoid aliasing errors amid all of the twisty pointer operations.

They mainly say the following:

  • Because of the subclass LinkedHashMap, the use of and transitions between plain and tree modes are more complicated.

    • Hook methods are defined for insertion, removal, and access, so that LinkedHashMap internals can remain independent of these mechanics.
    • This also requires that a map instance be passed to some utility methods that may create new nodes.
  • The concurrent-programming-like, SSA-based coding style helps avoid aliasing errors amid all the twisty pointer operations.

    • Two terms worth unpacking:
      • SSA (static single assignment): a style in which each variable is assigned exactly once, which makes it easier to reason about which reference points where.
      • Aliasing errors: bugs where two references unexpectedly refer to the same node.

Internal variables

  • Default initial capacity: 16
    • The capacity must be a power of two
/** The default initial capacity - MUST be a power of two. */
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16
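  • Arbitrary constructor arguments are rounded up to the next power of two by tableSizeFor; this is the 1.8 helper, quoted from memory, so treat it as a sketch:

/** Returns a power of two size for the given target capacity. */
static final int tableSizeFor(int cap) {
    // Smear the highest set bit into all lower positions, then add 1.
    int n = cap - 1;
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}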
  • Maximum capacity: 1 << 30, i.e. 2^30
/**
 * The maximum capacity, used if a higher value is implicitly specified
 * by either of the constructors with arguments.
 * MUST be a power of two <= 1<<30.
 */
static final int MAXIMUM_CAPACITY = 1 << 30;
  • The default load factor is 0.75f
/** The load factor used when none specified in constructor. */
static final float DEFAULT_LOAD_FACTOR = 0.75f;
  • The treeify threshold is 8; the notes above explain why it is 8
/**
 * The bin count threshold for using a tree rather than list for a
 * bin. Bins are converted to trees when adding an element to a
 * bin with at least this many nodes. The value must be greater
 * than 2 and should be at least 8 to mesh with assumptions in
 * tree removal about conversion back to plain bins upon
 * shrinkage.
 */
static final int TREEIFY_THRESHOLD = 8;
  • The untreeify threshold is 6; note that the two thresholds deliberately differ
/**
 * The bin count threshold for untreeifying a (split) bin during a
 * resize operation. Should be less than TREEIFY_THRESHOLD, and at
 * most 6 to mesh with shrinkage detection under removal.
 */
static final int UNTREEIFY_THRESHOLD = 6;
  • The minimum capacity for treeification is 64; below this, a crowded bin triggers a resize instead of being treeified
/**
 * The smallest table capacity for which bins may be treeified.
 * (Otherwise the table is resized if too many nodes in a bin.)
 * Should be at least 4 * TREEIFY_THRESHOLD to avoid conflicts
 * between resizing and treeification thresholds.
 */
static final int MIN_TREEIFY_CAPACITY = 64;
  • The backing table array
/**
 * The table, initialized on first use, and resized as
 * necessary. When allocated, length is always a power of two.
 * (We also tolerate length zero in some operations to allow
 * bootstrapping mechanics that are currently not needed.)
 */
transient Node<K,V>[] table;
  • A cached view of the entry set (keySet() and values() use fields from AbstractMap)
/**
 * Holds cached entrySet(). Note that AbstractMap fields are used
 * for keySet() and values().
 */
transient Set<Map.Entry<K,V>> entrySet;
  • size, a commonly used trick: keeping a counter makes size() an O(1) operation
  • size is the number of key-value mappings stored, not table.length
/** The number of key-value mappings contained in this map. */
transient int size;
  • modCount, a common field in non-synchronized containers, used to implement the fail-fast mechanism
/**
 * The number of times this HashMap has been structurally modified.
 * Structural modifications are those that change the number of mappings in
 * the HashMap or otherwise modify its internal structure (e.g.,
 * rehash). This field is used to make iterators on Collection-views of
 * the HashMap fail-fast. (See ConcurrentModificationException).
 */
transient int modCount;
  • The size at which the next resize happens
/**
 * The next size value at which to resize (capacity * load factor).
 *
 * @serial
 */
// (The javadoc description is true upon serialization.
// Additionally, if the table array has not been allocated, this
// field holds the initial array capacity, or zero signifying
// DEFAULT_INITIAL_CAPACITY.)
int threshold;
  • The load factor; final, and given no default value here (the constructors fill it in)
/**
 * The load factor for the hash table.
 *
 * @serial
 */
final float loadFactor;