###hashcode&hashmap

Programming Hashcode according to hash functions; Many to one, the same input is the same output, different input is evenly distributed, hash conflict may occur, the solutions are chain address method, open address method, (linear in exploration, square in exploration), rehash, establish a public overflow area.

### Big data

1) A 100T large file contains a lot of strings, print out the repeated strings (use hash function to split, read data by row, hash, the same data will be allocated to a machine, on the above operation, parallel tasks will be very fast)

2) Find a number that does not appear in 4 billion numbers (bitmap + bucket sort)

### Bloom filter

Bit type Map) int =4 bytes =32 bits

Given a single machine, there are 10 billion urls (64 bytes) that are blacklisted, and now given a URL to determine whether it is on the blacklist again. If you hash, it takes up a lot of space.

int[] arr =new int[1000]; //32000 int index =30000; int intIndex =30000/32; int bitIndex =30000%32; arr[intIndex] = (arr[intIndex]|(1<<bitIndex));Copy the code

To save space, prepare k hash functions, hash the URL in modulus length, and then stroke black. The longer the array, the lower the error rate.

hashtable &hashmap

Hashmap: Inherit map class, not thread safe, null can be used as a key but only one, the default length is 16, expansion mode 2x, implement serializable interface, hash value needs to be modulo

Thread safety: If thread A and thread B perform the PUT operation at the same time, and the hash value of the two data is the same and the data is null, thread A and thread B both perform the insert operation. Assume that thread A is suspended before data insertion after entering, thread B performs normally and inserts data normally. Then thread A obtains the CPU time slice, and thread A does not need to make hash judgment any more. The problem occurs: Thread A overwrites the data inserted by thread B, causing thread insecurity.

Hashtable: inherit dictionary class, each method is added Synchron keyword, key, value are not allowed null, the default length is 11, expansion mode 2x+1, realize serialization, directly use hash value without modulus.

# # # hashmap explanation

The thread safety of jdk1.7hashmap is caused by inconsistent put operations and an infinite loop when copying the original list in reverse order during expansion.

In jdk1.8, if the list length is greater than 8, it will be changed to a red-black tree structure. Since the length is always traversed to the end of the list, it is no longer in reverse order.

# # # concurenthashmap explanation

Jdk1.7 segmented lock

ConcurrentHashMap consists of a Segment array structure and a HashEntry array structure. Segment is a ReentrantLock that acts as a lock in ConcurrentHashMap, while HashEntry is used to store key-value pairs. A ConcurrentHashMap contains an array of segments. The structure of a Segment is similar to that of a HashMap. A Segment contains an array of Hashentries. Each Segment contains an element in the HashEntry array. When modifying the HashEntry array, the Segment lock must be obtained.

Idk1.8 ConcurrentHashMap

Form one: zipper table structure is the simplest case. To ensure thread-safety, all reads and writes to this structure need to be volatile, atomic, or CAS (CAS is short for CompareAndSwap, which means compare and replace). CAS requires three operands: memory address V, old expected value A, and target value B to be updated when the CAS instruction executes, change the value of memory address V to B if and only if the value of memory address V is equal to the expected value A, otherwise nothing is done. The entire compare and replace operation is an atomic operation. Inserting the first Node into each empty bin of the table (bin refers to each element in the array) is done using the CAS operation. Second, the first Node in each bin acts as a synchronized lock. Protect thread safety for subsequent insert, update, and remove operations. If the hash values of the key-value mapping are mapped to the same bin, the insert, delete, and modify operations require a lock. The lock is the head node of each bin list and uses the Synchronize lock.

Form 2: Not when the length of the linked list in bin is long enough, it will be converted into a red-black tree immediately, but only when the length of the table array reaches a certain threshold. If a bin is too long but the table array is too short, resize is triggered first. The array table can store nodes, TreeBin, and ForwadingNodes.

1) The variables associated with get are volatile, so there is almost no need to consider thread safety and variable visibility, just atomicity. By atomicity, I mean that table variables and node attributes that are shared by multiple threads in a GET operation may also be modified by other threads, so you need to save them to local variables and use them later so that you don’t have to worry about being modified by other threads.

2) Put operation, check first

1. If the location is null, the CAS operation is used to ensure thread safety. 2. If the location is ForwardingNode, perform a partial transfer and retry in the new table. 3. Lock only if the location is a single-linked list or red-black tree.Copy the code

Synchronized locks are used instead of reentrantLocks because the JVM has many optimizations for synchronized, and in the absence of contention, Synchronized performs better than explicit ReentrantLock (such as lock elimination, biased locking, spin, and other optimization measures).