HashMap is a collection container that senior java engineers in the front line must be proficient in. Its importance is almost equal to the importance of Volatile in concurrent programming (visibility and order). This article makes a detailed explanation of the graphic source code to deeply analyze the important kernel knowledge of HashMap, which is easy to read, learn and understand. It is recommended to collect and learn more, in case of being asked in the interview.
I'm Mike, who has spent more than 10 years teaching the architecture technology of BAT firstline large factories.
Key points of this article:
1. Data structure of HashMap
2. Core members of HashMap
3. Node array of HashMap
4. Data storage of HashMap
5. Hash function of HashMap
6. Hash conflict: linked hash table
7. get method of HashMap: hash function
8. put method of HashMap
9. Why must 2^n be used for slot number?
Data structure of HashMap
Firstly, from the perspective of data structure, HashMap is the data structure of array + linked list + red black tree (red black tree is added to JDK1.8), as shown below:
Here we need to understand two questions:

What is the underlying storage of data?

What are the advantages of this storage method?
1. Core members
Default initial capacity(Array default size):16，2 Integer power of static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; Maximum capacity static final int MAXIMUM_CAPACITY = 1 << 30; Default load factor static final float DEFAULT_LOAD_FACTOR = 0.75f; Load factor is used to measure HashMap The degree of fullness indicates that when map The data stored in the collection reaches 75% of the current array size%Capacity expansion is required Linked list to red black tree boundary static final int TREEIFY_THRESHOLD = 8; The red black tree turns away from the boundary of the linked list static final int UNTREEIFY_THRESHOLD = 6; Hash bucket array transient Node<K,V>[] table; Number of elements actually stored transient int size; When map The data inside is larger than this threshold It will be expanded int threshold threshold = table.length * loadFactor
2.Node array
It can be seen from the source code that there is a very important field in the HashMap class, that is, Node[] table, that is, hash bucket array. Obviously, it is an array of nodes.
static class Node<K,V> implements Map.Entry<K,V> { final int hash;//Used to locate the array index position final K key; V value; Node<K,V> next;//The next Node in the linked list Node(int hash, K key, V value, Node<K,V> next) { this.hash = hash; this.key = key; this.value = value; this.next = next; } public final K getKey() { return key; } public final V getValue() { return value; } public final String toString() { return key + "=" + value; } public final int hashCode() { return Objects.hashCode(key) ^ Objects.hashCode(value); } public final V setValue(V newValue) { V oldValue = value; value = newValue; return oldValue; } public final boolean equals(Object o) { if (o == this) return true; if (o instanceof Map.Entry) { Map.Entry<?,?> e = (Map.Entry<?,?>)o; if (Objects.equals(key, e.getKey()) && Objects.equals(value, e.getValue())) return true; } return false; } }
Node is an internal class of HashMap, which implements map The entry interface is essentially a mapping (key value pair).
Data storage of HashMap
1. Hash table to store
HashMap uses a hash table to store data.
Hash table (also known as hash table) is a data structure accessed directly according to the key value. As long as you enter the value to be found, that is, key, you can find its corresponding value.
Hash table is actually an extension of array, which evolved from array. It can be said that if there is no array, there is no hash table.
2. Hash function
The element in the hash table is determined by the hash function. The Key key of the data element is taken as the independent variable. Through a certain functional relationship (called hash function), the calculated value is the storage address of the element.
Expressed as: Addr = H (key), as shown in the following figure:
The design of hash function in hash table is very important, which is also one of the key problems in the process of building hash table.
3. Core issues
Two main problems need to be solved before establishing a hash table:
1) Construct an appropriate hash function, and the values of uniformity H (key) are evenly distributed in the hash table
2) Conflict handling
Conflict: in a hash table, different keyword values correspond to the same storage location.
4. Hash conflict: linked hash table
In order to solve the conflict of hash table, address method and chain address method can be used to solve the problem. HashMap in Java adopts chain address method.
Chain address method, in short, is the combination of array and linked list, as shown in the following figure:
Hash function of HashMap
/** * Recalculate hash value */ static final int hash(Object key) { int h; // h = key.hashCode() is the first step to get the hashCode value // H ^ (H > > > 16) is the highorder participation operation in the second step return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16); }
//Calculate array slot
(n  1) & hash
hashCode the key to obtain a 32bit int value H, and then use h XOR H > > > 16 bits. At jdk1 In the implementation of 8, the algorithm of highorder operation is optimized, which is realized by the high 16 bits of hashCode() (H = k.hashCode()) ^ (H > > > 16).
The advantage of this is that the highorder and loworder values of hashcode can be mixed for XOR operation, and after mixing, the highorder information is added to the loworder information, so that the highorder information is retained in disguise.
It means that the high 16 bits of hash are also involved in the calculation of subscript. If there are more doped elements, the randomness of the generated hash value will increase and the hash collision will be reduced.
remarks:

^XOR: 1 for different and 0 for the same

>>>: move right without symbol: fill 0 on the right

&Operation: if two digits are "1" at the same time, the result will be "1", otherwise it will be 0
H & (table. Length  1) to get the save bit of the object, and the length of the underlying array of HashMap is always the nth power of 2.
Why must 2^n be used for the number of slots?
1. To make the hash result more uniform
If the number of slots is not 16 but 17, the slot calculation formula becomes: (17 – 1) & hash
As can be seen from the above, the calculation results will converge greatly. After hashcode participates in the & operation, it is shielded by more zeros, and there are only two calculation results: 0 and 16, which is a disaster for hashmap. 2. Equivalent to length modulus
When length is always to the nth power of 2, H & (length1) operation is equivalent to modulo length, that is, h%length, but & is more efficient than%.
Bit operation is more efficient than arithmetic operation because arithmetic operation will still be converted into bit operation.
The ultimate goal is to make the hash result more uniform, reduce hash collision and improve the operation efficiency of hashmap.
How to analyze HashMap:
final V putVal(int hash, K key, V value, boolean onlyIfAbsent, boolean evict) { Node<K,V>[] tab; Node<K,V> p; int n, i; // When the array of the current object is null or the array length is 0, the array needs to be initialized if ((tab = table) == null  (n = tab.length) == 0) { n = (tab = resize()).length; } // Use the hash and the value of the array length minus one to XOR to obtain the scattered array subscript, which indicates that the current // The key will be stored in this location. If there is no value in this location, you can directly create a new kv node to store it // Where length n is a power of 2 if ((p = tab[i = (n  1) & hash]) == null) { tab[i] = newNode(hash, key, value, null); } // If you go to else, it indicates that there is already content in the array position indexed by the key, that is, there is a collision // At this time, we need more complex ways to deal with collisions, such as linked lists and trees else { Node<K,V> e; K k; //Node key exists, directly overwriting value if (p.hash == hash && ((k = p.key) == key  (key != null && key.equals(k)))) { e = p; } // Judge that the chain is a red black tree else if (p instanceof TreeNode) { // Where this represents the current HashMap and tab is the array in the map e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value); } else { // Judge the chain as a linked list for (int binCount = 0; ; ++binCount) { // If the current collision node has no subsequent nodes, create a new node directly and append it if ((e = p.next) == null) { p.next = newNode(hash, key, value, null); // TREEIFY_THRESHOLD = 8 // Starting from 0, if it reaches 7, it means it is full of 8. At this time, it needs to be transferred // Re determine whether to expand capacity or switch to red black tree if (binCount >= TREEIFY_THRESHOLD  1) // 1 for 1st treeifyBin(tab, hash); break; } // If a node with exactly the same key among the collision nodes is found, the old node is replaced with the new node if (e.hash == hash && ((k = e.key) == key  (key != null && key.equals(k)))) break; p = e; } } // At this time, e is the saved collided node, that is, the old node if (e != null) { // existing mapping for key V oldValue = e.value; // onlyIfAbsent is the call parameter of the method, indicating whether to replace the existing value, // In the default put method, this value is false, so the old value will be replaced with the new value if (!onlyIfAbsent  oldValue == null) e.value = value; // Callbacks to allow LinkedHashMap postactions afterNodeAccess(e); return oldValue; } } // map changeability operation counter // For example, the structural changes of map, such as content increase or decrease or rehash, will directly lead to the concurrency of external map // Iteration causes a fail fast problem, and this value is the basis of comparison ++modCount; // size is the number of kv included in the map // Capacity expansion when the maximum capacity is exceeded if (++size > threshold) resize(); // Callbacks to allow LinkedHashMap postactions afterNodeInsertion(evict); return null; }
The overall execution process of the put method of HashMap is as follows:
①. Judge whether the key value is empty or null for the array table[i]. Otherwise, execute resize() to expand the capacity;
②. Calculate the hash value according to the key value to the inserted array index I. If table[i]==null, directly create a new node to add
③. Judge whether the first element of table[i] is the same as key. If it is the same, directly overwrite value
④. Judge whether table[i] is a treeNode, that is, whether table[i] is a red black tree. If it is a red black tree, directly insert key value pairs into the tree
⑤. Traverse the table[i] to determine whether the length of the linked list is greater than 8. If it is greater than 8, convert the linked list into a red black tree and perform the insertion operation in the red black tree. Otherwise, perform the insertion operation of the linked list; If the key is found to exist during traversal, you can directly overwrite the value;
⑥. After successful insertion, judge whether the actual number of key value pairs exceeds the maximum capacity threshold. If so, expand the capacity.
HashMap summary
HashMap underlying structure? Based on the implementation of Map interface and the structure of array + linked list, red black tree is added after JDK 1.8. The length of linked list > 8 becomes red black tree and < 6 becomes linked list
What happens when two objects have the same hashcode? Hash conflicts. HashMap solves hash conflicts through linked lists
What are the functions of equals() and hashCode() in HashMap? When adding and obtaining a HashMap, you need to hash() through the hashCode() of the key, and then calculate the subscript (n1 & hash) to obtain the same location to be found. In case of conflict (collision), use the key.equals() method to find the corresponding node in the linked list or tree
When will HashMap be expanded? When the element of put reaches the capacity multiplied by the load factor, it defaults to 16 * 0.75
Implementation of hash? h = key. hashCode()) ^ (H > > > 16). The hashCode shifts 16 bits to the right without symbols, and then performs bitwise XOR to obtain the hash value of this key. Since the capacity of the hash table is the nth power of 2, at present, the lower and lower bits of the element's hashCode () are the same in many cases, This will lead to conflict (collision). Therefore, after 1.8, a shift operation is performed: the hashCode () of the element is different or different from the result after shifting itself to the right by 16 bits
Is HashMap thread safe? HashMap has high reading and writing efficiency, but because it is asynchronous, that is, reading and writing operations are not protected by locks, it is unsafe in multithreaded scenarios and prone to data inconsistency. It is highly recommended in single threaded scenarios.
The above is the introduction of HashMap.
END
I'm Mike, who has spent more than 10 years teaching the architecture technology of BAT firstline large factories. Every indepth technical article shared by Mike takes 25 days to create meticulously, often tossing until 12 a.m. I just hope you can see it. If you think it's OK, you can easily [like + forward + comment] with the support of one button and three times. Thank you.