This paper analyzes the hash() function and underlying data structure of HashMap, as well as common methods and common interview related questions
1. Introduction to HashMap
HashMap is a common collection class of K and V key value pairs, which implements the Map interface.
jdk1. Before 8, HashMap was implemented in the way of array + linked list, which stores data with conflicting key values.
jdk1.8. It is implemented in the way of array + linked list / red black tree. After the following two conditions are met, the linked list to red black tree operation will be executed to speed up the search speed.
- The length of the linked list is greater than the threshold (8 by default)
- HashMap array length exceeds 64
2. underlying data structure of HashMap
Properties of class
public class HashMap<K,V> extends AbstractMap<K,V> implements Map<K,V>, Cloneable, Serializable { // The default initial capacity is 16 static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // Maximum capacity static final int MAXIMUM_CAPACITY = 1 << 30; // Default fill factor static final float DEFAULT_LOAD_FACTOR = 0.75f; // When the number of nodes on the bucket is greater than this value, it will turn into a red black tree static final int TREEIFY_THRESHOLD = 8; // When the number of nodes on the bucket is less than this value, the tree turns to the linked list static final int UNTREEIFY_THRESHOLD = 6; // The structure in the bucket is transformed into the minimum size of the table corresponding to the red black tree static final int MIN_TREEIFY_CAPACITY = 64; // The size of an array of storage elements is always a power of 2 transient Node<k,v>[] table; // A set that holds concrete elements transient Set<map.entry<k,v>> entrySet; // The number of stored elements. Note that this is not equal to the length of the array. transient int size; // Counter for each expansion and change of map structure transient int modCount; // Critical value when the actual size (capacity * filling factor) exceeds the critical value, capacity expansion will be carried out int threshold; // Loading factor final float loadFactor; }
Some inner classes
- EntrySet, entryitterer, entrysplitterkey Set iterator separator s
- KeySet, KeyIterator, keysplitterset iterator separator
- Values, ValueIterator, valuesplitter value Set iterator separator
use
public class TestMap { public static void main(String[] args) { HashMap<String, String> test = new HashMap<>(); System.out.println(test.size()); test.put("A","AAA"); test.put("B","BBB"); test.put("C","CCC"); test.put("D","DDD"); test.put("E","EEE"); Iterator<Map.Entry<String, String>> iterator = test.entrySet().iterator(); while (iterator.hasNext()){ System.out.println(iterator.next()); } Iterator<String> iterator1 = test.keySet().iterator(); while (iterator1.hasNext()){ System.out.println(iterator1.next()); } Spliterator<String> spliterator = test.values().spliterator(); System.out.println(spliterator.estimateSize()); spliterator.forEachRemaining(System.out::println); System.out.println(test.size()); } }
Node class
static class Node<K, V> implements Entry<K, V> { final int hash; //Hash value, which is used to compare with the hash value of other elements when storing elements. final K key; // Key: the only key is a value. One key is allowed to be null V value; // value HashMap.Node<K, V> next; //Point to the next Node Node(int var1, K var2, V var3, HashMap.Node<K, V> var4) { this.hash = var1; this.key = var2; this.value = var3; this.next = var4; } public final K getKey() { return this.key; } public final V getValue() { return this.value; } public final String toString() { return this.key + "=" + this.value; } public final int hashCode() { return Objects.hashCode(this.key) ^ Objects.hashCode(this.value); } public final V setValue(V var1) { Object var2 = this.value; this.value = var1; return var2; } public final boolean equals(Object var1) { if (var1 == this) { return true; } else { if (var1 instanceof Entry) { Entry var2 = (Entry)var1; if (Objects.equals(this.key, var2.getKey()) && Objects.equals(this.value, var2.getValue())) { return true; } } return false; } } }
TreeNode source code
// Some methods are not posted here static final class TreeNode<K,V> extends LinkedHashMap.Entry<K,V> { TreeNode<K,V> parent; // Red black tree parent node TreeNode<K,V> left; TreeNode<K,V> right; TreeNode<K,V> prev; // Unlinking is required after deletion boolean red; TreeNode(int hash, K key, V val, Node<K,V> next) { super(hash, key, val, next); } }
Array + linked list
Node<k,v>[] table
Array + red black tree
3. Common methods of HashMap
Constructor
/** * Construct an empty HashMap with a specified initial capacity and load factor. */ public HashMap(int initialCapacity, float loadFactor) { //If the initialization size is less than 0, an exception is thrown if (initialCapacity < 0) throw new IllegalArgumentException("Illegal initial capacity: " + initialCapacity); //The maximum value of table in HashMap is 2 ^ 30. If the initialization size is greater than 2 ^ 30, it is 2 ^ 30 if (initialCapacity > MAXIMUM_CAPACITY) initialCapacity = MAXIMUM_CAPACITY; if (loadFactor <= 0 || Float.isNaN(loadFactor)) throw new IllegalArgumentException("Illegal load factor: " + loadFactor); this.loadFactor = loadFactor; this.threshold = tableSizeFor(initialCapacity); } /** * The custom initial capacity uses the default load factor (0.75) to construct an empty HashMap. */ public HashMap(int initialCapacity) { this(initialCapacity, DEFAULT_LOAD_FACTOR); } /** * Construct an empty HashMap using the default initial capacity (16) and the default loading factor (0.75). */ public HashMap() { this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted }
-
Filling factor: loadFactor indicates the size of the filling factor. Let's briefly introduce the filling factor: assuming that the size of the array is 16, each element in the array is mod 9, and the position of all elements after taking the mold is (0 – 9). At this time, the size of the filling factor is 9 / 16, and the filling factor is 0.75.
-
The HashMap initialization process is to create a new array with the size of capacity and the type of node. Node has introduced this class above, including a pointer, a key, a value, and a hash. Capacity is the power of 2. As for why it is the power of 2, it will be introduced later.
hash function
static final int hash(Object key) { int h; //Use the high 16 bits and low 16 bits of the hashCode of the key to perform XOR operation, and put the result in the low 16 bits return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16); }
Because both put and get of hashmap need to sum the key with the array size length -1 after being processed by the hash function, and the size of the array usually does not exceed 2 ^ 16, so the lower 16 bits always participate in the operation. Therefore, XOR the upper 16 bits and lower 16 bits of the hashCode of the key, and the value obtained will have the characteristics of hashing.
put function
public V put(K key, V value) { return putVal(hash(key), key, value, false, true); } final V putVal(int hash, K key, V value, boolean onlyIfAbsent, boolean evict) { Node<K,V>[] tab; Node<K,V> p; int n, i; //The table is uninitialized or the length is 0 for capacity expansion if ((tab = table) == null || (n = tab.length) == 0) n = (tab = resize()).length; // Determine the stored subscript. If it is null at this time, the new Node will be put into the position of the subscript of the current array. if ((p = tab[i = (n - 1) & hash]) == null) tab[i] = newNode(hash, key, value, null); //An element already exists at the current position of the array else { Node<K,V> e; K k; //The hash of the current position p of the array is equal to the hash of the element to be inserted and the key is equal if (p.hash == hash && ((k = p.key) == key || (key != null && key.equals(k)))) //The current position p is assigned to e e = p; // hash values are different, that is, key s are different, and p is a red black tree node else if (p instanceof TreeNode) //Insert red black tree e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value); //Is a linked list node else { //Insert at the end of the linked list node for (int binCount = 0; ; ++binCount) { if ((e = p.next) == null) { //Insert end p.next = newNode(hash, key, value, null); //When the number of nodes reaches the threshold (8 by default), execute the treeifyBin method // This method will determine whether to convert to red black tree according to HashMap array. // Only when the length of the array is greater than or equal to 64 can the red black tree be converted. Otherwise, it is just an expansion of the array. if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st treeifyBin(tab, hash); break; } //When a node with the same key value is found, jump out of the loop and modify the value if (e.hash == hash && ((k = e.key) == key || (key != null && key.equals(k)))) break; p = e; } } // Find the node whose key value and hash value are equal to the inserted element if (e != null) { // existing mapping for key V oldValue = e.value; if (!onlyIfAbsent || oldValue == null) //Modify old values e.value = value; afterNodeAccess(e); //Return old value return oldValue; } } //Structural modification ++modCount; // If the actual size is greater than the threshold, the capacity will be expanded if (++size > threshold) resize(); afterNodeInsertion(evict); return null; }
- First judge whether the array table is null or the length is 0. If so, resize() will expand the capacity. Otherwise, determine the stored subscript according to the hash. If the corresponding position of the array is null at this time, the new Node will be put into the position of the subscript of the current array.
- If there is a value in the corresponding position, judge whether the key s are consistent. If they are consistent, directly modify the value value, otherwise traverse and insert at the end.
- Judge whether the length of the linked list is greater than the default threshold (8). If not, the old value will be returned directly. Otherwise, the linked list will be converted into a red black tree.
get function
public V get(Object key) { Node<K,V> e; return (e = getNode(hash(key), key)) == null ? null : e.value; } final Node<K,V> getNode(int hash, Object key) { Node<K,V>[] tab; Node<K,V> first, e; int n; K k; if ((tab = table) != null && (n = tab.length) > 0 && (first = tab[(n - 1) & hash]) != null) { // The first element of the array position calculated by hash exists if (first.hash == hash && // The hash of the first element in the current position of the array is equivalent to the hash to be obtained and the key is equal ((k = first.key) == key || (key != null && key.equals(k)))) return first; if ((e = first.next) != null) { // There is a second element if (first instanceof TreeNode) // The first element is the value taking method of a tree node red black tree return ((TreeNode<K,V>)first).getTreeNode(hash, key); do {// Circular traversal of linked list nodes to find key s and hash es with equal nodes if (e.hash == hash && ((k = e.key) == key || (key != null && key.equals(k)))) return e; } while ((e = e.next) != null); } } // No null returned return null; }
Capacity expansion
When is capacity expansion?
- The array table is null or has a length of 0
- If the actual number of elements in the array is greater than the threshold, the capacity will be expanded
- The length in the linked list exceeds TREEIFY_THRESHOLD (8), but the table length is less than MIN_TREEIFY_CAPACITY(64)
final Node<K,V>[] resize() { Node<K,V>[] oldTab = table; int oldCap = (oldTab == null) ? 0 : oldTab.length; int oldThr = threshold; int newCap, newThr = 0; //Old capacity greater than 0 if (oldCap > 0) { //The old capacity is greater than or equal to the maximum capacity. The threshold is equal to the maximum capacity and returns the old capacity value if (oldCap >= MAXIMUM_CAPACITY) { threshold = Integer.MAX_VALUE; return oldTab; } //The new capacity is twice the old capacity, and the old capacity is greater than the default initialization capacity and less than the maximum capacity else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY && oldCap >= DEFAULT_INITIAL_CAPACITY) newThr = oldThr << 1; // The new threshold is twice the old threshold } else if (oldThr > 0) //When the old capacity is less than or equal to 0, the new capacity is equal to the old threshold newCap = oldThr; else { // Both threshold and capacity are initialized to default values newCap = DEFAULT_INITIAL_CAPACITY; newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY); } if (newThr == 0) { float ft = (float)newCap * loadFactor; newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ? (int)ft : Integer.MAX_VALUE); } threshold = newThr; @SuppressWarnings({"rawtypes","unchecked"}) Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap]; table = newTab; if (oldTab != null) { //Traverse old array oldTab for (int j = 0; j < oldCap; ++j) { Node<K,V> e; if ((e = oldTab[j]) != null) { //Empty old array j position oldTab[j] = null; //There is only one value, which is put into the corresponding position of the new array according to the hash if (e.next == null) newTab[e.hash & (newCap - 1)] = e; //Red black tree else if (e instanceof TreeNode) ((TreeNode<K,V>)e).split(this, newTab, j, oldCap); else { // Linked list optimization of nodes with heavy hash Node<K,V> loHead = null, loTail = null; Node<K,V> hiHead = null, hiTail = null; Node<K,V> next; do { next = e.next; //Original index if ((e.hash & oldCap) == 0) { if (loTail == null) loHead = e; else loTail.next = e; loTail = e; } //Original index + oldCap else { if (hiTail == null) hiHead = e; else hiTail.next = e; hiTail = e; } } while ((e = next) != null); // Place the original index on the new array j if (loTail != null) { loTail.next = null; newTab[j] = loHead; } // Place the original index + oldCap on the new array j+oldCap if (hiTail != null) { hiTail.next = null; newTab[j + oldCap] = hiHead; } } } } } return newTab; }
Summary
- Capacity expansion is a particularly performance consuming operation, so when using HashMap, estimate the size of the map and give an approximate value during initialization to avoid frequent capacity expansion of the map.
- The load factor can be modified or greater than 1, but it is not recommended to modify it easily.
- HashMap is not thread safe. Do not operate HashMap at the same time in a concurrent environment. Use concurrent HashMap for construction.
- JDK1. Before 8, HashMap was the implementation of array + linked list, jdk1 8. It is implemented in the way of array + linked list / red black tree.
4. Common interview questions
1. Underlying data structure of HashMap
JDK1. Array + linked list before 8
JDK1. After 8, array + linked list / red black tree
2. Working principle of HashMap
The bottom layer of HashMap is array + linked list, which is implemented by the internal class of Node. The data is stored by put method and obtained by get method.
When storing data, pass the K and V key values to the put method
- Call the hash(K) method to calculate the hash value of K, and hash & (n-1) calculate the array subscript that should be placed at this time.
- Judge whether the current array position has a value. If not, insert it directly. If there is a value, judge whether their key s are equal. If equal, modify its value.
- Otherwise, judge whether the current node is a red black tree or a linked list. If it is a linked list, traverse and insert elements at the end. If there is a node with equal key, modify its value. If it is a red black tree, traverse the red black tree and insert elements
- Judge whether the length of the linked list is greater than 8 after inserting the linked list. If it is greater than and the array length is greater than 64, the linked list will be transformed into a red black tree. If the array length is less than 64, the array capacity will be expanded.
- Judge whether the number of actual elements in the HashMap is greater than capacity * loadactor after insertion. If it is greater than, the capacity will be expanded.
When getting data, pass the K value to the get method.
- Call the hash(K) method (calculate the hash value of K) to obtain the array index of the linked list where the key value is located
- Traverse the linked list or red black tree, and use the equals() method to find the V value corresponding to the K value in the linked list of the same Node.
3. Why is the length of the underlying array of HashMap always to the nth power of 2
- The data is evenly distributed to reduce hash collision
- When the length n is always the nth power of 2, hash & (n-1) is equivalent to hash%n-1, that is, it is equivalent to modular operation, and it is much faster in speed and efficiency than direct modular operation
4. Does HashMap allow null keys and null values
HashMap allows one Key to be null and multiple values to be null
5. What will happen to HashMap thread safety
JDK1.7. Cyclic chain or data loss will occur during capacity expansion
JDK1. 8. Data overwrite will occur
Reference link