HashMap source code analysis and common interview questions

This article analyzes HashMap's hash() function and underlying data structure, its common methods, and frequently asked interview questions.

1. Introduction to HashMap

HashMap is a commonly used collection class that stores K/V key-value pairs and implements the Map interface.
Before JDK 1.8, HashMap was implemented as an array + linked list, where the linked list stores entries whose keys collide in the same bucket.
Since JDK 1.8, it is implemented as an array + linked list / red-black tree. When both of the following conditions are met, a linked list is converted to a red-black tree to speed up lookups:

  • The length of the linked list reaches the threshold (8 by default)
  • The length of the HashMap array is at least 64

2. Underlying data structure of HashMap

Fields of the class

public class HashMap<K,V> extends AbstractMap<K,V> implements Map<K,V>, Cloneable, Serializable {

    // The default initial capacity is 16
    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4;
    // Maximum capacity
    static final int MAXIMUM_CAPACITY = 1 << 30;
    // Default fill factor
    static final float DEFAULT_LOAD_FACTOR = 0.75f;
    // When the number of nodes on the bucket is greater than this value, it will turn into a red black tree
    static final int TREEIFY_THRESHOLD = 8;
    // When the number of nodes on the bucket is less than this value, the tree turns to the linked list
    static final int UNTREEIFY_THRESHOLD = 6;
    // The structure in the bucket is transformed into the minimum size of the table corresponding to the red black tree
    static final int MIN_TREEIFY_CAPACITY = 64;
    // The bucket array; its length is always a power of 2
    transient Node<K,V>[] table;
    // A set view that holds the concrete entries
    transient Set<Map.Entry<K,V>> entrySet;
    // The number of stored key-value mappings. Note that this is not the length of the array.
    transient int size;
    // Counts structural modifications, used for fail-fast iterators
    transient int modCount;
    // Resize threshold (capacity * load factor); when size exceeds it, the table is resized
    int threshold;
    // Load factor
    final float loadFactor;
}

Some inner classes

  • EntrySet, EntryIterator, EntrySpliterator: the entry set view, its iterator, and its spliterator
  • KeySet, KeyIterator, KeySpliterator: the key set view, its iterator, and its spliterator
  • Values, ValueIterator, ValueSpliterator: the value collection view, its iterator, and its spliterator

Usage

import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.Spliterator;

public class TestMap {
    public static void main(String[] args) {
        
        HashMap<String, String> test =  new HashMap<>();
        System.out.println(test.size());
        test.put("A","AAA");
        test.put("B","BBB");
        test.put("C","CCC");
        test.put("D","DDD");
        test.put("E","EEE");

        Iterator<Map.Entry<String, String>> iterator = test.entrySet().iterator();
        while (iterator.hasNext()){
            System.out.println(iterator.next());
        }

        Iterator<String> iterator1 = test.keySet().iterator();
        while (iterator1.hasNext()){
            System.out.println(iterator1.next());
        }

        Spliterator<String> spliterator = test.values().spliterator();

        System.out.println(spliterator.estimateSize());
        spliterator.forEachRemaining(System.out::println);

        System.out.println(test.size());
    }
}

Node class

    static class Node<K, V> implements Map.Entry<K, V> {
        final int hash;     // Cached hash of the key, compared before equals() when locating elements
        final K key;        // Key: keys are unique; a single null key is allowed
        V value;            // Value
        Node<K, V> next;    // Pointer to the next node in the bucket's linked list

        Node(int hash, K key, V value, Node<K, V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }

        public final K getKey() {
            return key;
        }

        public final V getValue() {
            return value;
        }

        public final String toString() {
            return key + "=" + value;
        }

        public final int hashCode() {
            return Objects.hashCode(key) ^ Objects.hashCode(value);
        }

        public final V setValue(V newValue) {
            V oldValue = value;
            this.value = newValue;
            return oldValue;
        }

        public final boolean equals(Object o) {
            if (o == this)
                return true;
            if (o instanceof Map.Entry) {
                Map.Entry<?, ?> e = (Map.Entry<?, ?>) o;
                return Objects.equals(key, e.getKey()) && Objects.equals(value, e.getValue());
            }
            return false;
        }
    }

TreeNode source code

    // Some methods are omitted here
    static final class TreeNode<K,V> extends LinkedHashMap.Entry<K,V> {
        TreeNode<K,V> parent;  // Red black tree parent node
        TreeNode<K,V> left;
        TreeNode<K,V> right;
        TreeNode<K,V> prev;    // Unlinking is required after deletion
        boolean red;
        TreeNode(int hash, K key, V val, Node<K,V> next) {
            super(hash, key, val, next);
        }
    }

Array + linked list: the table field is a Node<K,V>[] array, and each bucket heads a linked list of colliding entries.

Array + red-black tree: once a bucket is treeified, it heads a red-black tree of TreeNode elements instead.

3. Common methods of HashMap

Constructor

    /**
     * Construct an empty HashMap with a specified initial capacity and load factor.
     */
    public HashMap(int initialCapacity, float loadFactor) {
      //If the initialization size is less than 0, an exception is thrown  
        if (initialCapacity < 0)
            throw new IllegalArgumentException("Illegal initial capacity: " +
                                               initialCapacity);
      //The maximum value of table in HashMap is 2 ^ 30. If the initialization size is greater than 2 ^ 30, it is 2 ^ 30
        if (initialCapacity > MAXIMUM_CAPACITY)
            initialCapacity = MAXIMUM_CAPACITY;
            
        if (loadFactor <= 0 || Float.isNaN(loadFactor))
            throw new IllegalArgumentException("Illegal load factor: " +
                                               loadFactor);
        this.loadFactor = loadFactor;
        this.threshold = tableSizeFor(initialCapacity);
    }

    /**
     * Construct an empty HashMap with the specified initial capacity and the default load factor (0.75).
     */
    public HashMap(int initialCapacity) {
        this(initialCapacity, DEFAULT_LOAD_FACTOR);
    }

    /**
     * Construct an empty HashMap with the default initial capacity (16) and the default load factor (0.75).
     */
    public HashMap() {
        this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
    }
  • Load factor: loadFactor controls how full the table may become before it is resized. For example, with an array of size 16 and the default load factor of 0.75, the resize threshold is 16 * 0.75 = 12, so the table is expanded once it holds more than 12 entries (see the sketch after this list).

  • Initialization records the capacity and load factor; the Node array itself is allocated lazily, on the first put, with capacity elements. Node was introduced above: it holds a hash, a key, a value, and a next pointer. The capacity is always a power of 2; the reason is explained later.
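
A minimal sketch of the threshold arithmetic that follows from these constants (the class name is just for illustration):

public class LoadFactorDemo {
    public static void main(String[] args) {
        int capacity = 16;        // DEFAULT_INITIAL_CAPACITY
        float loadFactor = 0.75f; // DEFAULT_LOAD_FACTOR
        int threshold = (int) (capacity * loadFactor);
        System.out.println(threshold); // 12: the 13th put triggers a resize to capacity 32
    }
}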

hash function

    static final int hash(Object key) {
        int h;
        //XOR the high 16 bits of the key's hashCode into its low 16 bits
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

Both put and get AND the hash produced by this function with (array length - 1) to compute the bucket index. Since the array length rarely exceeds 2^16, only the low bits of the hash would otherwise take part in indexing. XORing the high 16 bits of hashCode into the low 16 bits lets the high bits influence the bucket index as well, which spreads keys more evenly and reduces collisions.
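
A small sketch (the key and table size are arbitrary examples) showing how the spread hash and the length mask produce a bucket index:

public class HashDemo {
    // The same spreading step as HashMap.hash()
    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int n = 16; // table length, always a power of 2
        int h = hash("example");
        int index = (n - 1) & h; // bucket index in [0, 15]
        System.out.println("hash=" + Integer.toBinaryString(h) + ", index=" + index);
    }
}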

put function

    public V put(K key, V value) {
        return putVal(hash(key), key, value, false, true);
    }
    
    final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        Node<K,V>[] tab; Node<K,V> p; int n, i;
        //If the table is uninitialized or has length 0, resize() allocates it
        if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;
        // Compute the bucket index from the hash; if that slot is null, place the new Node there directly.
        if ((p = tab[i = (n - 1) & hash]) == null)
            tab[i] = newNode(hash, key, value, null);
     	//An element already exists at the current position of the array
        else {
            Node<K,V> e; K k;
            //The hash of the current position p of the array is equal to the hash of the element to be inserted and the key is equal
            if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k))))
                //The current position p is assigned to e
                e = p;
            // Not the same key, and p is a red-black tree node
            else if (p instanceof TreeNode)
                //Insert into the red-black tree
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
            //Is a linked list node
            else {
            	//Insert at the end of the linked list node
                for (int binCount = 0; ; ++binCount) {
                    if ((e = p.next) == null) {
                    	//Insert end
                        p.next = newNode(hash, key, value, null);
                        //When the chain length reaches the threshold (8 by default), call treeifyBin
                        // treeifyBin checks the table length before treeifying:
                        // only when the array length is at least 64 is the list converted to a red-black tree; otherwise the array is simply resized.
                        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                            treeifyBin(tab, hash);
                        break;
                    }
                    //When a node with the same key value is found, jump out of the loop and modify the value
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        break;
                    p = e;
                }
            }
            // Find the node whose key value and hash value are equal to the inserted element
            if (e != null) { // existing mapping for key
                V oldValue = e.value;
                if (!onlyIfAbsent || oldValue == null)
                	//Modify old values
                    e.value = value;
                afterNodeAccess(e);
                //Return old value
                return oldValue;
            }
        }
        //Structural modification
        ++modCount;
        // If the actual size is greater than the threshold, the capacity will be expanded
        if (++size > threshold)
            resize();
        afterNodeInsertion(evict);
        return null;
    }

  1. First check whether the table array is null or has length 0; if so, resize() initializes or expands it. Then compute the bucket index from the hash; if that slot is null, place a new Node there directly.
  2. If the slot already holds a node, check whether its key matches. If it does, the value is simply replaced; otherwise traverse the chain and insert at the end.
  3. After appending to a linked list, if its length reaches the threshold (8), treeifyBin is called; it converts the list to a red-black tree only when the table length is at least 64, and otherwise just resizes the table. If a node with an equal key was found along the way, its value is replaced and the old value is returned (see the sketch below).
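
A quick sketch of put's return value, following directly from the putVal code above:

import java.util.HashMap;

public class PutDemo {
    public static void main(String[] args) {
        HashMap<String, String> map = new HashMap<>();
        System.out.println(map.put("A", "first"));  // null: no previous mapping for "A"
        System.out.println(map.put("A", "second")); // "first": the old value is returned
        System.out.println(map.get("A"));           // "second"
    }
}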

get function

    public V get(Object key) {
        Node<K,V> e;
        return (e = getNode(hash(key), key)) == null ? null : e.value;
    }
    
    final Node<K,V> getNode(int hash, Object key) {
        Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
        if ((tab = table) != null && (n = tab.length) > 0 &&
            (first = tab[(n - 1) & hash]) != null) { // The bucket computed from the hash is non-empty
            if (first.hash == hash && // The first node's hash and key match the requested key
                ((k = first.key) == key || (key != null && key.equals(k))))
                return first;
            if ((e = first.next) != null) { // There are more nodes in the bucket
                if (first instanceof TreeNode) // The first node is a tree node: use the red-black tree lookup
                    return ((TreeNode<K,V>)first).getTreeNode(hash, key);
                do { // Traverse the linked list, looking for a node with equal hash and key
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        return e;
                } while ((e = e.next) != null);
            }
        }
        // Not found: return null
        return null;
    }
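
One practical consequence of this code: get returns null both when the key is absent and when the key maps to null, so containsKey is needed to tell the two cases apart. A minimal sketch:

import java.util.HashMap;

public class GetDemo {
    public static void main(String[] args) {
        HashMap<String, String> map = new HashMap<>();
        map.put("A", null);
        System.out.println(map.get("A"));         // null (key present, value is null)
        System.out.println(map.get("B"));         // null (key absent)
        System.out.println(map.containsKey("A")); // true
        System.out.println(map.containsKey("B")); // false
    }
}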

Capacity expansion (resize)
When does resizing happen?

  1. The table array is null or has length 0 (initial allocation)
  2. The actual number of elements exceeds the threshold
  3. A linked list reaches TREEIFY_THRESHOLD (8), but the table length is still less than MIN_TREEIFY_CAPACITY (64)
    final Node<K,V>[] resize() {
        Node<K,V>[] oldTab = table;
        int oldCap = (oldTab == null) ? 0 : oldTab.length;
        int oldThr = threshold;
        int newCap, newThr = 0;
        //Old capacity greater than 0 
        if (oldCap > 0) {
        	//The old capacity has reached the maximum: set the threshold to Integer.MAX_VALUE and return the old table unchanged
            if (oldCap >= MAXIMUM_CAPACITY) {
                threshold = Integer.MAX_VALUE;
                return oldTab;
            }
            //Double the capacity; if the result is below the maximum and the old capacity is at least the default (16), also double the threshold
            else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                     oldCap >= DEFAULT_INITIAL_CAPACITY)
                newThr = oldThr << 1; // The new threshold is twice the old threshold
        }
        else if (oldThr > 0) //Old capacity is 0 but a threshold was set (an initial capacity was specified): the new capacity is the old threshold
            newCap = oldThr;
        else {               // Both threshold and capacity are initialized to default values
            newCap = DEFAULT_INITIAL_CAPACITY;
            newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
        }
        if (newThr == 0) {
            float ft = (float)newCap * loadFactor;
            newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                      (int)ft : Integer.MAX_VALUE);
        }
        threshold = newThr;
        @SuppressWarnings({"rawtypes","unchecked"})
        Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
        table = newTab;
        if (oldTab != null) {
        	//Traverse old array oldTab
            for (int j = 0; j < oldCap; ++j) {
                Node<K,V> e;
                if ((e = oldTab[j]) != null) {
                	//Clear slot j of the old array to help garbage collection
                    oldTab[j] = null;
                    //There is only one value, which is put into the corresponding position of the new array according to the hash
                    if (e.next == null)
                        newTab[e.hash & (newCap - 1)] = e;
                    //Red black tree 
                    else if (e instanceof TreeNode)
                        ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                    else { // Split the linked list into 'lo' and 'hi' lists, preserving node order
                        Node<K,V> loHead = null, loTail = null;
                        Node<K,V> hiHead = null, hiTail = null;
                        Node<K,V> next;
                        do {
                            next = e.next;
                            //Bit (hash & oldCap) is 0: node stays at the original index j
                            if ((e.hash & oldCap) == 0) {
                                if (loTail == null)
                                    loHead = e;
                                else
                                    loTail.next = e;
                                loTail = e;
                            }
                            //Bit is set: node moves to index j + oldCap
                            else {
                                if (hiTail == null)
                                    hiHead = e;
                                else
                                    hiTail.next = e;
                                hiTail = e;
                            }
                        } while ((e = next) != null);
                         // Place the 'lo' list at index j of the new array
                        if (loTail != null) {
                            loTail.next = null;
                            newTab[j] = loHead;
                        }
                        // Place the 'hi' list at index j + oldCap of the new array
                        if (hiTail != null) {
                            hiTail.next = null;
                            newTab[j + oldCap] = hiHead;
                        }
                    }
                }
            }
        }
        return newTab;
    }
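
Why (e.hash & oldCap) decides the new slot: when the capacity doubles, the new mask (newCap - 1) exposes exactly one extra hash bit, and that bit's value is oldCap. A sketch with made-up numbers:

public class SplitDemo {
    public static void main(String[] args) {
        int oldCap = 16, newCap = 32;
        int hash = 0b10101;                 // example hash: 21
        int oldIndex = hash & (oldCap - 1); // 5
        int newIndex = hash & (newCap - 1); // 21, i.e. 5 + 16
        // The extra bit tested by (hash & oldCap) distinguishes the two cases:
        System.out.println((hash & oldCap) == 0
                ? "stays at " + oldIndex
                : "moves to " + (oldIndex + oldCap));
        System.out.println(newIndex == oldIndex + oldCap); // true
    }
}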

Summary

  1. Resizing is a particularly expensive operation, so when using HashMap, estimate the number of entries and pass an appropriate initial capacity at construction time to avoid repeated resizing (see the sketch after this list).
  2. The load factor can be changed, even to a value greater than 1, but changing it lightly is not recommended.
  3. HashMap is not thread-safe. Do not operate on the same HashMap concurrently; use ConcurrentHashMap instead.
  4. Before JDK 1.8, HashMap was implemented as array + linked list; since JDK 1.8, as array + linked list / red-black tree.
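
A minimal sketch of point 1. To hold n entries without any resize, the initial capacity must satisfy capacity * 0.75 >= n, so a common idiom (the expectedSize value here is only an example) is:

import java.util.HashMap;
import java.util.Map;

public class PresizeDemo {
    public static void main(String[] args) {
        int expectedSize = 1000;
        // Large enough that expectedSize entries stay under the default 0.75 threshold;
        // HashMap rounds the argument up to the next power of 2 internally.
        int initialCapacity = (int) (expectedSize / 0.75f) + 1;
        Map<String, Integer> map = new HashMap<>(initialCapacity);
        for (int i = 0; i < expectedSize; i++) {
            map.put("key" + i, i); // no resize occurs during these puts
        }
        System.out.println(map.size()); // 1000
    }
}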

4. Common interview questions

1. Underlying data structure of HashMap
Before JDK 1.8: array + linked list
Since JDK 1.8: array + linked list / red-black tree

2. Working principle of HashMap
The bottom layer of HashMap is an array plus linked lists, whose nodes are instances of the inner Node class. Data is stored with the put method and retrieved with the get method.

When storing data, pass the K and V values to the put method:

  1. Call hash(K) to compute the hash of K; hash & (n - 1) then gives the array index where the entry should be placed.
  2. Check whether that array slot already holds a node. If not, insert directly. If it does, compare keys; if they are equal, replace the value.
  3. Otherwise, check whether the existing node is a red-black tree node or a linked-list node. For a linked list, traverse to the end and insert there, replacing the value if a node with an equal key is found on the way. For a red-black tree, insert into the tree.
  4. After inserting into a linked list, check whether its length has reached 8. If so, and the array length is at least 64, convert the list to a red-black tree; if the array length is below 64, resize the array instead.
  5. After insertion, check whether the number of elements exceeds capacity * loadFactor; if so, resize.

When retrieving data, pass the K value to the get method:

  1. Call hash(K) to compute the hash of K, which yields the array index of the bucket where the key resides.
  2. Traverse the linked list or red-black tree in that bucket, using equals() on the keys to find the node matching K, and return its V value.

3. Why is the length of HashMap's underlying array always a power of 2?

  1. It distributes entries evenly across the buckets, reducing hash collisions.
  2. When the length n is a power of 2, hash & (n - 1) is equivalent to hash % n for non-negative hashes, and the bitwise AND is much faster than the modulo operation (see the sketch below).
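
A small sketch of the equivalence for non-negative hashes (the values are arbitrary examples):

public class ModDemo {
    public static void main(String[] args) {
        int n = 16; // a power of 2, so (n - 1) is an all-ones bit mask: 0b1111
        for (int hash : new int[]{5, 21, 1234567}) {
            System.out.println((hash & (n - 1)) + " == " + (hash % n)); // always equal
        }
    }
}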

4. Does HashMap allow null keys and null values?
HashMap allows at most one null key and any number of null values, as the sketch below shows.
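
A minimal sketch (the class name is just for illustration):

import java.util.HashMap;

public class NullDemo {
    public static void main(String[] args) {
        HashMap<String, String> map = new HashMap<>();
        map.put(null, "v1"); // the single permitted null key (hash() maps it to 0)
        map.put(null, "v2"); // overwrites the previous null-key mapping
        map.put("a", null);  // null values are unrestricted
        map.put("b", null);
        System.out.println(map); // {null=v2, a=null, b=null}
    }
}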

5. What can go wrong when HashMap is used by multiple threads?
JDK 1.7: concurrent resizing can create a circular linked list (an infinite loop) or lose data
JDK 1.8: concurrent puts can silently overwrite each other's entries
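
To stay safe under concurrency, use ConcurrentHashMap, as the summary above recommends. A hedged sketch (thread counts and key ranges are arbitrary) of two threads writing disjoint keys, where a plain HashMap may lose entries but ConcurrentHashMap does not:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ConcurrentDemo {
    public static void main(String[] args) throws InterruptedException {
        Map<Integer, Integer> map = new ConcurrentHashMap<>(); // swap in new HashMap<>() to observe lost entries
        Thread t1 = new Thread(() -> { for (int i = 0; i < 10_000; i++) map.put(i, i); });
        Thread t2 = new Thread(() -> { for (int i = 10_000; i < 20_000; i++) map.put(i, i); });
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println(map.size()); // 20000 with ConcurrentHashMap; often less with HashMap
    }
}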

