TopK problem (fast permutation / heap / binary search tree / count sorting) leetcode347

catalogue

TopK problem description

1. Fast exhaust deformation

1.1 graphic process

1.2 realization of fast scheduling

2. Large root pile (front K small) / small root pile (front K large)

2.1 # large root pile and small root pile

2.2 using PriorityQueue to realize large root heap

3. Bucket sorting

3.1 diagrammatic bucket sorting

3.2 realization

TopK problem description

There are four good ways to solve the TopK problem, whether it is to find the first k big / the first k small / the k big / the K small:

1. O(N): use fast displacement to solve TopK problem most efficiently

2. O(NlogK): large root pile (top K small) / small root pile (top K large)

3. O(N): bucket sorting

Take leetcode 347 as an example https://leetcode-cn.com/problems/top-k-frequent-elements/

1. Fast exhaust deformation

Solving the TopK problem O(N) most efficiently with fast platoon deformation

First, arrange the number with the smallest K through fast row segmentation. According to the nature of fast row segmentation, the number of K - 1 on its left is less than or equal to it, so it and the number on its left are the number with the smallest K we are looking for.

1.1 graphic process

Suppose we now sort the 10 numbers of "6 # 1 # 2 # 7 # 9 # 3 # 4 # 5 # 10 # 8".

First, take any number in this sequence as the reference number. For convenience, let the first number 6 be the reference number. Next, you need to put all numbers larger than the benchmark number in the sequence to the right of 6 and those smaller than the benchmark number to the left of 6.

In fact, the method is very simple: start "detection" from both ends of the initial sequence "6 # 1 # 27 # 9 # 3 # 4 # 5 # 10 # 8". First find a number less than 6 from right to left, then find a number greater than 6 from left to right, and then exchange them. Here you can use two variables, I and j, to point to the leftmost and rightmost of the sequence, respectively. Let's give these two variables nice names "sentry I" and "sentry J". At the beginning, let sentry I point to the leftmost side of the sequence (i.e. i=1) and point to the number 6. Let sentinel J point to the far right of the sequence (i.e. j=10) and point to the number 8.

First the sentry j began to move out. Because the benchmark number set here is the leftmost number, it is very important to let the sentry j go first (please think about why). Sentry j moves left step by step (i.e. j --) until he finds a number less than 6 and stops. Next, sentry I moves to the right step by step (i.e. I + +) until he finds a number greater than 6 and stops. Finally, sentry j stopped in front of the number 5 and sentry I stopped in front of the number 7.

Now swap the values of the elements pointed to by sentinel i and sentinel j. The sequence after exchange is as follows.

        6  1  2  5  9 3  4  7  10  8

This is the end of the first exchange. Next, sentry j continues to move to the left (kindly remind that sentry j must start first each time). He found 4 (smaller than the benchmark 6, meeting the requirements) and stopped. Sentry i also continued to move to the right. He found 9 (larger than the benchmark number of 6, meeting the requirements) and stopped. At this time, exchange again, and the sequence after exchange is as follows.

        6  1  2 5  4  3  9  7 10  8

After the second exchange, the "detection" continues. Sentry j continued to move to the left. He found 3 (smaller than the benchmark 6, meeting the requirements) and stopped again. The sentry i keeps moving to the right. No! At this time, sentry i and sentry j meet, and both sentry i and sentry j come to 3. Indicates that the "probe" ends at this time. We'll swap the reference numbers 6 and 3. The sequence after exchange is as follows.

        3  1 2  5  4  6  9 7  10  8

This is the real end of the first round of "detection". The rest can be done by analogy.

1.2 realization of fast scheduling

public static int[] topKFrequent(int[] nums, int k) {
    if(nums.length == 0 || k == 0) return new int[0];
    Map < Integer, Integer > map = new HashMap < > (); // Record the number of occurrences of the element
    for(int i = 0; i < nums.length; i++) map.put(nums[i], map.getOrDefault(nums[i], 0) + 1);
    int[][] array = new int[2][map.size()]; // The first line is the original array, and the second line is the number of occurrences
    int i = 0;
    for(Map.Entry < Integer, Integer > entry: map.entrySet()) {
        array[0][i] = entry.getKey();
        array[1][i++] = entry.getValue();
    }
    return quickSort(array, 0, array[0].length - 1, k - 1);
}

/**
 * Quick sort
 *
 * @param originArray Original array
 * @param start       Start subscript
 * @param end         End subscript
 * @param target      Target subscript
 * @return
 */
private static int[] quickSort(int[][] originArray, int start, int end, int target) {
    // Divide every fast row once. Find the element with subscript j after sorting. If J is exactly equal to k, return J and all numbers to the left of J
    int k = partition(originArray, start, end);
    if(k == target) return Arrays.copyOfRange(originArray[0], 0, k + 1);
    // Otherwise, according to the size relationship between subscripts j and k, it is determined whether to continue to segment the left segment or the right segment.
    return k > target ? quickSort(originArray, start, k - 1, target) : quickSort(originArray, k + 1, end, target);
}

/**
 * Quickly arrange the segmentation and return the subscript j, so that the numbers smaller than nums[j] are on the left of J and the numbers larger than nums[j] are on the right of J.
 *
 * @param originArray
 * @param startIndex
 * @param endIndex
 * @return
 */
private static int partition(int[][] originArray, int startIndex, int endIndex) {
    int freq = originArray[1][startIndex];
    int start = startIndex;
    int end = endIndex;
    // Based on the starting point
    while(start < end) {
        while(start < end && originArray[1][end] <= freq) // Look from back to front until you find a value greater than the benchmark value
            end--;
        while(start < end && originArray[1][start] >= freq) // Look from front to back until you find a value smaller than the benchmark value
            start++;
        if(start < end) swap(originArray, start, end); // change of position
    }
    swap(originArray, start, startIndex); // Exchange position of datum point and coincidence point
    return start;
}

private static void swap(int[][] originArray, int start, int end) {
    for(int i = 0; i < originArray.length; i++) {
        int temp = originArray[i][start];
        originArray[i][start] = originArray[i][end];
        originArray[i][end] = temp;
    }
}

2. Large root pile (front K small) / small root pile (front K large)

2.1 # large root pile and small root pile

Nature: the value of each node is greater than the value of its left child and right child nodes, which is called large root heap; The value of each node is less than the value of its left child and right child nodes, which is called small root heap. As shown below

We marked each number in the above figure, and the above structure is mapped into an array, which becomes the following

There is also a basic concept: find the parent node and left and right child nodes of a number in the array. For example, if the number with index i is known, then

1. Parent node index: (i-1)/2 (here, divide by 2 in the computer and omit the decimal)

2. Left child index: 2*i+1

3. Right child index: 2*i+2

Therefore, the above two arrays can be brain complemented into a heap structure because they meet the defined properties of the heap:

Large root pile: arr (I) > arr (2 * I + 1) & & arr (I) > arr (2 * I + 2)

Small root pile: arr (I) < arr (2 * I + 1) & & arr (I) < arr (2 * I + 2)

2.2 using PriorityQueue to realize large root heap

public static int[] topKFrequent1(int[] nums, int k) {
    // Construct HashMap. Key: each element in nums; Value: the number of occurrences of the corresponding element (i.e. frequency)
    HashMap < Integer, Integer > store = new HashMap < > ();
    for(Integer num: nums) 
        store.put(num, store.getOrDefault(num, 0) + 1);
    // Construct a small root heap (i.e. the smallest heap) to save the first k elements with the highest frequency
    // The purpose of heap is to sort the small numbers according to their occurrence frequency
    PriorityQueue < Integer > minHeap = new PriorityQueue < > (Comparator.comparingInt(store::get));
    for(Integer key: store.keySet()) {
        if(minHeap.size() < k) // Fill minHeap
            minHeap.offer(key);
        else if(store.get(key) > store.get(minHeap.peek())) { // As long as it is full, the minimum value is derived
            minHeap.poll();
            minHeap.offer(key);
        }
    }
    int[] res = new int[k];
    int i = 0;
    while(!minHeap.isEmpty()) 
        res[i++] = minHeap.poll();
    return res;
}

3. Bucket sorting

One sentence summary: divide multiple intervals with the same range, sort each sub interval automatically, and finally merge.

Bucket sorting is an extended version of counting sorting. Counting sorting can be regarded as that each bucket only stores the same elements, while bucket sorting each bucket stores a certain range of elements. Through the mapping function, map the elements in the array to be sorted to each corresponding bucket, sort the elements in each bucket, and finally put the elements in the non empty bucket into the original sequence one by one.

Bucket sorting needs to ensure that the elements are evenly dispersed as far as possible. Otherwise, when all data are concentrated in the same bucket, bucket sorting will fail.

3.1 diagrammatic bucket sorting

3.2 realization

public int[] topKFrequent(int[] nums, int k) {
    // Construct HashMap. Key: each element in nums; Value: the number of occurrences of the corresponding element (i.e. frequency)
    HashMap < Integer, Integer > store = new HashMap < > ();
    for(Integer num: nums) 
  		store.put(num, store.getOrDefault(num, 0) + 1);
    // Construct a set of buckets (i.e. a series of buckets). The number of buckets is the length of nums + 1, because buckets[0] has no meaning
    // The purpose is to put the number with frequency I into the ith bucket (i.e. buckets[i])
    List < Integer > [] buckets = new List[nums.length + 1];
    for(Map.Entry < Integer, Integer > entry: store.entrySet()) {
        if(buckets[entry.getValue()] == null) // If a bucket has not been filled with numbers (i.e. uninitialized), initialize it as an array
            buckets[entry.getValue()] = new ArrayList < > ();
        buckets[entry.getValue()].add(entry.getKey());
    }
    int[] res = new int[k];
    int index = 0;
    for(int i = buckets.length - 1; i > 0; i--) { // Traverse each bucket
        if(buckets[i] != null) { // If there are numbers in the bucket
            for(int j = 0; j < buckets[i].size() && index < k; j++) {
                res[index++] = buckets[i].get(j); // Fill the numbers in the bucket into the res array in turn
            }
        }
    }
    return res;
}

 

Tags: leetcode

Posted by zigizal on Tue, 19 Apr 2022 00:12:42 +0930