Three non comparative sorting algorithms

Count sort

thinking

Very simple, it is to set up a hash table to count the number of elements in the array, calculate the prefix sum after the statistics, convert the statistical number of elements into the subscript position that should be placed, and finally traverse the elements of the original array and put the elements in the mapping position of the hash table. (note that one element is placed at a time, and the corresponding subscript position count is reduced by 1)

realization

class Solution {
public:
    // Data range - 50000 < = a [i] < = 50000, so the array should hold 100000 numbers
    int SIZE = 100001;
    // The purpose of setting the offset is to make all values positive and prevent array overflow
    int OFFSET = 50000;
    vector<int> countingSort(vector<int>& nums) {
        vector<int> count(SIZE,0);
        int n = nums.size();
        for(int i = 0; i < n; i++){
            count[nums[i] + OFFSET]++;
        }

        for(int i = 1; i < SIZE; i++){
            count[i] += count[i - 1];
        }

        vector<int> temp(nums.begin(),nums.end());
        for(int i = n - 1; i >= 0; i--){
            int idx = count[temp[i] + OFFSET] - 1;
            count[temp[i] + OFFSET]--;
            printf("idx=%d nums[%d]=%d\n",idx,i,temp[i]);
            nums[idx] = temp[i];
        }
        return nums;
    }
};

analysis

Time complexity: O ( N + k ) O(N+k) O(N+k), where N is the length of the array and K is the length of the hash table

Space complexity: O ( N + k ) O(N+k) O(N+k) requires an additional hash table and an additional array

Stability: stability, if the last time you traverse the array is backward and forward.

Finally, when traversing the array, you must traverse from the back to the front, because if the same value appears in the array many times, these elements are counted in the order from the front to the back. After prefix and conversion, the initial subscript position of the value corresponds to the same element that appears last. If you schedule an element subscript, subtract 1 to point to the previous same element of the value in the original array, Therefore, we should also make adjustments according to the counting order, so that the last element of the same value enters the last position of the scheduled array. If it is traversed from front to back, it is to put the last element into the first position of the scheduled array first, which will lose stability.

Initial order:

The number of moves has nothing to do with: like merging, the scheduling comes from additional arrays. If there is any disorder, it should be moved at one time.

Comparison times independent: non comparison algorithm

Regardless of time complexity: the best and worst are O(N+k)

The number of sorting times is irrelevant: O(N) algorithm. If there is any disorder, it is a trip

Cardinality sort

(don't confuse it with counting. Counting Sort is Counting Sort, and the radix is Radix Sort)

The disadvantage of counting sorting is that the size of the counter depends heavily on the original sequence. When the element value is very large, it will cause great space loss (when the number is small and the value is large, it is a great waste of space). The improvement method is to sort from bottom to top according to the number of elements, and the size of the counter is fixed in the range of 0-9. Obviously, it will pay the price of time for space.

Generally speaking, when the value range of the sequence to be sorted is small and the quantity is large, count sorting is used, while when the value range of the sequence to be sorted is large and the quantity is small, cardinal sorting is used.

class Solution {
public:
    int OFFSET = 50000; // -50000 <= A[i] <= 50000
    vector<int> RadixSort(vector<int>& nums) {  
        int n = nums.size(),maxlen = 0;
        for(int i = 0; i < n; i++){
            nums[i] += OFFSET;  // Add the offset to prevent the influence of negative numbers
            maxlen = std::max(maxlen,nums[i]);  // Take the number of digits of the largest element as the cyclic quantity
        }

        vector<int> cnt(10,0),tmp(nums.begin(),nums.end());
        int div = 1;
        while(div <= maxlen){
            for(int i = 0; i < n; i++){
                int bit = (nums[i] / div) % 10; //From the single bit of the element to the top bit
                cnt[bit]++;
            }
            for(int i = 1; i < 10; i++){
                cnt[i] += cnt[i - 1];
            }
            for(int i = n - 1; i >= 0; i--){
                int bit = (tmp[i] / div) % 10;
                int idx = cnt[bit] - 1;
                cnt[bit]--;
                nums[idx] = tmp[i]; 
            }
            // Clear the counter, update tmp and sort the next digit
            cnt = vector<int>(10,0);
            tmp = vector<int>(nums.begin(),nums.end());
            div *= 10;
        }

        for(int i = 0; i < n; i++){
            nums[i] -= OFFSET;	//Don't forget to restore the original value after sorting
        }
        return nums;
    }
    int getElemLen(int digit){
        int cnt = 0;
        while(digit){
            digit /= 10;
            cnt++;
        }
        return cnt;
    }

};

analysis

Time complexity: O ( N ∗ k ) O(N*k) O(N * k), where N is the length of the array and K is the number of digits of the maximum value in the sequence

Space complexity: O ( N ) O(N) O(N), the counter length is fixed to 10, which can be ignored. It mainly depends on the length of the additional array

Stability: stable, the reason is the same as the counting sort

Initial sequence:

It has nothing to do with the number of moves: it is the same as counting and sorting. The sorting depends on an additional array, and there is no need to move

The number of comparisons is independent: it is the same as counting sorting. The sorting depends on an additional array, and there is no need to compare

Time complexity is irrelevant: the best and worst cases are O ( N ∗ k ) O(N*k) O(N∗k)

The number of sorting times is independent: the number of outer loops depends on the number of bits of the largest element rather than the initial sequence

Bucket sorting

Divide the original sequence into "buckets" with multiple equal length intervals according to the value range, classify and sort the elements of the sequence into the corresponding buckets, and "rewind" the elements in each bucket one by one after the bucket loading:

class Solution {
public:
    int OFFSET = 50000; // -50000 <= A[i] <= 50000
    vector<int> sortArray(vector<int>& nums) {  
        buckSort(nums);
        return nums;
    }
    void buckSort(vector<int> &nums){
        int n = nums.size(),maxi = 0,mini = 0;
        for(int i = 0; i < n; i++){
            nums[i] += OFFSET;  // Add the offset to prevent the influence of negative numbers
            maxi = std::max(maxi,nums[i]);
            mini = std::min(mini,nums[i]);
        }

        int gap = 2; // Set the interval length of each barrel, and set the interval as left closed and right open
        int buckAmount = (maxi - mini) / gap + 1;    //Number of barrels
        vector<vector<int>> buckets(buckAmount); 
        //When the elements are put into the bucket, they need to be sorted every time the bucket is updated. Here, insert sorting is used
        for(int i = 0; i < n; i++){
            int idx = (nums[i] - mini)  / gap;
            buckets[idx].push_back(nums[i]);
            insertionSort(buckets[idx]);
        }

        int k = 0;
        //Take the sorted elements out of the bucket one by one
        for(int i = 0; i < buckAmount; i++){
            int buckLoad = buckets[i].size();
            for(int j = 0; j < buckLoad; j++){
                nums[k++] = buckets[i][j];
            }
        }

        //Restore the original value of the array
        for(int i = 0; i < n; i++){
            nums[i] -= OFFSET;
        }
    }
    void insertionSort(vector<int> &nums){
        int n = nums.size();
        for(int i = 1; i < n; i++){
            int tmp = nums[i];
            int j = i;
            while(j > 0 && nums[j - 1] > tmp){
                nums[j] = nums[j - 1];
                j--;
            }
            nums[j] = tmp;
        }
    }


};

analysis

Time complexity: depends on the number k of buckets and the sorting algorithm used. Let the group leader be n. in the best case, each element is evenly divided into each bucket and orderly, and the time complexity is up to O ( N ) O(N) O(N), in the worst case, all elements are packed in one bucket, and the time complexity is O ( N 2 ) O(N^2) O(N2)

The sorting algorithm used in this example is insert sorting. If other sorting algorithms are used, they need to be discussed separately, for example:

Sorting algorithmFinal complexity
insertBest $o (n / k) $- > O ( N ) O(N) O(N), worst case o ((n / k) ^ 2) $- >$ O ( N 2 ) O(N^2) O(N2)
Quick rowbest O ( ( n / k ) ∗ l o g ( n / k ) ) O( (n/k)*log(n/k) ) O((n/k)∗log(n/k)) -> O ( N l o g N ) O( NlogN) O(NlogN), worst case o ((n / k) ^ 2) - > O ( N 2 ) O(N^2) O(N2)

However, when the number of barrels is appropriate, O(NlogN) is approximately equal to O(N).

Space complexity: depending on the implementation method of bucket array, this example uses vector to dynamically insert the array, which is similar to linked list. The space complexity is O(n+k), that is, array length + bucket number. If the array is fixed allocation, the space complexity is O(nk)

Stability: the logic of bucket loading is similar to counting sorting, so the process is stable. Don't use unstable sorting algorithm to sort buckets.

Initial order:

Move times: irrelevant. The sorting comes from the bucket array. It is scheduled once and does not need to be moved

Comparison times: irrelevant, non comparison algorithm

Time complexity: depends on the sorting algorithm of the sorting bucket

Sorting times: depends on the sorting algorithm of sorting bucket

Tags: C++

Posted by nloding on Sun, 17 Apr 2022 20:27:21 +0930