# 0. First words

Sorting algorithm is one of the most basic algorithms in data structure and algorithm.

Sorting algorithms can be divided into internal sorting and external sorting. Internal sorting is to sort data records in memory, while external sorting is to access external memory in the sorting process because the sorted data is large and can not accommodate all sorting records at one time. Common internal sorting algorithms include: insert sort, Hill sort, select sort, bubble sort, merge sort, quick sort, heap sort, cardinal sort, etc. Summarize with a picture:

## 0.1 explanation of terms

• n: Data scale
• k: Number of "barrels"
• In place: occupy constant memory and no additional memory
• Out place: use extra memory
• Stability: the order of the two equal key values after sorting is the same as that before sorting

# 1. Insert sort

## 1.1 direct insertion sort

Features of direct insertion sorting:

• Space time efficiency: the time complexity is O(n^2) and the space complexity is O(1). In the best case, the elements are basically ordered. At this time, each element inserted only needs to be compared several times without moving, and the time complexity is O(n)

• Stability: ensure that the insertion relative position of equal elements will not change and sort stably

```void insertion_sort_int(int *arr,int n) {
for(int i = 1;i < n;i++) {
int key = arr[i];
int j = i - 1;
for(; j >= 0 && arr[j] > key;j--) {
arr[j + 1] = arr[j];
}
arr[j + 1] = key;
}
}```

## 1.2 sort by half insertion

When finding the insertion position, change to half search, that is, half insertion sorting.

```void binary_insertion_sort_int(int *arr,int n) {
for(int i = 1;i < n;i++) {
int key = arr[i];
int left = 0;
int right = i - 1;
while(left <= right) {
int mid = (left + right) / 2;
if(key < arr[mid])
right = mid - 1;
else
left = mid + 1;
}
for(int j = i - 1; j >= left; j--)
arr[j + 1] = arr[j];
arr[left] = key;
}
}```

## 1.3 Hill sorting

First, take half of the length as the incremental step d1, divide all the records in the table into d1 groups, put the records with step interval d1 in the same group, and then insert and sort directly; Then take half of d1 as the step size and repeat the insertion sorting operation.

• Generally, when n is in a specific range by default, the time complexity of hill sorting is O(n^1.3)

• Stability: when the same keyword is mapped to different sub tables, the relative order may be changed and the sorting is unstable. For example, (3,2,2) will change the relative order of 2

```void shell_sort_int(int *arr,int n) {
for(int dk = n / 2; dk >= 1; dk >>= 1) {
for(int i = dk; i < n;i++) {
int key = arr[i];
int j = i - dk;
for(; j >=0 && arr[j] > key; j -= dk) {
arr[j + dk] = arr[j];
}
arr[j + dk] = key;
}
}
}```

# 2. Merge and sort

## 2.1 two way merging and sorting

Two way merge sort is the application of divide and conquer method. The mode is as follows:

• Decomposition: n elements are decomposed into two n/2 subsequences
• Solution: use merge sort to sort two subsequences recursively
• Merge: merge two ordered subsequences to get the sorting result

For the merging between two ordered lists, you only need to copy them to the auxiliary array, compare them according to the size relationship, and then put them into the original array. This merging method has many variants. The characteristics of two-way merging and sorting are as follows:

• Time efficiency: there are lgn merging times. If the merging time is O(n), the time complexity is O(nlgn)

• Space efficiency: the merge operation requires auxiliary space O(n). It is recommended to allocate a large array outside the merge operation (note that there is also a merge algorithm of O(1))

• Stability: it refers to stable sorting and does not change the relative order of records with the same keyword

```void merge(int *arr,int *help,int left,int mid,int right) {
for(int i = left; i <= right;i++)
help[i] = arr[i]; //hold A Copy elements from to B in,With the help of B handle
int i = left;
int j = mid + 1;
int k = left;
while(i <= mid && j <= right) { //How many records are in common
if(help[i] < help[j])
arr[k++] = help[i++]; //Copy smaller values to A in
else
arr[k++] = help[j++];
}
while(i <= mid)   arr[k++] = help[i++]; //Copy a table directly before it is detected
while(j <= right) arr[k++] = help[j++];
}

void merge_sort_int(int *arr,int *help,int left,int right) {
if(left < right) {
int mid = ((right - left) >> 1) + left;
merge_sort_int(arr,help,left,mid);
merge_sort_int(arr,help,mid + 1,right);
merge(arr,help,left,mid,right);
}
}```

## 2.2 merge and sort in place

In situ merge sorting can merge without auxiliary array. The key is the merge function. The idea is:

• Traverse I, find the first arr [i] > arr [j] (start = j), and determine the position of I. That is, the elements before this are smaller than those starting from j
• Traverse j, find the first element arr [j] > arr [i] (end = j), and determine the position of j, it means that the elements before this are smaller than those starting from I

The above operation shows that the elements of the second sequence (start,end-1) should be before i - move to the front of i by cyclic shift.
For example, 0 1 5 6 9 | 2 3 4 7 8:

• The first traversal determines 5 and 7, and then shifts the 2,3,4 cycle to the front of 5,6,9,
• i starts from the new position of 5 as the first sequence, and j starts from the second sequence
```/*Three auxiliary functions*/
void swap(int *a,int *b) {
int tmp = *a;
*a = *b;
*b = tmp;
}

/*Flip array with length n*/
void reverse(int *arr,int n) {
int i = 0;
int j = n - 1;
while(i < j) {
swap(&arr[i],&arr[j]);
i++;
j--;
}
}

/**
* Shift the array containing n elements left by i bits
*/
void rotation_left(int *arr,int n,int i) {
reverse(arr,i);
reverse(arr + i,n - i);
reverse(arr,n);
}

void merge_inplace(int  *arr,int left,int mid,int right) {
int i = left,j = mid + 1, k = right;
while(i < j && j <= k) {
while(i < j && arr[i] <= arr[j]) i++;
int step = 0;
while(j <= k && arr[j] <= arr[i]) {
j++;
step++;
}
//arr+i Is a subarray, j-i Represents the number of subarray elements, j-i-step Represents the number of bits of the left cyclic shift(Move the front element to the back)
rotation_left(arr + i,j- i,j - i - step);
}
}

void merge_sort_inplace(int *arr,int left,int right) {
if(left < right) {
int mid = (left + right) / 2;
merge_sort_inplace(arr,left,mid);
merge_sort_inplace(arr,mid+1,right);
merge_inplace(arr,left,mid,right);
}
}```

# 3. Exchange sorting

## 3.1 bubble sorting

Bubble sorting requires a total of n-1 iterations, each time determining the final position of an element (the highest value in the remaining sequence). In a sequence, forward traversal can determine a maximum value, reverse traversal can determine a minimum value, and each sequence can be reduced by one element. Its features are as follows:

• Space time efficiency: the average time complexity is O(n^2) and the space complexity is O(1). The ordered subsequence must be globally ordered

• Stability: a stable sorting method

```void bubble_sort_int(int *arr,int n) {
for(int i = 0; i < n - 1; i++) { //n-1 Bubbling,Determine one element at a time and small value elements at a time
bool flag = false;
for(int j = n- 1; j > i; j--) {
if(arr[j - 1] > arr[j]) { //In reverse order
swap(&arr[j - 1],&arr[j]);
flag = true; //Exchange occurs
}
}
if(flag == false)
return ; //Indicates that there is no exchange in this traversal, and the table is already in order
}
}```

## 3.2 quick sort

The core of quick sort is partition operation, and its performance also depends on the quality of partition operation. Quick sort has the best average performance among all internal sorting algorithms. Its features are as follows:

• Time efficiency: the running time of fast scheduling is related to whether the partition is symmetrical. The average time complexity is O(nlgn), and the worst is O(n^2)

• In order to improve the efficiency of the algorithm: when the elements are basically in order, direct insertion sorting is adopted (STL does this, and a threshold is given)

• In order to ensure the balance of division, for example, take the median of three numbers or randomly select pivot elements or even Tukey ninther selection strategy, so that the worst case is unlikely to occur in the actual situation

• Space efficiency: recursive work stack is adopted, and the space complexity is O(lgn) at the best and O(n) at the worst

• Stability: in the partition algorithm, if there are two identical elements on the right, the relative position will change when it is less than pivot, indicating that fast sorting is an unstable sorting, but each sorting can determine the position of an element (pivot). Note that some partition algorithms can ensure local stability

The main framework of quick sort is as follows:

```void quick_sort_int(int *arr,int left,int right) {
if(left < right) {
int pos = partition1(arr,left,right); //divide
quick_sort_int(arr,left,pos - 1);
quick_sort_int(arr,pos + 1,right);
}
}```

### 3.2.1 head and tail pointer to middle scanning method

The two pointers scan from head to tail to the middle respectively. The main consideration here is the strategy of selecting pivot, which may be the first element or the tail element, or even the middle of three numbers or a number at random. The idea is to divide the data by pivot, move the smaller pivot on the right to the left, and move the larger pivot on the left to the right

This method can be used to determine the k-th largest or k-th smallest element.

```int partition(int *arr,int left,int right) {
int pivot = arr[left];
while(left < right) {
while(left < right && pivot <= arr[right])
right--;
arr[left] = arr[right]; //than pivot Move the small value to the left
while(left < right && pivot >= arr[left])
left++;
arr[right] = arr[left]; //than pivot Move the high value to the right end
}
arr[left] = pivot; //left == right
return left;
}```

In fact, there is another exchange strategy: exchange the elements that do not meet the conditions at the left and right ends, and the elements that end the loop should be exchanged with the starting elements. This method combines two strategies: two-way traversal and exchange operation.

```int partition1(int *arr,int left,int right) {
int start = left;
int pivot = arr[left];
while(left < right) {
while(left < right && pivot <= arr[right])
right--;
while(left < right && pivot >= arr[left])
left++;
swap(&arr[left],&arr[right]);
}
swap(&arr[start],&arr[left]);
return left;
}```

### 3.2.2 forward and backward pointer scanning method

Using one-way traversal: two pointer indexes scan backward step by step, one before and one after. Its strategy is to record a last pointer to save elements less than or equal to pivot. Scanning from left to right, every element less than or equal to pivot is stored in the last pointer, so the elements before the last after one division (including last, which depends on the initialization of last) are less than or equal to pivot.

We can see the characteristics of the partition algorithm: traversing from front to back, last records the elements less than or equal to pivot, and the relative order of these elements remains unchanged. If you want the relative order of elements greater than or equal to pivot to remain unchanged, you can traverse from back to front, and pivot takes the first element.

```int partition2(int *arr,int left,int right) {
int pivot = arr[right];
int last = left -1;
for(int i = left; i < right; i++) {
if(arr[i] <= pivot) {
++last;
swap(&arr[last],&arr[i]);
}
}
swap(&arr[last + 1],&arr[right]);
return last + 1;
}```

If the pivot is not easy to get, for example, the Dutch three color flag problem proposed by Dijkstra should ensure the order of 0, 1 and 2. Using simple one-way traversal, there will be the disorder of 0 and 1, and there are many repetitions of elements. Therefore, Dijkstra proposed a simple and fast three-way partition method.

### 3.2.3 three dimensional division method

Dijkstra three-way fast segmentation: it is used to deal with the situation with a large number of repeated elements and reduce the number of comparison of repeated elements during recursion. That is, traverse the array once and maintain three pointers lt, cur and gt

• lt makes the element of arr[0..lt-1] less than v
• gt makes the element of arr[gt+1..n-1] greater than v
• cur makes the element of arr[lt..i-1] equal to V, and the element of arr [I.. GT] is still uncertain

This is a similar problem to the Dutch tricolor flag, and the idea is relatively simple. To learn to use a working pointer combined with two boundary pointers for a linear traversal, there are also many variants. The complete three-way quick sorting code is as follows:

```void quick_sort_threeway(int *arr,int left,int right) {
if(left < right) {
int lt = left;
int cur = left;
int gt = right;
int pivot = arr[left];
while(cur <= gt) {
if(arr[cur] == pivot) {
cur++;
} else if(arr[cur] < pivot) {
swap(&arr[cur],&arr[lt]);
lt++;
cur++;
} else {
swap(&arr[cur],&arr[gt]);
gt--;
}
} //cur > gt Then exit and directly form left..lt-1 lt..gt gt+1..right Three intervals
quick_sort_threeway(arr,left,lt - 1);
quick_sort_threeway(arr,gt + 1,right);
}

}```

However, the only disadvantage of this method is that it uses many more exchanges than the standard dichotomy when there are few repeated elements in the array. The sorting method of Bently et al. Solved this problem faster than that of Bently et al. In the 1990s.

### 3.2.4 fast three-way division method

Bently implements an algorithm with the optimal amount of information by placing repeated elements at both ends of the subarray. Here, the additional exchange is only used for elements equal to the segmented pivot element, and the above additional exchange is used for segmenting unequal elements.

The idea is as follows:

• Maintain two indexes p and q so that the elements of arr[lo..p-1] and arr[q+1..hi] are equal to a[lo] (recorded as v)
• Use two indexes i and j so that a[p..i-1] is less than V and a [J + 1.. q] is greater than v.
• If an element equal to v is encountered in the front and exchanged with p, the latter is exchanged with q (similar operation)

As shown in the figure:

Its code is as follows:

```void quick_sort_threeway_fast(int *arr,int left,int right) {
if(left < right) {
int p = left , q = right + 1;
int pivot = arr[left];
int i = left , j = right + 1;
while(true) {
while(arr[++i] < pivot)
if(i == right)
break;
while(arr[--j] > pivot)
if(j == left)
break;

if(i == j && arr[i] == pivot)
swap(&arr[++p],&arr[i]);
if(i >= j) break;

swap(&arr[i],&arr[j]);
if(arr[i] == pivot)
swap(&arr[++p],&arr[i]);
if(arr[j] == pivot)
swap(&arr[--q],&arr[j]);
}
i = j + 1;
for(int k = left; k <= p; k++) swap(&arr[k],&arr[j--]);
for(int k = right; k >= q;k--) swap(&arr[k],&arr[i++]);
quick_sort_threeway_fast(arr,left,j);
quick_sort_threeway_fast(arr,i,right);
}
}```

## 3.3 optimized sorting algorithm

When the number of elements is relatively small, direct insertion sorting is used. The selection strategy of pivot elements is mainly suitable for those with a large number of elements. The code is as follows:

```/**
* Three data fetching strategy: return the subscript of the array
*/
int median3(int *arr,int i,int j,int k) {
if(arr[i] < arr[j]) {
if(arr[j] < arr[k])
return j;
else {
return (arr[i] < arr[k])? k : i;
}
} else {
if(arr[k] < arr[j])
return j;
else {
return (arr[k] < arr[i])? k : i;
}
}
}

void optimal_sort_int(int *arr,int left,int right) {
int n = right - left + 1;
if(n <= THRESHOLD) {
insertion_sort_int(arr,n);
} else if(n <= 500) { //Sort with three numbers
int pos = median3(arr,left, left + n / 2,right);
swap(&arr[pos],&arr[left]);
} else {//take Tukey ninther As pivot element
int eighth = n / 8;
int mid = left + n / 2;
int m1 = median3(arr,left,left + eighth,left + eighth * 2);
int m2 = median3(arr,mid - eighth,mid,mid + eighth);
int m3 = median3(arr,right - eighth * 2,right - eighth,right);
int ninther = median3(arr,m1,m2,m3);
swap(&arr[ninther],&arr[left]);
}
quick_sort_threeway_fast(arr,left,right);
}```

# 4. Select Sorting

## 4.1 simple selection sorting

The basic idea of selecting sorting: determine a most valuable element in the following elements for each trip, and repeat n - 1 times. Its features are as follows:

• Time efficiency: note that the comparison times of elements are independent of the initial sequence. Always compare n(n-1)/2, and the time complexity is O(n^2)
• Space efficiency is O(1)
• Stability: unstable sorting, such as 6 8 6 5
```void selection_sort_int(int *arr,int n) {
int min;
for(int i = 0; i < n - 1;i++) {//n-1 Select a minimum value each time
min = i;
for(int j = i + 1; j < n;j++) {
if(arr[min] < arr[j]) {
min = j; //Update minimum element position
}
}
if(min != i) swap(&arr[i],&arr[min]); //The minimum value updated to is the same as i Location exchange
}
}```

## 4.2 heap sorting and priority queue

Heap sort is a tree selection sort, which uses the relationship between parent nodes and child nodes in a complete binary tree to select the most valuable element. The sorting steps are as follows:

• Construct a large root heap first (this is also an operation of repeatedly adjusting downward to meet the maximum heap nature)
• Then exchange the elements at the top of the heap with the elements at the bottom of the heap. At this time, the root node no longer meets the nature of the heap. Adjust the elements at the top of the heap downward and repeat the operation until there is one element left in the heap. n-1 times of exchange and adjustment are required

Heap sorting is characterized by:

• Space time efficiency: in the best, average and worst cases, the time complexity of heap sorting is O(nlgn); Space complexity: O(1)

• Stability: unstable sorting

The main program of heap sorting is as follows (note that No. 0 does not store elements, the actual storage starts from 1, and the array length is len + 1):

```void heap_sort_int(int *arr,int len) {
build_max_heap(arr,len);
for(int i = len; i > 1;i--) {
swap(&arr[i],&arr[1]); //And heap top element exchange,And output heap top elements
sink_down(arr,1,i - 1); //Note that there is still room left i-1 Element adjustment heap
}
}```

Top down operation of A heap with subscript k

This operation is to keep the subtree of the element whose root is subscript k satisfying the maximum heap property. If a node is smaller than one of their two child nodes, the subtree rooted on that node does not satisfy the maximum heap property.

Adjustment idea: sink smaller elements from top to bottom and maintain the heap state of its child nodes in a similar way until a child node element meets the maximum heap state. The downward adjustment time is related to the tree height, which is O(h)

This operation is used for heap construction, heap sorting and deletion (delMax). The code is as follows:

```void sink_down(int *arr,int k,int len) {
while(2 * k <= len) {
int j = 2 * k;
if(j <len && arr[j] < arr[j + 1]) j++; //Take a larger element
if(arr[k] > arr[j]) break; //If the root element is larger than the child node, the maximum heap property is met and no adjustment is required
swap(&arr[k],&arr[j]);
k = j;
}
}```

The heap with B subscript k operates from bottom to top

This operation is because the size of the node is larger than its parent node, so it needs to be adjusted upward. Note that the termination condition is k = 1 and the parent node is larger.

This operation is used to insert a new element at the bottom of the heap

```void swim_up(int *arr,int k) {
while(k > 1 && arr[k] > arr[k / 2]) {
swap(&arr[k],&arr[k / 2]);
k = k / 2;
}
}```

To sum up, it is relatively simple to insert, delete and get the maximum value of heap and priority queue. The key lies in these two heap adjustment operations. The amount of code is small, but its operation logic needs to be understood.

### 4.2.2 construction reactor operation

A maximum heap is constructed from the bottom up with an unordered array. The time complexity is O(n). The actual storage starts with subscript 1 and the length of the array is len + 1. The top-down adjustment operation starts from n = len/2

```void build_max_heap(int *arr,int len) {
for(int i = len / 2; i >= 1;i--)
sink_down(arr,i,len);
}```

Large root heap can generally be used to find the minimum number of K in massive data: that is, first read k elements to construct the maximum heap, and then read in the data in turn. If the current data is smaller than the heap top, replace the heap top; If the current data is relatively large, it cannot be the minimum K number. Correspondingly, the small root heap is generally used to find the maximum number of K.

The following is to obtain the minimum number of k by using the fast scheduling partition algorithm and the nature of the heap

```void print(int *arr,int left,int right) {
for(int i = left; i<= right;i++)
printf("%d ",arr[i]);
printf("\n");
}

/*Take the minimum number of k based on partition algorithm*/
void getleastk(int *arr,int n,int k) {
int left = 0;
int right = n - 1;
int pos = partition(arr,left,right);
while(pos != k - 1) {
if(pos > k - 1) {
right = pos - 1;
pos = partition(arr,left,right);
} else {
left = pos + 1;
pos = partition(arr,left,right);
}
}
print(arr,0,k-1);
}

/*Finding the minimum number of k based on the maximum heap*/
void getmink(int *arr,int n,int k) {
int b[k+1];
for(int i = 1; i <= k;i++) {
b[i] = arr[i - 1];
}

build_max_heap(b,k);
for(int i = k; i < n;i++) {
if(arr[i] > b[1])
continue;
else {
b[1] = arr[i];
sink_down(b,1,k);
}
}
heap_sort_int(b,k);
print(b,1,k);
}```

# 5. Linear time sequencing

## 5.1 counting and sorting

The core of counting and sorting is to convert the input data values into keys and store them in the additional array space. As a sort with linear time complexity, count sort requires that the input data must be integers with a certain range.

When the input element is n integers between 0 and K, its running time is Θ (n + k). Counting sorting is not a comparative sorting, and the sorting speed is faster than any comparative sorting algorithm.

Since the length of array C used for counting depends on the range of data in the array to be sorted (equal to the difference between the maximum and minimum values of the array to be sorted plus 1), counting sorting requires a lot of time and memory for arrays with large data range. For example, counting sorting is the best algorithm for sorting numbers between 0 and 100, but it is not suitable for sorting names alphabetically. However, counting sorting can be used in the algorithm of cardinality sorting to sort arrays with a large range of data.

Generally speaking, for example, there are 10 people of different ages. According to the statistics, 8 people are younger than A, and A's age ranks ninth. Using this method, we can get the position of everyone else, which is in good order. Of course, special treatment is required when there are repetitions of age (to ensure stability), which is why the target array should be backfilled and the statistics of each number should be subtracted by 1.

The steps of the algorithm are as follows:

• (1) Find the largest and smallest elements in the array to be sorted
• (2) Count the number of occurrences of each element with value i in the array and store it in the ith item of array C
• (3) All counts are accumulated (starting from the first element in C, and each item is added to the previous item)
• (4) Reverse fill the target array: put each element i in item C(i) of the new array, and subtract 1 from C(i) for each element
```public class CountingSort implements IArraySort {

@Override
public int[] sort(int[] sourceArray) throws Exception {
// yes arr Copy without changing the parameters
int[] arr = Arrays.copyOf(sourceArray, sourceArray.length);

int maxValue = getMaxValue(arr);

return countingSort(arr, maxValue);
}

private int[] countingSort(int[] arr, int maxValue) {
int bucketLen = maxValue + 1;
int[] bucket = new int[bucketLen];

for (int value : arr) {
bucket[value]++;
}

int sortedIndex = 0;
for (int j = 0; j < bucketLen; j++) {
while (bucket[j] > 0) {
arr[sortedIndex++] = j;
bucket[j]--;
}
}
return arr;
}

private int getMaxValue(int[] arr) {
int maxValue = arr[0];
for (int value : arr) {
if (maxValue < value) {
maxValue = value;
}
}
return maxValue;
}

}```

## 5.2 cardinality sorting

Radix sorting is a non comparative integer sorting algorithm. Its principle is to cut the integer into different numbers according to the number of bits, and then compare them according to each number of bits. Since integers can also express strings (such as names or dates) and floating-point numbers in a specific format, cardinality sorting is not limited to integers.

There are two methods of cardinality sorting:

These three sorting algorithms all use the concept of bucket, but there are obvious differences in the use of bucket:

1. Cardinality sorting: allocate buckets according to each number of key value;
2. Counting and sorting: only one key value is stored in each bucket;
3. Bucket sorting: each bucket stores a certain range of values;
```/**
* Cardinality sort
* When considering negative numbers, you can also refer to: https://code.i-harness.com/zh-CN/q/e98fa9
*/
public class RadixSort implements IArraySort {

@Override
public int[] sort(int[] sourceArray) throws Exception {
// yes arr Copy without changing the parameter content
int[] arr = Arrays.copyOf(sourceArray, sourceArray.length);

int maxDigit = getMaxDigit(arr);
}

/**
* Get the highest number of digits
*/
private int getMaxDigit(int[] arr) {
int maxValue = getMaxValue(arr);
return getNumLenght(maxValue);
}

private int getMaxValue(int[] arr) {
int maxValue = arr[0];
for (int value : arr) {
if (maxValue < value) {
maxValue = value;
}
}
return maxValue;
}

protected int getNumLenght(long num) {
if (num == 0) {
return 1;
}
int lenght = 0;
for (long temp = num; temp != 0; temp /= 10) {
lenght++;
}
return lenght;
}

private int[] radixSort(int[] arr, int maxDigit) {
int mod = 10;
int dev = 1;

for (int i = 0; i < maxDigit; i++, dev *= 10, mod *= 10) {
// Consider the case of negative numbers. Here, double the number of queues, where [0-9]Corresponding to negative numbers,[10-19]Corresponding positive number (bucket + 10)
int[][] counter = new int[mod * 2][0];

for (int j = 0; j < arr.length; j++) {
int bucket = ((arr[j] % mod) / dev) + mod;
counter[bucket] = arrayAppend(counter[bucket], arr[j]);
}

int pos = 0;
for (int[] bucket : counter) {
for (int value : bucket) {
arr[pos++] = value;
}
}
}

return arr;
}

/**
* Automatically expand capacity and save data
*
* @param arr
* @param value
*/
private int[] arrayAppend(int[] arr, int value) {
arr = Arrays.copyOf(arr, arr.length + 1);
arr[arr.length - 1] = value;
return arr;
}
}```

## 5.3 bucket sorting

Bucket sorting is an upgraded version of counting sorting. It makes use of the mapping relationship of the function. The key to efficiency lies in the determination of the mapping function. In order to make bucket sorting more efficient, we need to do these two things:

1. When there is enough extra space, try to increase the number of barrels
2. The mapping function used can evenly distribute the input N data into K buckets

At the same time, for the sorting of elements in the bucket, it is very important to choose a comparative sorting algorithm for the performance.

```public class BucketSort implements IArraySort {

private static final InsertSort insertSort = new InsertSort();

@Override
public int[] sort(int[] sourceArray) throws Exception {
// yes arr Copy without changing the parameters
int[] arr = Arrays.copyOf(sourceArray, sourceArray.length);

return bucketSort(arr, 5);
}

private int[] bucketSort(int[] arr, int bucketSize) throws Exception {
if (arr.length == 0) {
return arr;
}

int minValue = arr[0];
int maxValue = arr[0];
for (int value : arr) {
if (value < minValue) {
minValue = value;
} else if (value > maxValue) {
maxValue = value;
}
}

int bucketCount = (int) Math.floor((maxValue - minValue) / bucketSize) + 1;
int[][] buckets = new int[bucketCount][0];

// The mapping function is used to allocate the data to each bucket
for (int i = 0; i < arr.length; i++) {
int index = (int) Math.floor((arr[i] - minValue) / bucketSize);
buckets[index] = arrAppend(buckets[index], arr[i]);
}

int arrIndex = 0;
for (int[] bucket : buckets) {
if (bucket.length <= 0) {
continue;
}
// Sort each bucket. Insert sort is used here
bucket = insertSort.sort(bucket);
for (int value : bucket) {
arr[arrIndex++] = value;
}
}

return arr;
}

/**
* Automatically expand capacity and save data
*
* @param arr
* @param value
*/
private int[] arrAppend(int[] arr, int value) {
arr = Arrays.copyOf(arr, arr.length + 1);
arr[arr.length - 1] = value;
return arr;
}

}```

public class CountingSort implements IArraySort {

@Override
public int[] sort(int[] sourceArray) throws Exception {
/ / copy the arr without changing the parameters
int[] arr = Arrays.copyOf(sourceArray, sourceArray.length);

int maxValue = getMaxValue(arr);

return countingSort(arr, maxValue);
}

private int[] countingSort(int[] arr, int maxValue) {
int bucketLen = maxValue + 1;
int[] bucket = new int[bucketLen];

for (int value : arr) {
bucket[value]++;
}

int sortedIndex = 0;
for (int j = 0; j < bucketLen; j++) {
while (bucket[j] > 0) {
arr[sortedIndex++] = j;
bucket[j]--;
}
}
return arr;
}

private int getMaxValue(int[] arr) {
int maxValue = arr[0];
for (int value : arr) {
if (maxValue < value) {
maxValue = value;
}
}
return maxValue;
}

}

Tags: data structure

Posted by martinacevedo on Mon, 18 Apr 2022 13:52:11 +0930