# 1. Introduction to Algorithms

## 1.1 Binary Search

• For a list of n elements, binary search takes at most log₂ n steps, while simple search takes at most n steps

• Binary search only works when the list is sorted

• Code implementation:

```python
def binary_search(list, item):
    low = 0
    high = len(list) - 1

    while low <= high:
        mid = (low + high) // 2  # check the middle element each round
        guess = list[mid]
        if guess == item:
            return mid
        if guess > item:
            high = mid - 1  # too big: discard the upper half
        else:
            low = mid + 1   # too small: discard the lower half
    return None  # item is not in the list

my_list = [1, 3, 5, 7, 9]

print(binary_search(my_list, 3))  # 1
```

• Runtime: O(log n), logarithmic time

## 1.2 Big O Notation

• Big O notation is a special notation that indicates how fast an algorithm is

• Big O notation lets you compare the number of operations, which tells you how quickly an algorithm's run time grows as the input grows

• Big O notation describes the worst-case run time

• Common Big O run times:

• O(log n): logarithmic time; e.g. binary search.

• O(n): linear time; e.g. simple search.

• O(n log n): e.g. quicksort, a fast sorting algorithm.

• O(n²): e.g. selection sort, a slow sorting algorithm.

• O(n!): e.g. a solution to the traveling salesman problem, a very slow algorithm.

| Big O run time | Example algorithm |
| --- | --- |
| O(1) (constant time) | Hash table lookup |
| O(log n) (logarithmic time) | Binary search |
| O(n) (linear time) | Simple search |
| O(n log n) | Quicksort |
| O(n²) | Selection sort |
| O(n!) | Traveling salesman problem |
• Takeaways:

• An algorithm's speed refers not to elapsed time, but to the growth in the number of operations.
• When we talk about an algorithm's speed, we mean how quickly its run time grows as the input grows.
• An algorithm's run time is expressed in Big O notation.
• O(log n) is faster than O(n), and the more elements there are to search, the bigger the advantage.
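The last point can be seen by counting worst-case steps directly; a minimal sketch (the list sizes are hypothetical):

```python
import math

# Worst-case step counts: simple search needs n steps,
# binary search needs ceil(log2 n) steps.
for n in [100, 10_000, 1_000_000]:
    simple_steps = n
    binary_steps = math.ceil(math.log2(n))
    print(f"n={n}: simple={simple_steps}, binary={binary_steps}")
```

At a million elements, binary search needs only about 20 steps, so the gap keeps widening as n grows.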

# 2. Selection Sort

## 2.1 Arrays and Linked Lists

• Array: all elements are stored next to each other in memory

• Linked list: each element stores the address of the next element, stringing together a series of arbitrary memory addresses

• Array and linked list run times:

| | Array | Linked list |
| --- | --- | --- |
| Read | O(1) | O(n) |
| Insert | O(n) | O(1) |
| Delete | O(n) | O(1) |
• Index: an element's position is called its index

• Insert: with a linked list, inserting an element is easy: just change the address that the preceding element points to. With an array, you must shift all the following elements back. Linked lists are the better choice when you need to insert elements into the middle.

• Delete: linked lists are also the better choice when deleting elements, for the same reason.
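The contrast above can be sketched with a minimal, hypothetical `Node` class: inserting or deleting next to a known node changes one reference, while reading by position walks the chain.

```python
class Node:
    """A linked-list element: a value plus a reference to the next element."""
    def __init__(self, value, next=None):
        self.value = value
        self.next = next

# Build the list 1 -> 3 -> 5
head = Node(1, Node(3, Node(5)))

# Insert 2 after the head: only one reference changes, O(1).
head.next = Node(2, head.next)

# Delete the element after the head: again one reference change, O(1).
head.next = head.next.next

# Reading element i, by contrast, requires walking i links: O(n).
values = []
node = head
while node:
    values.append(node.value)
    node = node.next
print(values)  # [1, 3, 5]
```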

## 2.2 Selection Sort

• Walk through the list to find the smallest element and append it to a new list. Repeat this and you end up with a sorted list

• Runtime: O(n²)

• Code implementation:

```python
# Find the index of the smallest element
def findSmallest(arr):
    smallest = arr[0]
    smallest_index = 0
    for i in range(1, len(arr)):
        if arr[i] < smallest:
            smallest = arr[i]
            smallest_index = i
    return smallest_index

def selectionSort(arr):
    newArr = []
    for i in range(len(arr)):
        # repeatedly pop the smallest remaining element into the new list
        smallest = findSmallest(arr)
        newArr.append(arr.pop(smallest))
    return newArr

print(selectionSort([5, 3, 6, 2, 10]))  # [2, 3, 5, 6, 10]
```


# 3. Recursion

• Recursion: a function that calls itself
• Recursive case: the case in which the function calls itself
• Base case: the case in which the function no longer calls itself, which prevents an infinite loop
• Stack
• Recursion uses the call stack
• Stack operations: push and pop
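The ideas above fit in a few lines; factorial is the classic illustration, with each call pushed onto the call stack and popped when it returns:

```python
def factorial(n):
    # Base case: stops the recursion.
    if n <= 1:
        return 1
    # Recursive case: the function calls itself on a smaller problem.
    return n * factorial(n - 1)

print(factorial(5))  # 120
```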

# 4. Quicksort

## 4.1 Divide and Conquer (D&C)

• A well-known recursive technique for solving problems.

• Steps:

• Figure out the base case
• Work out how to shrink the problem until it reaches the base case

## 4.2 Quick Sort

• Steps:

• Pick an element from the array, called the pivot; typically the first element is used
• Partition: find the elements smaller than the pivot and the elements larger than the pivot, producing:
• a subarray of all the numbers less than the pivot
• the pivot itself
• a subarray of all the numbers greater than the pivot
• Quicksort the two subarrays recursively
• Code implementation:

```python
def quicksort(array):
    if len(array) < 2:
        return array  # base case: arrays with 0 or 1 element are already sorted
    else:
        pivot = array[0]
        less = [i for i in array[1:] if i <= pivot]
        greater = [i for i in array[1:] if i > pivot]
        return quicksort(less) + [pivot] + quicksort(greater)

print(quicksort([10, 5, 2, 3]))  # [2, 3, 5, 10]
```


## 4.3 Big O Notation Revisited

• Quicksort's performance depends heavily on the pivot you choose: in the worst case it is O(n²), in the average case O(n log n)

# 5. Hash Tables

## 5.1 Hash Functions

• Maps an input to a number

• Requirements:

• Consistency
• Uniformity
• A hash function must always map the same input to the same index

• Ideally, a hash function maps different inputs to different indexes

• The hash function knows how big the array is and returns only valid indexes
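A toy sketch of these requirements (this is an illustrative function, not the one Python actually uses):

```python
def simple_hash(key, array_size):
    """A toy hash function: map a string key to a valid array index."""
    # Sum the character codes, then take the remainder so the result
    # is always a valid index in the range 0 .. array_size - 1.
    return sum(ord(c) for c in key) % array_size

size = 10
# Consistency: the same input always maps to the same index.
assert simple_hash("apple", size) == simple_hash("apple", size)
# Validity: every result is a valid index for the array.
assert 0 <= simple_hash("banana", size) < size
```

A real hash function spreads keys far more uniformly than this one, which is exactly the uniformity requirement above.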

## 5.2 Hash Tables

• Hash table: also known as a hash map, map, dictionary, or associative array

• The hash table provided by Python is the dictionary

• Applications of hash tables:

• Simulating mapping relationships
• Preventing duplicates
• Use as a cache
• Benefits: users see web pages faster and servers do less work
• Collision: two keys are assigned the same slot

• Reflections:
• The hash function matters: ideally it maps keys evenly across the hash table.
• If a slot's linked list grows too long, the hash table's speed degrades dramatically
• Performance: hash tables are very fast at lookup, insertion, and deletion

| | Hash table (average) | Hash table (worst case) | Array | Linked list |
| --- | --- | --- | --- | --- |
| Read | O(1) | O(n) | O(1) | O(n) |
| Insert | O(1) | O(n) | O(n) | O(1) |
| Delete | O(1) | O(n) | O(n) | O(1) |
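The applications above map directly onto Python's built-in `dict` (the names and numbers here are illustrative):

```python
# Python's dict is a hash table: read, insert and delete are O(1) on average.
phone_book = {}            # simulate a mapping relationship
phone_book["alice"] = 123  # insert
phone_book["bob"] = 456
print(phone_book["alice"]) # read -> 123
del phone_book["bob"]      # delete

# Prevent duplication: check membership before recording a vote.
voted = {}
def check_voter(name):
    if name in voted:
        return False       # already voted
    voted[name] = True
    return True

print(check_voter("tom"))  # True
print(check_voter("tom"))  # False
```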

## 5.3 Load Factor

• Load factor = number of elements / total number of slots in the hash table
• Load factor = 1: the ideal case, each element has its own slot
• Load factor > 1: the number of elements exceeds the number of slots
• Once the load factor starts to grow, you need to add slots to the hash table; this is called resizing
• The lower the load factor, the less likely collisions are and the better the hash table performs
• Rule of thumb: once the load factor exceeds 0.7, resize the hash table

# 6. Breadth-First Search

## 6.1 Shortest Path Problem

• Steps:
• Model the problem as a graph
• Solve the problem with breadth-first search

## 6.2 Breadth-First Search

• A search algorithm for graphs

• Applications:

• Question type 1, find a path: starting from node A, is there a path to node B?
• Question type 2, find the shortest path (the path with the fewest segments, not necessarily the fastest): which path from node A to node B is shortest?
• Finding the shortest path:

• As breadth-first search runs, the search radiates outward from the start: first-degree connections are checked before second-degree connections
• This only works if nodes are searched in the order in which they were added
• Queue:

• Operations: enqueue and dequeue

• Order: FIFO (first in, first out)

| Data structure | Order |
| --- | --- |
| Stack | LIFO |
| Queue | FIFO |
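The contrast is easy to see with `collections.deque` for the queue and a plain list for the stack:

```python
from collections import deque

# A queue is FIFO: the first element added is the first one out.
queue = deque()
queue.append("first")
queue.append("second")
print(queue.popleft())  # first

# A stack is LIFO: the last element pushed is the first one popped.
stack = []
stack.append("first")
stack.append("second")
print(stack.pop())      # second
```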
• Implementing the graph:

• Directed graph: the relationships are one-way
• Undirected graph: no arrows; directly connected nodes are related to each other
• How the algorithm works:

• Create a queue holding the people to check
• Pop a person off the queue
• Check whether this person is the target
• If not, add all of this person's neighbors to the queue and repeat, keeping a list of people already checked to avoid infinite loops
• Algorithmic implementation:

```python
from collections import deque

graph = {}
graph["you"] = ["alice", "bob", "claire"]
graph["bob"] = ["anuj", "peggy"]
graph["alice"] = ["peggy"]
graph["claire"] = ["thom", "jonny"]
graph["anuj"] = []
graph["peggy"] = []
graph["thom"] = []
graph["jonny"] = []

def person_is_seller(name):
    return name[-1] == 'm'

def search(name):
    search_queue = deque()
    search_queue += graph[name]
    searched = []  # people already checked, so no one is checked twice
    while search_queue:
        person = search_queue.popleft()
        if person not in searched:
            if person_is_seller(person):
                print(person + " is a mango seller!")
                return True
            else:
                search_queue += graph[person]
                searched.append(person)
    return False

search("you")
```

• Runtime: O(V + E), where V is the number of vertices and E is the number of edges

# 7. Dijkstra's Algorithm

## 7.1 Using Dijkstra's Algorithm

• Objective: find the fastest route
• Steps:
• Find the "cheapest" node, i.e. the one reachable in the least time
• Update the costs of this node's neighbors
• Repeat until this has been done for every node in the graph
• Compute the final path

## 7.2 Terms

• Weight: Dijkstra's algorithm works with graphs in which each edge has an associated number, called its weight
• Weighted graph: a graph with weights. To compute the shortest path in a weighted graph, use Dijkstra's algorithm
• Unweighted graph: a graph without weights. To compute the shortest path in an unweighted graph, use breadth-first search

## 7.3 Implementation

```python
# the graph: each node maps to its neighbors and the edge weights
graph = {}
graph["start"] = {}
graph["start"]["a"] = 6
graph["start"]["b"] = 2

graph["a"] = {}
graph["a"]["fin"] = 1
graph["b"] = {}
graph["b"]["a"] = 3
graph["b"]["fin"] = 5

graph["fin"] = {}  # the finish node has no neighbors

# the costs table: cheapest known cost to reach each node from the start
infinity = float("inf")
costs = {}
costs["a"] = 6
costs["b"] = 2
costs["fin"] = infinity

# the parents table: used to reconstruct the final path
parents = {}
parents["a"] = "start"
parents["b"] = "start"
parents["fin"] = None

processed = []  # nodes that have already been processed

def find_lowest_cost_node(costs):
    lowest_cost = float("inf")
    lowest_cost_node = None
    for node in costs:
        cost = costs[node]
        if cost < lowest_cost and node not in processed:
            lowest_cost = cost
            lowest_cost_node = node
    return lowest_cost_node

node = find_lowest_cost_node(costs)
while node is not None:
    cost = costs[node]
    neighbors = graph[node]
    for n in neighbors.keys():
        new_cost = cost + neighbors[n]
        if costs[n] > new_cost:  # found a cheaper way to reach this neighbor
            costs[n] = new_cost
            parents[n] = node
    processed.append(node)
    node = find_lowest_cost_node(costs)
```
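The "compute the final path" step can be done by walking the parents table backwards from the finish. A sketch, seeded with the final parents table this example graph produces:

```python
# parents as left by the loop above for the example graph:
# b is reached from start, a is cheapest via b, fin is cheapest via a.
parents = {"b": "start", "a": "b", "fin": "a"}

def build_path(parents, start, finish):
    """Walk parent pointers from finish back to start, then reverse."""
    path = [finish]
    node = finish
    while node != start:
        node = parents[node]
        path.append(node)
    path.reverse()
    return path

print(build_path(parents, "start", "fin"))  # ['start', 'b', 'a', 'fin']
```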


# 8. Greedy Algorithms

## 8.1 Classroom Scheduling Problem

• Idea:
• Pick the class that ends earliest; it is the first class to hold in this classroom.
• Next, pick a class that starts after the first class ends. Again choose the one that ends earliest; it is the second class to hold. Repeat.
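The greedy choice above can be sketched in a few lines (class names and times are hypothetical):

```python
# Repeatedly pick the class that ends earliest among those
# starting at or after the time the last chosen class ends.
classes = [("art", 9, 10), ("eng", 9.5, 10.5), ("math", 10, 11),
           ("cs", 10.5, 11.5), ("music", 11, 12)]

def schedule(classes):
    chosen = []
    last_end = 0
    # Consider classes in order of their end time.
    for name, start, end in sorted(classes, key=lambda c: c[2]):
        if start >= last_end:  # starts after the previous class ends
            chosen.append(name)
            last_end = end
    return chosen

print(schedule(classes))  # ['art', 'math', 'music']
```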

## 8.2 Knapsack Problem

• Idea:
• Take the most valuable item that fits in the knapsack.
• Then take the next most valuable item that still fits, and so on.
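A sketch of this greedy strategy (item values and weights are hypothetical), which also shows that greedy does not always find the optimal answer:

```python
# Greedy knapsack: always take the most valuable item that still fits.
items = [("stereo", 3000, 30), ("laptop", 2000, 20), ("guitar", 1500, 15)]
capacity = 35

def greedy_pack(items, capacity):
    taken, total = [], 0
    # Consider items from most valuable to least valuable.
    for name, value, weight in sorted(items, key=lambda i: i[1], reverse=True):
        if weight <= capacity:
            taken.append(name)
            capacity -= weight
            total += value
    return taken, total

print(greedy_pack(items, capacity))  # (['stereo'], 3000)
```

Here the greedy answer is worth 3000, while taking the laptop and guitar would be worth 3500: greedy gives a good answer quickly, not necessarily the best one.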

## 8.3 Set Covering Problem

• Approximation algorithm: when finding an exact solution takes too long, an approximation algorithm can be used

• Judged by: how fast it is, and how close its solution is to the optimal solution
• Runtime: O(n²)

• Code implementation:

```python
states_needed = set(["mt", "wa", "or", "id", "nv", "ut", "ca", "az"])

stations = {}
stations["kone"] = set(["id", "nv", "ut"])
stations["ktwo"] = set(["wa", "id", "mt"])
stations["kthree"] = set(["or", "nv", "ca"])
stations["kfour"] = set(["nv", "ut"])
stations["kfive"] = set(["ca", "az"])
final_stations = set()

while states_needed:
    best_station = None
    states_covered = set()
    # pick the station that covers the most states not yet covered
    for station, states_for_station in stations.items():
        covered = states_needed & states_for_station
        if len(covered) > len(states_covered):
            best_station = station
            states_covered = covered

    states_needed -= states_covered
    final_stations.add(best_station)

print(final_stations)
```

• Runtime comparison:

| Exact algorithm | Greedy algorithm |
| --- | --- |
| O(n!) | O(n²) |

## 8.4 Traveling Salesman Problem

• Problem: find the shortest path that passes through a specified set of points; this is an NP-complete problem

# 9. K-Nearest Neighbors Algorithm

## 9.1 Creating a Recommendation System

• Feature extraction, using the Pythagorean (distance) formula: √((a₁−a₂)² + (b₁−b₂)² + (c₁−c₂)² + (d₁−d₂)² + (e₁−e₂)²)

• Classification: assigning items to groups

• Regression: predicting a numeric value

## 9.2 Machine Learning

• OCR (Optical Character Recognition)

• Training: go through a large number of digit images and extract the features of those digits.
• When a new image comes in, extract its features and find its nearest neighbors.
• Naive Bayes classifier

• Predicting the stock market


Posted by TheDumbNerd on Mon, 18 Apr 2022 01:47:13 +0930