ASD Flashcards

(127 cards)

1
Q

Algorithm Specification

A

A specification precisely expresses the task to be done and consists of:
Name: The name of the algorithm and its arguments.
Initial Condition (Input): Specifies what constitutes “correct” input data.
Final Condition (Output): Specifies the desired result or state after the algorithm finishes.

2
Q

Name

A

The name of the algorithm and its arguments.

3
Q

Initial Condition (Input)

A

Specifies what constitutes “correct” input data.

4
Q

Final Condition (Output)

A

Specifies the desired result or state after the algorithm finishes.

5
Q

Total Correctness

A

An algorithm is totally correct if, for any correct input data, it stops and returns the correct output.

6
Q

Partial Correctness

A

An algorithm is partially correct if, for correct input data, whenever it stops, the result is guaranteed to be correct. Partial correctness does not guarantee that the algorithm actually stops.

7
Q

Stop Property

A

The requirement that an algorithm will certainly finish after a finite number of iterations for any correct input data.

8
Q

Prove the Stop Property

A

Often shown by identifying a loop variable that moves toward a constant termination condition by a fixed increment/decrement.

9
Q

Prove Partial Correctness

A

Usually involves using a Loop Invariant.

10
Q

Loop Invariants Definition

A

A logical predicate that, if satisfied before an iteration of a loop, remains satisfied after that iteration.

Example: In a “find maximum” algorithm, a typical invariant is that the current candidate $x$ is the maximum of all array elements visited so far ($\forall_{0\le j<i}\, x \ge Arr[j]$).
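The invariant can be checked mechanically at run time; a minimal Python sketch (the `find_max` name is illustrative, not from the lecture):

```python
def find_max(arr):
    """Return the maximum of a non-empty array, maintaining the invariant
    that x is the maximum of all elements visited so far."""
    x = arr[0]
    for i in range(1, len(arr)):
        # Invariant: x >= arr[j] for all 0 <= j < i
        assert all(x >= arr[j] for j in range(i))
        if arr[i] > x:
            x = arr[i]
    # Termination (i == len(arr)) combined with the invariant
    # implies x is the maximum of the entire array.
    return x
```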

11
Q

Loop Invariants Application

A

To prove an algorithm’s goal, you combine the loop invariant with the loop’s termination condition.

Example: In a “find maximum” algorithm, combining the invariant $\forall_{0\le j<i}\, x \ge Arr[j]$ with the termination condition $i = len$ proves that $x$ is the maximum of the entire array.

12
Q

Measuring Algorithm Speed

A

Method: The speed of an algorithm is measured by counting its basic (elementary) operations rather than measuring execution time in seconds.

Independence: This approach ensures the measure is independent of specific programming languages, hardware, or software platforms.

Internal Property: It treats complexity as an internal property of the algorithm itself.

13
Q

Assessment Prerequisites

A

Determine the Dominating Operation Set: Identify the operations that represent the bulk of the work.

Determine the Data Size: Identify what in the input (usually denoted as $n$) influences the number of operations.

14
Q

Dominating Operations

A

Definition: Dominating operations are those performed most frequently and whose count is proportional to the total amount of work performed by the algorithm.

Requirement: These operations must be performable in constant time.

Examples: In a search algorithm, the comparison arr[i] == key or the index increment i++ are typical dominating operations.

15
Q

Time Complexity

A

The number of dominating operations executed by the algorithm, expressed as a function of the data size $n$.

16
Q

Space Complexity ($S(n)$)

A

The number of units of memory (e.g., machine words) used by the algorithm as a function of data size. Usually, memory used for input data is excluded.

17
Q

Pessimistic (Worst-case) Time Complexity ($W(n)$)

A

The maximum number of dominating operations performed for any input of size $n$; it is the “supremum” of work across all possible datasets of that size.

18
Q

Average Time Complexity ($A(n)$)

A

The expected value of the number of dominating operations, calculated by multiplying the number of operations for each case by its probability of occurrence.

19
Q

Asymptotic Notation

A

Purpose: It provides a way to express complexity while neglecting unimportant details such as additive or multiplicative constants (e.g., $2.1n - 1$ is simply seen as having a linear rank).

20
Q

“Big O” ($O$)

A

Defines an upper bound. $f(n) = O(g(n))$ if there exist constants $c > 0$ and $n_0$ such that $f(n) \le c \cdot g(n)$ for all $n \ge n_0$. Intuitively, it corresponds to “$\le$”.

21
Q

“Big Theta” ($\Theta$)

A

Defines the same rank. $f(n) = \Theta(g(n))$ if $f(n) = O(g(n))$ and $g(n) = O(f(n))$. Intuitively, it corresponds to “$=$”.

22
Q

“Big Omega” ($\Omega$)

A

Defines a lower bound. $f(n) = \Omega(g(n))$ means $g(n) = O(f(n))$, intuitively corresponding to “$\ge$”.

23
Q

Small “o” and Small “omega” ($\omega$)

A

These represent sharp inequalities, intuitively corresponding to “$<$” and “$>$” respectively.

24
Q

Common Ranks of Functions (from lowest to highest growth)

A

Constant: $\Theta(1)$
Logarithmic: $\Theta(\log n)$
Linear: $\Theta(n)$
Linear-logarithmic: $\Theta(n \log n)$
Square (Quadratic): $\Theta(n^2)$
Cubic: $\Theta(n^3)$
Exponential: $\Theta(2^n)$
Factorial: $\Theta(n!)$

25
General Searching and Sequential Algorithm
The Problem: Searching for a key in a sequence of length len.
Sequential Search: A standard approach where elements are checked one by one.
Analysis:
Dominating Operation: Comparison of elements.
Complexity: Linear time complexity, $W(len) = \Theta(len)$.
Limitation: For unordered sequences, this cannot be improved.
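The sequential approach can be sketched in a few lines; a minimal Python version (the `sequential_search` name and the `-1` not-found convention are illustrative):

```python
def sequential_search(arr, key):
    """Check elements one by one; the comparison arr[i] == key is the
    dominating operation, so W(len) = Theta(len)."""
    for i in range(len(arr)):
        if arr[i] == key:
            return i
    return -1  # key not present
```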
26
Searching in Sorted Sequences
Efficiency: If the input sequence is sorted (non-decreasingly or non-increasingly), the problem can be solved much faster.
Skipping Algorithm: Checks every $k$-th cell. If a value higher than the key is found, it performs a linear search only on the preceding $k-1$ elements.
Complexity: Asymptotically $k$ times faster than sequential search, but still of linear rank: $W(len) = \frac{1}{k} \cdot \Theta(len)$.
Optimal Jump: The optimal choice for $k$ is $\sqrt{len}$.
27
Binary Search Algorithm
Method: Uses the "Divide and Conquer" rule by repeatedly halving the search space.
Process: Check the middle element. If it equals the key, return the index. If the middle element is higher than the key, restrict the search to the left sub-sequence; otherwise, search the right.
Analysis:
Time Complexity: $W(len) = \Theta(\log_2(len))$ and $A(len) = \Theta(\log_2(len))$.
Space Complexity: $O(1)$.
Requirement: Requires random access to elements (data must be in RAM).
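The halving process above, as an iterative Python sketch (naming is illustrative; assumes a non-decreasingly sorted array):

```python
def binary_search(arr, key):
    """Iterative binary search: halves the search space each step,
    giving W(len) = Theta(log2(len)) and O(1) extra space."""
    lo, hi = 0, len(arr) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if arr[mid] == key:
            return mid
        if arr[mid] > key:
            hi = mid - 1   # restrict to the left sub-sequence
        else:
            lo = mid + 1   # restrict to the right sub-sequence
    return -1              # key not present
```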
28
Positional Statistics
Definition: The $k$-th positional statistic is the $k$-th smallest (or largest) element in a sequence.
Tournament Algorithm: Used to find the second smallest element efficiently. Elements compete in pairs; winners (smaller elements) move to the next round. The second smallest must be among those who lost directly to the winner in the tournament history.
Complexity: Requires $len - 1$ comparisons to find the minimum, plus $O(\log_2(len))$ to find the second smallest.
29
Partition and Hoare’s Algorithm
Partition Procedure: Selects a "median" element $M$ and reorganizes the sequence so elements $\le M$ are on the left and elements $\ge M$ are on the right.
Complexity: Time $W(n) = n + O(1)$ and space $S(n) = O(1)$.
Hoare's Algorithm (Quickselect): Uses partition to find the $k$-th statistic. If the partition index equals $k$, the element is found; otherwise, it recurses on the appropriate sub-sequence.
Complexity: Linear time complexity on average.
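A quickselect sketch in Python. Note this uses the Lomuto partition scheme for brevity, not Hoare's original two-pointer scheme; the structure (partition, then recurse on one side) is the same:

```python
def partition(arr, lo, hi):
    """Lomuto-style partition around arr[hi] (a simpler variant of
    Hoare's scheme). Returns the pivot's final index."""
    pivot = arr[hi]
    i = lo
    for j in range(lo, hi):
        if arr[j] <= pivot:
            arr[i], arr[j] = arr[j], arr[i]
            i += 1
    arr[i], arr[hi] = arr[hi], arr[i]
    return i

def quickselect(arr, k):
    """Return the k-th smallest element (k counted from 1).
    Linear time on average; modifies arr in place."""
    lo, hi = 0, len(arr) - 1
    while True:
        p = partition(arr, lo, hi)
        if p == k - 1:
            return arr[p]
        if p > k - 1:
            hi = p - 1     # k-th statistic is in the left part
        else:
            lo = p + 1     # k-th statistic is in the right part
```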
30
The Sorting Problem
Input: A sequence $S$ of elements that can be ordered using a total-order relation.
Output: A non-decreasingly sorted sequence $S'$ containing the same elements as the input.
Importance: Sorting is a fundamental operation used to accelerate searching, visualize data, and compute statistical characteristics.
31
Selection Sort
Idea: Repeatedly identify the minimum element in the unsorted portion of the array and swap it into its correct position at the front.
Analysis:
Dominating Operation: Comparison of two elements.
Pessimistic and Average Complexity: $W(n) = A(n) = \Theta(n^2)$.
Behavior: It always performs the same number of comparisons regardless of the initial order of the data, even if the sequence is already sorted.
32
Insertion Sort
Idea: Iteratively take the "next" element and insert it into its correct position within the already sorted prefix of the sequence.
Analysis:
Pessimistic Case: Occurs when the data is sorted in reverse order, leading to $\Theta(n^2)$.
Average Case: $\Theta(n^2)$, though it is typically twice as fast as Selection Sort in practice.
Optimistic Case: Very fast for nearly sorted data; if already sorted, it requires only $n-1$ comparisons (linear complexity).
Variant: Using Binary Search to find the insertion point reduces the number of comparisons to $\Theta(n \log n)$, but the overall complexity remains $\Theta(n^2)$ due to the linear cost of shifting elements in an array.
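The insertion step above, as a minimal Python sketch (in-place; naming illustrative):

```python
def insertion_sort(arr):
    """Insert arr[i] into the sorted prefix arr[0..i-1].
    Worst case Theta(n^2); already-sorted input needs only n-1 comparisons."""
    for i in range(1, len(arr)):
        x = arr[i]
        j = i - 1
        while j >= 0 and arr[j] > x:
            arr[j + 1] = arr[j]   # shift larger elements one cell right
            j -= 1
        arr[j + 1] = x            # insertion point found
    return arr
```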
33
Merge Sort
Idea: A "Divide and Conquer" algorithm that splits the sequence into halves, sorts each half recursively, and then merges the sorted halves.
Analysis:
Time Complexity: $W(n) = A(n) = \Theta(n \log n)$.
Space Complexity: High for arrays ($\Theta(n)$) because it requires temporary memory for merging.
Performance: The difference between linear-logarithmic and square complexity is massive for large datasets, e.g., when sorting 100 million records.
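A compact Python sketch of the split-sort-merge scheme (returns a new list rather than sorting in place; naming illustrative):

```python
def merge_sort(arr):
    """Split into halves, sort each recursively, then merge:
    Theta(n log n) time, Theta(n) extra space for the merge buffer."""
    if len(arr) <= 1:
        return arr
    mid = len(arr) // 2
    left, right = merge_sort(arr[:mid]), merge_sort(arr[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:        # <= keeps the sort stable
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    return merged + left[i:] + right[j:]  # append the leftover tail
```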
34
Linked Lists vs. Arrays
Linked Lists:
Structure: Consists of nodes containing data and a link to the next node.
Advantages: Very fast modification operations (inserting/deleting); can improve Merge Sort space complexity to $O(1)$ by avoiding element copying.
Disadvantages: Slower access (no random access; traversal must start from the head) and extra memory for links.
Arrays:
Advantages: Very fast random access and lower memory overhead.
Disadvantages: Inserting or deleting elements has linear time complexity.
35
Stability of Sorting Algorithms
Definition: A sorting algorithm is stable if it preserves the original relative order of elements with the same value (ties).
Importance: In practical applications like databases, records are often sorted by one attribute (a key). Stability ensures that if two records have the same key, their original order is maintained, which is crucial when sorting multi-attribute records in successive passes.
QuickSort: It is less naturally stable than algorithms like Merge Sort.
36
QuickSort
Idea: A "Divide and Conquer" algorithm that uses a pivot element ($M$) to reorganize a sequence so that no larger element is to its left and no smaller element is to its right. The process is then applied recursively to the two resulting subsequences.
Partition: This linear-time procedure ($W(n) = \Theta(n)$) places the pivot in its final position and returns its index.
Complexity:
Average Case: $\Theta(n \log n)$.
Pessimistic Case: $\Theta(n^2)$, occurring when the input is already sorted or sorted in reverse, leading to maximum recursion depth.
Space: Although it sorts "in place," recursion costs implicit memory, resulting in a pessimistic space complexity of $O(n)$.
37
The Lower Bound for Comparison-Based Sorting
The Limit: No comparison-based sorting algorithm can have an average or worst-case time complexity better than linear-logarithmic, $\Theta(n \log n)$.
Explanation: Sorting can be modeled as a binary decision tree in which each leaf is one of the $n!$ possible permutations of the input. The height of this tree (the number of comparisons) must be at least $\log_2(n!)$, which is approximately $n \log n$.
38
Non-Comparison Based Sorting
To "beat" the $\Theta(n \log n)$ limit, algorithms must use operations other than comparisons, often trading higher space complexity for lower time complexity. Examples: CountSort and RadixSort.
39
CountSort
Idea: Uses direct addressing to place elements. It requires the input to be non-negative integers.
Complexity: Time complexity is linear, $\Theta(n + m)$, where $n$ is the sequence length and $m$ is the maximum value. However, space complexity is also $\Theta(n + m)$ due to helper arrays.
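A minimal counting-sort sketch for non-negative integers (this simple form counts occurrences rather than keeping records, so it illustrates the direct-addressing idea but is not the stable variant RadixSort needs):

```python
def count_sort(arr):
    """Sort non-negative integers by direct addressing.
    Time and space are Theta(n + m), where m is the maximum value."""
    if not arr:
        return []
    m = max(arr)
    counts = [0] * (m + 1)        # helper array indexed by value
    for x in arr:
        counts[x] += 1
    result = []
    for value, c in enumerate(counts):
        result.extend([value] * c)  # emit each value as often as it occurred
    return result
```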
40
RadixSort
Idea: A scheme that sorts objects (like strings or multi-digit numbers) by applying a stable internal sorting algorithm to each digit or position, starting from the least significant and moving to the most significant.
Internal Algorithm: CountSort is a common choice when the alphabet of symbols (digits/characters) is limited.
41
Aspects of Recursion as an Algorithmic Tool
Definition: Recursion involves a function calling itself. In algorithms, it is used to reduce a problem instance to a smaller instance of the same problem, often referred to as "divide and conquer."
Positive Aspect: Recursion provides a very compact and elegant representation of an algorithm.
Negative Aspects: It implicitly costs additional memory because the machine must maintain a recursion stack. Deep recursion can lead to system failures (e.g., calling a recursive function for $n = 100{,}000$ might crash a machine). Whenever possible, recursion should be translated into an iterative version to save memory.
42
Fibonacci Numbers
Defined by the base cases $Fib(0)=0$, $Fib(1)=1$ and the step $Fib(n+1) = Fib(n) + Fib(n-1)$. The value grows exponentially; for example, $Fib(50)$ is over 12 billion.
43
Hanoi Towers
A riddle involving moving $n$ rings between three sticks. The number of moves is defined by $hanoi(n) = 2 \cdot hanoi(n-1) + 1$, which solves to $2^n - 1$ moves.
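The recurrence can be made concrete by generating the moves themselves; a small Python sketch (stick names "A"/"B"/"C" are illustrative):

```python
def hanoi(n, src="A", aux="B", dst="C"):
    """Return the list of moves transferring n rings from src to dst.
    The recurrence moves(n) = 2*moves(n-1) + 1 solves to 2^n - 1."""
    if n == 0:
        return []
    return (hanoi(n - 1, src, dst, aux)   # park n-1 rings on the aux stick
            + [(src, dst)]                # move the largest ring
            + hanoi(n - 1, aux, src, dst))  # bring the n-1 rings on top
```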
44
Solving Linear 2nd-Order Equations
To solve a recurrence of the form $s_n = as_{n-1} + bs_{n-2}$, use the characteristic equation $x^2 - ax - b = 0$.
Single Root ($r$): $s_n = c_1r^n + c_2nr^n$.
Two Roots ($r_1, r_2$): $s_n = c_1r_1^n + c_2r_2^n$.
Binet's Formula: An application of this method to the Fibonacci sequence, yielding $Fib(n) = \frac{1}{\sqrt{5}} \left( \left( \frac{1+\sqrt{5}}{2} \right)^n - \left( \frac{1-\sqrt{5}}{2} \right)^n \right)$.
45
Important 3 Cases of Recursive Equations
In algorithmics, three specific recursive forms are frequently encountered for time complexity $t(n)$:
Case 1 (Logarithmic): $t(n) = t(n/2) + c \Rightarrow \Theta(\log n)$ (e.g., Binary Search).
Case 2 (Linear): $t(n) = 2t(n/2) + c \Rightarrow \Theta(n)$ (e.g., finding the maximum in a sequence).
Case 3 (Linear-Logarithmic): $t(n) = 2t(n/2) + cn \Rightarrow \Theta(n \log n)$ (e.g., Merge Sort).
46
The Master Theorem
The Master Theorem provides a universal method for solving equations of the form $T(n) = aT(n/b) + f(n)$. It compares the rank of $f(n)$ with $n^{\log_b a}$:
If $f(n)$ is polynomially lower than $n^{\log_b a}$, then $T(n) = \Theta(n^{\log_b a})$.
If both are of the same rank, then $T(n) = \Theta(n^{\log_b a} \log n)$.
If $f(n)$ is polynomially higher (and satisfies a regularity condition), then $T(n) = \Theta(f(n))$.
47
Sequence Data Structures: Concrete Implementations
Sequences are the most common data structures and are primarily implemented using Arrays or Linked Lists.
48
Arrays
Absolute Access: Extremely fast, constant time $O(1)$.
Relative Access: Slow, linear time $\Theta(n)$.
Limitations: Bounded size; any "insert" operation has a pessimistic linear time cost.
49
Linked Lists
Singly Linked (SList): Nodes contain an element and a pointer to the next node.
Doubly Linked (DList): Nodes contain pointers to both the next and previous nodes, allowing bidirectional navigation.
Cyclic Lists: The last node links back to the first one.
Trade-offs: Fast relative operations (e.g., "insert after") in constant time, but slow linear-time absolute access and extra memory for pointers.
50
Abstract Data Structures (ADS)
An ADS is defined by its interface (supported operations) rather than its concrete implementation.
51
Stack
A "LIFO" (Last In, First Out) structure with push, pop, and top operations.
52
Queue
A "FIFO" (First In, First Out) structure with inject, out, and front operations.
53
Deque
A Double Ended Queue that generalizes both stack and queue, allowing operations at both ends.
54
Amortized Complexity Analysis
This method analyzes the total cost of a sequence of $m$ operations rather than evaluating each operation in isolation. It is useful when some operations are expensive but rare.
55
Potential Function Method
Assigns a non-negative potential $\Phi$ to the state of the structure; the amortized cost of the $i$-th operation is $a_i = t_i + \Phi_i - \Phi_{i-1}$.
56
Accounting Method
"Credits" are assigned to objects during cheap operations to pay for expensive future operations.
57
Total Cost Method
Simply computes the total cost of $m$ operations to find the average.
58
Unbounded Arrays
Unbounded arrays emulate dynamic growth using a bounded array that is reallocated when full.
Growth Rule: If the array is full ($n = w$), allocate an array $2\times$ larger and copy the elements.
Shrink Rule: If the number of elements is significantly smaller than the capacity (e.g., $n = w/4$), reallocate to a smaller array.
Complexity: While the pessimistic cost of a single pushBack is $O(n)$, the amortized cost over a sequence of $m$ operations is constant, $O(1)$.
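The growth rule can be sketched directly; a minimal Python class (class and method names are illustrative; the shrink rule is omitted for brevity):

```python
class UnboundedArray:
    """Doubling-based unbounded array: pushBack costs O(n) when a
    reallocation happens, but O(1) amortized over any push sequence."""
    def __init__(self):
        self.capacity = 1
        self.size = 0
        self.data = [None] * self.capacity

    def push_back(self, x):
        if self.size == self.capacity:       # full: allocate 2x and copy
            self.capacity *= 2
            new_data = [None] * self.capacity
            new_data[:self.size] = self.data
            self.data = new_data
        self.data[self.size] = x
        self.size += 1
```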
59
Comparative Analysis of Sequence Operations
Costs per operation, by implementation (SList, DList, UArray = unbounded array, CArray = cyclic array):
Indexing [.]: SList O(n), DList O(n), UArray O(1), CArray O(1)
First/Last: SList O(1), DList O(1), UArray O(1), CArray O(1)
Insert/Remove: SList O(1)¹, DList O(1)², UArray O(n), CArray O(n)
pushBack: SList O(1), DList O(1), UArray O(1)³, CArray O(1)³
popFront: SList O(1), DList O(1), UArray O(n), CArray O(1)³
60
¹ Only insertAfter/removeAfter. ² Requires an external pointer. ³ Amortized cost.
61
Definition of Priority Queue (PQ)
A Priority Queue is an Abstract Data Structure where each element has an associated "priority." It differs from a standard queue because it does not follow the FIFO rule; instead, it allows access to the element with the highest (or lowest) priority.
Core Operations:
insert(T e): Adds a new element with an assigned priority.
findMin(): Returns the element with the minimum priority.
delMin(): Returns and deletes the element with the minimum priority.
62
Naive Implementations and Complexities
Unsorted Sequence (List/Array): insert: O(1); delMin: O(n).
Sorted Sequence (List/Array): insert: O(n); delMin: O(1).
63
Binary Heaps
A Binary Heap is a complete binary tree that satisfies the heap-order condition: for every non-root node $x$, the priority of the parent is less than or equal to the priority of $x$.
Array Representation: Because it is a complete tree, it can be stored in an array without pointers. If the root is at index 1: parent[i] = i/2, left_child[i] = 2i, right_child[i] = 2i + 1.
Operations:
insert: Add at the bottom and perform upheap — $O(\log n)$.
delMin: Move the bottom element to the root and perform downheap — $O(\log n)$.
construct: Building a heap from an unsorted sequence can be done in $\Theta(n)$ time.
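A minimal min-heap sketch with upheap/downheap. Note it stores the root at index 0 (so parent = (i-1)//2), a common variant of the index-1 scheme above:

```python
class BinaryHeap:
    """Min-heap in an array; root at index 0, so parent(i) = (i-1)//2."""
    def __init__(self):
        self.a = []

    def insert(self, x):                  # upheap: O(log n)
        self.a.append(x)
        i = len(self.a) - 1
        while i > 0 and self.a[(i - 1) // 2] > self.a[i]:
            self.a[i], self.a[(i - 1) // 2] = self.a[(i - 1) // 2], self.a[i]
            i = (i - 1) // 2

    def del_min(self):                    # downheap: O(log n)
        a = self.a
        a[0], a[-1] = a[-1], a[0]         # move bottom element to the root
        m = a.pop()
        i, n = 0, len(a)
        while True:
            c = 2 * i + 1                 # left child
            if c >= n:
                break
            if c + 1 < n and a[c + 1] < a[c]:
                c += 1                    # pick the smaller child
            if a[i] <= a[c]:
                break
            a[i], a[c] = a[c], a[i]
            i = c
        return m
```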
64
Applications of Priority Queues
HeapSort: A sorting algorithm with $\Theta(n \log n)$ time complexity. By placing each extracted minimum in the space released at the end of the array, it can achieve $O(1)$ space complexity.
Greedy Algorithms: Priority queues are typically used to select the "next best" element efficiently, as in Huffman Codes, Dijkstra's shortest-path, and Prim's minimum spanning tree algorithms.
65
Extended Priority Queues
Addressable PQ: Supports decreaseKey(handle, newPriority) and delete(handle) using a handle to the element.
Mergeable PQ: Supports merge(PQ1, PQ2) to combine two queues into one.
66
Binomial Heaps
A Binomial Heap is a collection of Binomial Trees sorted by degree.
Properties: An $n$-element heap has $O(\log n)$ trees.
Efficiency: Its primary advantage is the fast merge operation, which takes $O(\log n)$ time, analogous to binary addition. All other standard operations (insert, delMin, delete) also take $O(\log n)$ time.
67
Costs per operation, by implementation:
insert: Unsorted O(1), Sorted O(n), Binary Heap O(log n), Binomial Heap O(log n)
findMin: Unsorted O(n), Sorted O(1), Binary Heap O(1), Binomial Heap O(log n)
merge: Unsorted O(1), Sorted O(n), Binary Heap O(n), Binomial Heap O(log n)
68
Dictionary Definition
A Dictionary is an Abstract Data Structure (ADS) that represents a mapping from keys to values. It primarily supports three operations:
search(K key): Returns the value associated with the given key.
insert(K key, V value): Adds a new key-value pair.
delete(K key): Removes the entry associated with the key.
69
Hashing and Hashtables
Hashtables provide fast dictionary operations by using a hash function ($h: U \rightarrow [0..m-1]$) to map large keys into a smaller array of size $m$.
70
Collision Handling
When two keys map to the same index ($h(k_1) = h(k_2)$), a collision occurs.
71
Chain Method
Each array index points to a linked list of elements that hash to that position. Under a uniform load assumption, it guarantees average $O(1)$ time if $m = O(n)$.
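A minimal chained hashtable sketch (class name, fixed table size `m=8`, and the `None` not-found convention are illustrative; Python lists stand in for linked lists):

```python
class ChainHashtable:
    """Dictionary via chaining: each slot holds the (key, value) pairs
    whose keys hash to that index. Average O(1) if m = O(n)."""
    def __init__(self, m=8):
        self.m = m
        self.slots = [[] for _ in range(m)]

    def insert(self, key, value):
        chain = self.slots[hash(key) % self.m]
        for i, (k, _) in enumerate(chain):
            if k == key:                  # overwrite an existing key
                chain[i] = (key, value)
                return
        chain.append((key, value))

    def search(self, key):
        for k, v in self.slots[hash(key) % self.m]:
            if k == key:
                return v
        return None                       # key not present
```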
72
Open Hashing
All elements are stored in the array itself. If a position is occupied, the algorithm scans for a free index using strategies like linear probing, quadratic probing, or double hashing.
73
Special Techniques
Universal Hashing: Randomly picking a hash function from a specific family to avoid "malicious" data patterns that cause many collisions.
Perfect Hashing: A scheme that guarantees worst-case constant time $O(1)$ for searching.
74
Ordered Dynamic Sets
An extension of the dictionary that also supports operations based on the order of keys, such as minimum(), maximum(), successor(key), and predecessor(key).
75
Binary Search Tree (BST)
Condition: For every node, all keys in the left subtree are smaller and all keys in the right subtree are larger.
Complexity: Guarantees average $O(\log n)$ time for all operations. However, the worst case is linear, $O(n)$, if the tree becomes unbalanced (e.g., degenerates into a "vine").
76
AVL Tree
A self-balancing BST where the height difference between the left and right subtrees of any node is at most 1. Guarantees worst-case $O(\log n)$ time through the use of rotations to maintain balance.
77
Self-organizing BST (Splay Tree)
Uses a splay operation (a sequence of rotations) to move frequently accessed or recently modified keys to the root. Guarantees amortized $O(\log n)$ complexity.
78
Comparative Analysis of Implementations
Search costs and order-operation support, by implementation (Avg / Worst / Order Operations?):
Unordered Sequence: O(n) / O(n) / No
Direct Addressing: O(1) / O(1) / Yes (but limited)
Hashtable (Chain): O(1) / O(n) / No
BST: O(log n) / O(n) / Yes
AVL Tree: O(log n) / O(log n) / Yes
Self-org. BST: O(log n)* / O(log n)* / Yes
(* amortized)
79
Undirected Graph
An ordered pair $G=(V,E)$, where $V$ is a set of vertices and $E$ is a set of edges. Each edge $e=\{v,w\}$ is an unordered pair of vertices.
80
Directed Graph (Digraph)
An ordered pair $G=(V,E)$, where each edge (arc) is an ordered pair $(v,w)$, indicating direction from $v$ to $w$.
81
Degree of a Vertex
In an undirected graph, the number of edges incident to a vertex $v$, denoted $\deg(v)$.
82
In-degree and Out-degree
In a digraph, $\deg^-(v)$ is the number of incoming arcs and $\deg^+(v)$ is the number of outgoing arcs.
83
Handshaking Lemma
The sum of all vertex degrees in an undirected graph is equal to twice the number of edges.
84
Isomorphism
Definition: Two graphs $G$ and $H$ are isomorphic if there is a bijection between their vertex sets that preserves adjacency.
Properties: Isomorphic graphs must have the same number of vertices, the same number of edges, and the same degree sequence.
85
Adjacency Matrix
A square matrix where the entry at $(i, j)$ is 1 if there is an edge between vertex $v_i$ and $v_j$, and 0 otherwise.
86
Incidence Matrix
A matrix showing the relationship between vertices (rows) and edges (columns).
87
Graphs vs. Relations
A digraph is a geometric representation of a binary relation on the set $V$.
88
Path
A sequence of vertices where each adjacent pair is connected by an edge.
89
Cycle
A path that starts and ends at the same vertex.
90
Connectedness
An undirected graph is connected if there is a path between every pair of vertices.
91
Strongly vs. Weakly Connected
A digraph is strongly connected if there is a directed path between every pair of vertices. It is weakly connected if the underlying undirected graph is connected.
92
Tree
A connected undirected graph with no cycles.
93
Rooted Tree
A tree in which one vertex is designated as the root.
94
Binary Tree
A rooted tree where each internal node has at most two children.
95
Characteristics
You should be able to specify the height, depth, and number of leaves for a given rooted tree.
96
Directed Graph (Digraph)
Defined as $G = (V, E)$, where $V$ is a set of vertices and $E \subseteq V \times V$ is a set of arcs (ordered pairs).
97
Undirected Graph
Similar to a digraph, but edges are unordered pairs $\{u, v\}$.
98
Terminology
An edge $e = (u, v)$ is said to be incident to $u$ and $v$, while $u$ and $v$ are adjacent to each other.
99
Variants
Self-loops: Generally not allowed.
Multi-graph: Allows multiple edges between the same vertices.
Hypergraph: A generalization where edges are $n$-tuples rather than pairs.
100
General Tree
A connected, undirected graph with no cycles.
101
Rooted Tree
A tree with one designated vertex called the root; it is often viewed as a directed graph where arcs point away from the root.
102
Binary Tree
A rooted tree where each node has at most two children (left and right).
103
Adjacency Matrix
A 2D array where $A[i][j] = 1$ if an edge exists from $v_i$ to $v_j$.
Pros: Constant time $O(1)$ to check whether an edge exists.
Cons: High space complexity $O(V^2)$; inefficient for sparse graphs.
104
Adjacency List
An array of lists where list $i$ contains the neighbors of $v_i$.
Pros: Space-efficient for sparse graphs, $O(V+E)$.
Cons: Checking for a specific edge $(u, v)$ takes $O(\deg(u))$ time.
105
Binary Tree Traversals
These recursive methods define the order in which nodes are visited:
Pre-order: Root → Left Subtree → Right Subtree.
In-order: Left Subtree → Root → Right Subtree.
Post-order: Left Subtree → Right Subtree → Root.
106
Breadth-First Search (BFS)
Method: Explores neighbors layer by layer using a queue.
Properties: Computes the shortest path (minimum number of edges) from the source to all reachable vertices.
Complexity: $O(V + E)$.
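A minimal BFS sketch returning edge-count distances (the `bfs_distances` name and the adjacency-dict representation are illustrative):

```python
from collections import deque

def bfs_distances(adj, s):
    """Layer-by-layer exploration with a queue; dist[v] is the minimum
    number of edges from s to v. Runs in O(V + E)."""
    dist = {s: 0}
    q = deque([s])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:            # first visit = shortest layer
                dist[v] = dist[u] + 1
                q.append(v)
    return dist
```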
107
Depth-First Search (DFS)
Method: Explores as far as possible along each branch before backtracking; uses a stack or recursion.
Timestamps: Each vertex $v$ receives a discovery time $v.d$ and a finishing time $v.f$.
Edge Classification:
Tree edges: Part of the DFS forest.
Back edges: Connect to an ancestor in the tree (indicate cycles).
Forward edges: Connect to a descendant.
Cross edges: Connect nodes in different branches or trees.
108
BFS vs. DFS Comparison
Data Structure: BFS uses a Queue (FIFO); DFS uses a Stack (LIFO) or recursion.
Primary Use: BFS for shortest paths (edge count); DFS for connectivity, cycle detection, and topological sorting.
Complexity: Both $O(V+E)$.
109
Topological Sort
An ordering of vertices in a Directed Acyclic Graph (DAG) such that for every arc $(u, v)$, $u$ comes before $v$.
High-level idea: Run DFS to compute finishing times $v.f$ for all vertices, then sort the vertices in decreasing order of finishing time.
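The DFS-based idea can be sketched directly: append each vertex when it finishes, then reverse. Naming is illustrative; the input is assumed to be a DAG given as an adjacency dict:

```python
def topological_sort(adj):
    """DFS-based topological sort of a DAG: output vertices in
    decreasing order of DFS finishing time."""
    order, visited = [], set()

    def dfs(u):
        visited.add(u)
        for v in adj.get(u, []):
            if v not in visited:
                dfs(v)
        order.append(u)          # u "finishes" here

    for u in adj:
        if u not in visited:
            dfs(u)
    return order[::-1]           # reverse finishing order
```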
110
Strongly Connected Components (SCCs)
Subsets of a digraph where every node is reachable from every other node in the subset.
DFS Application: Run DFS, reverse all arcs, then run DFS again in decreasing order of the first run's finishing times.
111
Minimum Spanning Tree (MST) Problem
Definition: Given an undirected connected graph $G=(V, E)$ with positive weights on edges, find a tree $T=(V, E')$ that connects all vertices in $V$ such that the total sum of edge weights is minimized.
Spanning Property: The resulting subgraph must be a tree (no cycles) and must "span" all original vertices.
Input: An undirected connected graph with a weight function $w: E \rightarrow \mathbb{R}^+$.
112
Cut Property
For any cut in the graph (a partition of vertices into two sets), the edge with the minimum weight crossing that cut belongs to the MST.
113
Cycle Property
For any cycle in the graph, the edge with the maximum weight in that cycle does not belong to the MST.
114
Prim's Algorithm
Idea: Grows the MST one vertex at a time, starting from an arbitrary root.
Mechanism: It maintains a connected tree and always adds the cheapest edge that connects a vertex in the tree to a vertex outside the tree.
Similarity: It is very similar to Dijkstra's algorithm, but uses edge weights as priorities instead of total distances to the source.
Suitability: Generally a good choice for most cases, especially dense graphs.
115
Kruskal's Algorithm
Idea: Grows the MST by considering edges in increasing order of weight.
Mechanism: It adds the next cheapest edge as long as it does not create a cycle. The partial solution is a forest (a collection of trees) that eventually merges into a single MST.
Efficiency: Can be faster than Prim's on sparse graphs where $m = O(n)$.
Streaming Mode: It can work "on-line" as edges arrive through a network, even if they aren't pre-sorted.
116
Union-Find Abstract Data Structure
Purpose: Essential for the efficient implementation of Kruskal's algorithm to keep track of connected components and detect cycles.
Operations:
Find(x): Determines which component element $x$ belongs to.
Union(x, y): Merges two separate components into one.
Fast Implementation: Uses techniques like "union by rank" and "path compression" to achieve nearly constant amortized time per operation.
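Both techniques fit in a short class; a minimal Python sketch (the boolean return of `union`, used for cycle detection, is an illustrative convention):

```python
class UnionFind:
    """Union by rank + path compression: near-constant amortized time."""
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x):
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])  # path compression
        return self.parent[x]

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return False                 # same component: edge closes a cycle
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx             # attach the shorter tree (by rank)
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1
        return True
```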
117
The Shortest Paths Problem
Definition: Given a graph $G=(V, E)$ with edge weights $w: E \rightarrow \mathbb{R}$ and a starting node $s$, find the shortest distance $\mu(s,v)$ and the parent node for every vertex $v$ to reconstruct the path. Non-existence: A shortest path may not exist if there is no path from $s$ to $v$, or if the graph contains a negative cycle, which allows for infinitely decreasing path lengths. Property: A subpath of a shortest path is itself a shortest path.
118
The Concept of Relaxation
The Idea: Most shortest-path algorithms use edge relaxation to iteratively improve the currently known shortest distance to a node. Relaxation Operation: For an edge $(u,v)$, if the current distance to $u$ plus the weight of the edge $(u,v)$ is less than the current distance to $v$, update $v$'s distance and set $u$ as its parent. Key Lemma: After a sequence of relaxations that includes a shortest path as a subsequence, the node's distance attribute will equal the actual shortest-path distance.
119
Variants and Algorithms
Depending on the graph's properties, different algorithms are used for maximum efficiency:
120
Case 1: Directed Acyclic Graph (DAG) Algorithm
Topological Sort followed by a single pass of relaxation. Complexity: Linear time, $O(V + E)$.
121
Case 2: Non-negative Weights (Dijkstra's Algorithm)
Mechanism: A greedy approach that uses a priority queue to always scan the unscanned node with the minimum distance.
Complexity: $O((V + E) \log V)$ with a Binary Heap; $O(E + V \log V)$ with a Fibonacci Heap.
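A compact Dijkstra sketch with a binary heap (the lazy-deletion trick of skipping stale queue entries is a common variant; naming and the adjacency-dict format are illustrative):

```python
import heapq

def dijkstra(adj, s):
    """Dijkstra with a binary heap: O((V + E) log V).
    adj maps u -> list of (v, weight) with non-negative weights."""
    dist = {s: 0}
    pq = [(0, s)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue                     # stale entry: u already settled
        for v, w in adj.get(u, []):
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w          # relaxation of edge (u, v)
                heapq.heappush(pq, (dist[v], v))
    return dist
```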
122
Case 3: Arbitrary Weights (Bellman-Ford Algorithm)
Mechanism: Relaxes all edges $V-1$ times. This is necessary because the shortest path can have at most $V-1$ edges. Cycle Detection: It can detect negative cycles by checking if any edge can still be relaxed after $V-1$ iterations. Complexity: $O(V \cdot E)$.
123
(*) All-Pairs Shortest Paths (Johnson's Algorithm idea)
Goal: Find shortest paths between all possible pairs of nodes. Mechanism for graphs with no negative cycles: Add an artificial node and run Bellman-Ford once to compute "node potentials". Use these potentials to reduce weights so they are all non-negative. Run Dijkstra's algorithm $V$ times (once from each node as a source). Complexity: $O(V \cdot E + V^2 \log V)$.
124
Invariant
Never changing; a property that remains true throughout a computation.
125
Loop Invariant
A loop invariant is a condition or statement about program variables that remains true before and after every iteration of a loop, acting as a crucial checkpoint to prove an algorithm's correctness and understand its behavior. It helps verify that a loop achieves its goal by maintaining a specific property (e.g., a sub-array is sorted, a sum is correct) throughout its execution, even as individual variables change.
126
Hand-shake Theorem
Definition: In any undirected graph, the sum of the degrees of all vertices is equal to twice the number of edges.
Formula: $\sum_{v \in V} \deg(v) = 2|E|$.
Implication: Every edge has two endpoints, so it contributes exactly 2 to the total degree count of the graph.
127