Query Processing:
Basic Steps
Query Processing:
Diagram of the Basic Steps
*See Diagram

Query Processing Steps:
Query
User inputs a query in some
human readable language,
such as SQL.
This query is passed to the Parser and Translator
Query Processing Steps:
Parser and Translator
Query Processing Steps:
Optimizer
Query Processing Steps:
Evaluation Engine
Cost of a Query
Disk Access:
How Relations are Stored on Disk
Three Factors of
Disk Access cost
Disk Access:
Why are Block Writes
more expensive than
Block Reads?
A block must be read again after writing,
so the actual time is:
(Time to Write) + (Time to Read)
Seek Time:
Basic Idea
The time spent
spinning the disk
to get to the specified data area
Seek Time:
Two types of Indices on Disk
Types of
External Memory Joins
(Extended) (9)
Three Types of
Join Algorithms (External Memory)
Nested Loop Joins
Hash Joins
Sort-Merge Joins
External Memory Join Algorithms:
Nested Loop Joins
General Cost
Nested Loop Joins
cost = O( |R| |S| )
External Memory Join Algorithms:
Hash Joins
General Cost
Hash Joins
Cost = O( |S| )
if |R| < |S|
*I believe R = read cost, S=search Cost
External Memory Join Algorithms:
Sort-Merge Joins
General Cost
Sort-Merge Joins
Cost = O( |S| * log(|s|) )
if |R| < |S|
*I believe R = read cost, S=search Cost
Query Costs:
External Memory Joins
Nested Loop Join:
Overview
For each block of a relation R, and for each tuple r in block:
For each block of a relation S, and for each tuple s in block:
Output rs if the join condition evaluates to true
Cost = B(R) +T(R)*B(S)
Memory: 4 Blocks, if double buffering
Block-Nest Loop Join:
Overview
For each block of R and for each block of S:
For each tuple r of R and s of S:
Output rs if join condition evaluates to true.
R is the outer table, S is the inner table
Cost:
B(R) + B(R)*B(S)
Memory:
4 Blocks (If double buffering)
Improved Nest Loop Join
Outer table R and inner table S
Cost:
B(R) + B(R)*B(S) / (M-2)
Memory:
M Blocks
Nested Join
Joining Outer table R and inner table S
(Used if B(R) + B(S) < M) (fits in memory)
Cost: B(R) + B(S)
Index Nested Join
R has an index over the Join Attribute
V(R,a) = number of distinct values of a in R
Cost:
Selection:
Two Types of Scans
+ Their Costs
Scanning Without Index:
Cost = B(R) + Seek Time
Scanning With Index:
If there is an index over the attribute, use a function F to determine the # of I/O operations
Data Clustered:
Cost = F(R,a)*B(R)
Data Unclustered:
Cost = F(R,a)*T(R)