CS 527
Home
B+ Tree
Bloom Filter
Hash Tables
Buffer Pool
Join Algorithms
External Sort
2PL & Locking
Join Algorithms
Three strategies to compute R ⋈ S on a join attribute. Compare their I/O costs and access patterns.
Nested Loop Join
Sort-Merge Join
Hash Join
R (Outer) —
●
S (Inner) —
●
Output
Run join to see output
Naïve nested loop: for each tuple in R, scan all of S. Cost = |R| × |S| comparisons.
Cost Analysis (I/O pages)
|R| pages
—
|S| pages
—
Comparisons
0
Total I/O
P_R + P_R × P_S
Relations (R ⋈ S on id)
R tuples
4
5
6
S tuples
4
5
6
▶ Run Join
Reset
Speed
Delay
Slow
Normal
Fast
Algorithm
for
each r in R:
for
each s in S:
if
r.id == s.id:
emit(r, s)
Cost = P_R + |R| × P_S
(naïve: tuple-level)
Block NLJ:
Process B-2 pages of R at once
Cost = P_R + ⌈P_R/(B-2)⌉ × P_S
R (sorted) —
●
S (sorted) —
●
Output
Run join to see output
Sort-Merge: sort both relations, then merge with two pointers advancing in tandem.
Cost Analysis
Sort R
2 × P_R × (1 + ⌈log_{B-1}⌈P_R/B⌉⌉)
Sort S
2 × P_S × (1 + ⌈log_{B-1}⌈P_S/B⌉⌉)
Merge pass
P_R + P_S
Comparisons
0
Best when
already sorted or B large
Relations
▶ Run Join
Reset
Speed
Delay
Slow
Normal
Fast
Algorithm
Phase 1:
Sort R, Sort S
Phase 2 (Merge):
i=0, j=0
while
i<|R| and j<|S|:
if
R[i].id == S[j].id:
emit; advance both
elif
R[i].id < S[j].id:
i++
else
: j++
Best case:
P_R + P_S
(if data already sorted)
R (Build) —
●
S (Probe) —
●
Hash Table (built from R)
Not built yet
Output
Run join to see output
Hash Join: Build phase — hash all R tuples into a hash table. Probe phase — for each S tuple, probe the table.
Cost Analysis
Build (read R)
P_R
Probe (read S)
P_S
Total I/O
P_R + P_S
Probes done
0
Requires
B > √P_R buffer pages
Relations
▶ Run Join
Reset
Speed
Delay
Slow
Normal
Fast
Algorithm
Phase 1: Build
for
each r in R:
HT[h(r.id)].add(r)
Phase 2: Probe
for
each s in S:
for
r in HT[h(s.id)]:
if
r.id == s.id:
emit(r, s)
Total I/O = P_R + P_S
(if HT fits in memory)