Join Algorithms

Nested Loop Join

Sort-Merge Join

Hash Join

R (Outer) — ●

S (Inner) — ●

Output

Run join to see output

Naïve nested loop: for each tuple in R, scan all of S. Cost = |R| × |S| comparisons.

Cost Analysis (I/O pages)

|R| pages—

|S| pages—

Comparisons0

Total I/OP_R + P_R × P_S

Relations (R ⋈ S on id)

R tuples

S tuples

Speed

Delay

Algorithm

for each r in R:
  for each s in S:
    if r.id == s.id:
      emit(r, s)

Cost = P_R + |R| × P_S
(naïve: tuple-level)

Block NLJ:
Process B-2 pages of R at once
Cost = P_R + ⌈P_R/(B-2)⌉ × P_S

R (sorted) — ●

S (sorted) — ●

Output

Run join to see output

Sort-Merge: sort both relations, then merge with two pointers advancing in tandem.

Cost Analysis

Sort R2 × P_R × (1 + ⌈log_{B-1}⌈P_R/B⌉⌉)

Sort S2 × P_S × (1 + ⌈log_{B-1}⌈P_S/B⌉⌉)

Merge passP_R + P_S

Comparisons0

Best whenalready sorted or B large

Relations

Speed

Delay

Algorithm

Phase 1: Sort R, Sort S

Phase 2 (Merge):
i=0, j=0
while i<|R| and j<|S|:
  if R[i].id == S[j].id:
    emit; advance both
  elif R[i].id < S[j].id:
    i++
  else: j++

Best case: P_R + P_S
(if data already sorted)

R (Build) — ●

S (Probe) — ●

Hash Table (built from R)

Not built yet

Output

Run join to see output

Hash Join: Build phase — hash all R tuples into a hash table. Probe phase — for each S tuple, probe the table.

Cost Analysis

Build (read R)P_R

Probe (read S)P_S

Total I/OP_R + P_S

Probes done0

RequiresB > √P_R buffer pages

Relations

Speed

Delay

Algorithm

Phase 1: Build
for each r in R:
  HT[h(r.id)].add(r)

Phase 2: Probe
for each s in S:
  for r in HT[h(s.id)]:
    if r.id == s.id:
      emit(r, s)

Total I/O = P_R + P_S
(if HT fits in memory)