CS 527
Home
B+ Tree
Bloom Filter
Hash Tables
Buffer Pool
Join Algorithms
External Sort
2PL & Locking
External Merge Sort
When data doesn't fit in memory, use a two-phase external merge sort. Phase 1 creates sorted runs; Phase 2 merges them using B buffer pages.
—
Phase
0
Runs
0
I/O Ops
Disk Pages (Input)
Buffer Pool (B pages)
B-1 input + 1 output
Sorted Runs (Phase 1 output)
Configure and run the sort. Phase 1 creates initial sorted runs. Phase 2 merges them.
Configuration
Pages (N)
8
12
16
Buffer (B)
3
4
5
Reset
Run Sort
▶ Phase 1
▶ Phase 2
Speed
Slow
Normal
Fast
Cost Formula
N pages, B buffer frames
Phase 1:
⌈N/B⌉ sorted runs
Cost: 2N I/Os
Phase 2:
Merge ⌈N/B⌉ runs at once
(if ⌈N/B⌉ ≤ B-1)
Cost: 2N I/Os
Total: 4N I/Os
(two-pass sort)
Algorithm
Pass 0 (create runs):
while pages remain:
read B pages into buffer
sort in-memory
write sorted run to disk
Pass 1 (merge):
load 1 page from each run
into B-1 input buffers
repeatedly: output min,
fetch next from that run