If you’ve worked with PostgreSQL long enough, you’ve probably heard this sentence before:
“PostgreSQL is safe because everything is written to WAL.”
It sounds reasonable.
It feels reassuring.
And it’s only half true.
This misunderstanding — confusing WAL with actual data persistence — is one of the most common PostgreSQL misconceptions.
It doesn’t usually break your database.
But it does lead to bad tuning decisions, mysterious performance issues, and very slow crash recovery.
Let’s clear it up once and for all.
The Misunderstanding That Refuses to Die
Most developers believe something like this:
Transaction commits
→ Data is written to WAL
→ Data is safely on disk
→ Job done
So when they hear about checkpoints, the reaction is often:
“Why do we even need checkpoints? Isn’t WAL enough?”
This is where things quietly go wrong.
What WAL Actually Does (And What It Doesn’t)
WAL — Write-Ahead Logging — is not where your data lives.
WAL is a change log, not a database.
When PostgreSQL modifies a row, it does three different things, not one:
- It updates the page in memory (shared_buffers)
- It writes a description of the change to WAL (initially into the in-memory WAL buffers)
- It returns success to the client
At commit time:
- WAL is guaranteed to be flushed to disk
- Data pages are NOT required to be written yet
That’s the key detail most people miss.
WAL answers one question only:
“If we crash, can we re-apply committed changes?”
It does not guarantee that the data files are already up to date.
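You can see this from inside a session. A minimal sketch using the WAL position functions (these names exist on PostgreSQL 10 and later; note that neither value says anything about the state of the data files):

SELECT pg_current_wal_insert_lsn() AS wal_insert_position,  -- WAL generated so far, may still be in WAL buffers
       pg_current_wal_flush_lsn()  AS wal_flush_position;   -- WAL already flushed to disk; this is what COMMIT waits for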
So Where Is Your Data Right After COMMIT?
Right after a commit, your data might exist in three different places:
- ✅ WAL (on disk)
- ✅ shared_buffers (memory)
- ❌ data files (maybe not yet)
That’s intentional.
Writing data files is expensive.
PostgreSQL delays it on purpose to batch I/O efficiently.
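You can actually watch those not-yet-written pages pile up. A minimal sketch, assuming the pg_buffercache extension is available (it ships with PostgreSQL but must be installed in the database):

CREATE EXTENSION IF NOT EXISTS pg_buffercache;

-- Count pages that are modified in shared_buffers but not yet written
-- to the data files ("dirty" buffers)
SELECT count(*) FILTER (WHERE isdirty) AS dirty_buffers,
       count(*)                        AS total_buffers
FROM pg_buffercache;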
And this is exactly where checkpoints enter the picture.
What Checkpoints Actually Do
A checkpoint is the moment PostgreSQL says:
“Let’s make the data files catch up with reality.”
During a checkpoint:
- Dirty pages in memory are written to disk
- PostgreSQL records a safe restart position
- WAL before that point becomes irrelevant for recovery
After a checkpoint, PostgreSQL can honestly say:
“Everything before this point exists in real data files.”
Without checkpoints, PostgreSQL would still be correct —
but recovery would be a nightmare.
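You can ask a running server where that safe restart position currently is. The built-in pg_control_checkpoint() function (PostgreSQL 9.6 and later) reports the last completed checkpoint:

SELECT checkpoint_time,   -- when the last checkpoint ran
       checkpoint_lsn,    -- WAL location of the checkpoint record
       redo_lsn           -- where crash recovery would start replaying WAL
FROM pg_control_checkpoint();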
WAL Helps You Survive a Crash
Checkpoints Help You Recover Fast
Here’s the difference that matters in practice:
Without recent checkpoints
Crash happens
→ PostgreSQL must replay WAL from a very old position
→ Startup takes a long time
→ Business waits
With regular checkpoints
Crash happens
→ PostgreSQL starts from last checkpoint
→ Replays only recent WAL
→ Startup is fast
Both are correct.
Only one is operationally acceptable.
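You can estimate which of those two situations you are in right now by measuring how much WAL sits between the last checkpoint's redo point and the current WAL position. A rough sketch, using PostgreSQL 10+ function names:

-- Approximate amount of WAL that crash recovery would have to replay
-- if the server died at this moment
SELECT pg_size_pretty(
         pg_wal_lsn_diff(pg_current_wal_lsn(),
                         (SELECT redo_lsn FROM pg_control_checkpoint()))
       ) AS wal_since_last_checkpoint;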
Why This Confusion Leads to Real Problems
This WAL vs checkpoint misunderstanding usually shows up as:
1. “We don’t need to tune checkpoints”
Then performance randomly drops every few minutes.
2. “Let’s reduce WAL size aggressively”
Suddenly checkpoints happen every 20 seconds.
3. “Recovery is slow, but hardware is fine”
The last checkpoint was 40 minutes ago.
None of these problems are obvious at first.
They all trace back to thinking WAL equals data persistence.
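The first two symptoms are easy to spot in the statistics. On PostgreSQL 16 and earlier the checkpoint counters live in pg_stat_bgwriter (PostgreSQL 17 moved them to pg_stat_checkpointer); a quick health check:

SELECT checkpoints_timed,   -- scheduled by checkpoint_timeout
       checkpoints_req,     -- forced, mostly by hitting max_wal_size
       round(100.0 * checkpoints_req /
             nullif(checkpoints_timed + checkpoints_req, 0), 1) AS pct_forced
FROM pg_stat_bgwriter;
-- A consistently high pct_forced usually means max_wal_size is too small
-- for the write workload.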
Why PostgreSQL Separates WAL and Checkpoints
This design is not accidental. It’s one of PostgreSQL’s strengths.
By separating:
- logging changes (WAL)
- from persisting data (checkpoints)
PostgreSQL gains:
- High write throughput
- Flexible tuning
- Predictable recovery behavior
But only if you understand both sides.
A Simple Rule of Thumb
If you’re running PostgreSQL in production:
- WAL tells you what changed
- Checkpoints decide when it becomes permanent
- Monitoring only one of them is incomplete
If you see WAL growth but never look at checkpoint behavior,
you’re flying blind.
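For the WAL side of that picture, pg_stat_wal (PostgreSQL 14 and later) shows how much WAL the workload generates; watch it alongside the checkpoint counters shown earlier:

SELECT pg_size_pretty(wal_bytes) AS wal_generated,  -- since the last stats reset
       wal_records,
       wal_fpi,                                     -- full-page images, often the bulk of WAL
       stats_reset
FROM pg_stat_wal;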
How Checkpoints Work
The Checkpoint Process Flow
┌─────────────────────────────────────────────────────────────┐
│ Checkpoint Triggered │
└────────────────────────┬────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Step 1: Identify all dirty pages in shared_buffers │
│ (Pages modified but not yet written to disk) │
└────────────────────────┬────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Step 2: Write dirty pages to data files │
│ • Spread over checkpoint_completion_target period │
│ • Minimize I/O impact on normal operations │
└────────────────────────┬────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Step 3: fsync() to ensure data is on physical disk │
│ (Not just in OS cache) │
└────────────────────────┬────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Step 4: Write checkpoint record to WAL │
│ Records: LSN, timestamp, redo location │
└────────────────────────┬────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Step 5: Update pg_control with checkpoint location │
│ (Used during crash recovery) │
└────────────────────────┬────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Step 6: Mark old WAL files for recycling/removal │
│ WAL before checkpoint can be safely discarded │
└─────────────────────────────────────────────────────────────┘
Timeline Example
Time: 9:00 9:05 9:10 9:15 9:20
│ │ │ │ │
Data: ──────────────────────────────────────────────────►
│ │ │ │ │
WAL: ═══════════════════════════════════════════════►
│ │ │ │ │
CP: ◆ ◆ ◆ ◆
│ │ │ │
└─ CP1 └─ CP2 └─ CP3 └─ CP4
Legend:
◆ = Checkpoint
─ = Normal operations
═ = WAL accumulation
If crash occurs at 9:17:
- Recovery starts from CP3 (9:10)
- Replays WAL from 9:10 to 9:17 (7 minutes)
- Does NOT replay from 9:00 (17 minutes)
Types of Checkpoints
1. Timed Checkpoints (checkpoints_timed) ✅
Trigger: Controlled by checkpoint_timeout parameter
Characteristics:
- Predictable, scheduled occurrence
- Smooth, spread-out I/O
- Preferred for system health
Example:
-- checkpoint_timeout = 5min
09:00:00 → Timed checkpoint
09:05:00 → Timed checkpoint
09:10:00 → Timed checkpoint
2. Requested Checkpoints (checkpoints_req) ⚠️
Triggers:
- WAL size reaches max_wal_size
- Manual CHECKPOINT command
- Database shutdown (normal modes)
- Certain DDL operations (CREATE DATABASE, etc.)
Characteristics:
- Forced, reactive response
- Can cause I/O spikes
- High frequency indicates tuning needed
Example:
-- max_wal_size = 1GB
-- Heavy write workload generates 1GB WAL in 2 minutes
09:00:00 → Timed checkpoint
09:02:00 → Requested checkpoint (WAL hit 1GB) ⚠️
09:04:00 → Requested checkpoint (WAL hit 1GB again) ⚠️
09:05:00 → Timed checkpoint
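If you suspect you are close to a forced checkpoint, you can check how much WAL currently sits on disk and compare it with max_wal_size. A sketch; pg_ls_waldir() (PostgreSQL 10+) requires superuser or the pg_monitor role:

SELECT pg_size_pretty(sum(size)) AS wal_on_disk,
       count(*)                  AS wal_segments
FROM pg_ls_waldir();

SHOW max_wal_size;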
3. Special Checkpoints
- End-of-recovery checkpoint: After crash recovery
- Shutdown checkpoint: Before clean shutdown
- Restartpoint (on standby): Similar to checkpoint on replicas
Checkpoint Configuration
Key Parameters
1. checkpoint_timeout
-- Default: 5min
-- Range: 30s to 1d
-- Recommended: 10min to 30min for most workloads
ALTER SYSTEM SET checkpoint_timeout = '15min';
Impact:
- Lower value: More frequent checkpoints, faster recovery, higher I/O overhead
- Higher value: Less I/O overhead, slower recovery, more WAL accumulation
2. max_wal_size
-- Default: 1GB
-- Recommended: 2GB to 16GB depending on write volume
ALTER SYSTEM SET max_wal_size = '4GB';
Impact:
- Lower value: Frequent forced checkpoints, I/O spikes
- Higher value: Smoother operation, longer recovery time, more disk usage
3. min_wal_size
-- Default: 80MB
-- Minimum WAL size to maintain between checkpoints
ALTER SYSTEM SET min_wal_size = '1GB';
Purpose: Keeps at least this much WAL recycled (rather than removed) at checkpoints, so new segments rarely have to be created from scratch during activity spikes
4. checkpoint_completion_target
-- Default: 0.9 (90% of checkpoint_timeout)
-- Range: 0.0 to 1.0
ALTER SYSTEM SET checkpoint_completion_target = 0.9;
Impact: Spreads checkpoint I/O over this fraction of the interval
checkpoint_timeout = 10min
checkpoint_completion_target = 0.9
Checkpoint I/O spread over: 10min × 0.9 = 9 minutes
Reduces I/O spikes!
5. checkpoint_warning
-- Default: 30s
-- Logs warning if checkpoints happen closer than this
ALTER SYSTEM SET checkpoint_warning = '30s';
Log output when violated:
LOG: checkpoints are occurring too frequently (24 seconds apart)
HINT: Consider increasing the configuration parameter "max_wal_size"
6. log_checkpoints
-- Default: on (in PostgreSQL 15+)
-- Logs checkpoint statistics
ALTER SYSTEM SET log_checkpoints = on;
Example log:
LOG: checkpoint starting: time
LOG: checkpoint complete: wrote 16384 buffers (25.0%);
0 WAL file(s) added, 3 removed, 5 recycled;
write=25.789 s, sync=0.456 s, total=26.245 s;
sync files=142, longest=0.234 s, average=0.003 s;
distance=131072 kB, estimate=131072 kB
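All of the parameters above can be changed without a restart: ALTER SYSTEM writes them to postgresql.auto.conf, a reload makes them take effect, and pg_settings lets you review the whole group at once:

SELECT pg_reload_conf();   -- apply ALTER SYSTEM changes (no restart needed)

SELECT name, setting, unit
FROM pg_settings
WHERE name IN ('checkpoint_timeout', 'max_wal_size', 'min_wal_size',
               'checkpoint_completion_target', 'checkpoint_warning',
               'log_checkpoints');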
Final Thoughts
WAL is not the database.
Checkpoints are not optional housekeeping.
They are two halves of the same durability story.
Once you truly understand that difference, a lot of PostgreSQL behavior suddenly makes sense:
- performance dips
- recovery time
- I/O spikes
- tuning trade-offs




