The PostgreSQL “Bloat” Trap: Why Your Database Grows Even When You Delete Data

For many new PostgreSQL administrators, there is a moment of pure confusion: you’ve just deleted millions of rows to save space, but the disk usage hasn’t budged. In fact, sometimes it keeps growing.

This phenomenon is known as Table Bloat, and it’s a direct consequence of how PostgreSQL handles multiple users at once.

To master Postgres, you have to stop thinking of a table as a static spreadsheet and start seeing it as a continuous log of events.

Key takeaways

| Key Concept | What You Need to Know |
| --- | --- |
| The Cause | MVCC (Multiversion Concurrency Control) ensures reads and writes don’t block each other, but it leaves “dead” row versions on your disk. |
| The “Bloat” Effect | DELETE and UPDATE don’t actually erase data; they just mark it as invisible, so your disk usage stays high even after a purge. |
| Autovacuum | The background “hero” that makes dead space reusable for new rows. Never disable it, or you risk a database-wide “Transaction ID Wraparound” crash. |

Understanding MVCC: The “Ghost” in the Machine

PostgreSQL uses a system called Multiversion Concurrency Control (MVCC). This ensures that if you are running a long report, your data stays consistent even if someone else is updating those same rows mid-query.

Instead of overwriting data, Postgres creates “versions” of rows. Here is how that looks on your disk:

| Action Taken | What Happens on Disk | Resulting Status |
| --- | --- | --- |
| INSERT | A new row version (tuple) is written to a page. | Live: visible to all new transactions. |
| DELETE | The row is simply marked as deleted (not erased). | 💀 Dead: occupies space but is invisible to new queries. |
| UPDATE | The old row is marked deleted and a new version is written. | 💀 Dead + Live: you are now using double the space for one row. |

The Magic of Non-Blocking Queries

The reason Postgres accepts this “bloat” is for performance. By keeping those 💀 Dead rows around for a short time:

  • Reads never block Writes: You can read a table while someone else is inserting data into it.
  • Writes never block Reads: You can update a record without freezing the UI for everyone else.

How to Reclaim Your Space

Eventually, those dead rows need to be cleared out so the space can be reused.

Because the Operating System (Windows, Linux, or macOS) still sees that space as allocated to the table’s files, you have to manage it through PostgreSQL’s VACUUM process.

1. The Silent Hero: AUTOVACUUM

By default, Postgres runs a background worker called autovacuum. It scans your tables for dead rows and marks that space as “available.”

Note: Autovacuum does not usually return space to the OS. It simply clears a “hole” in the table so the next INSERT can fill it, preventing the file from getting even larger.
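You can check how recently autovacuum has visited each table via the built-in pg_stat_user_tables statistics view. A quick sketch:

```sql
-- When did autovacuum last run on each table, and how many times overall?
-- NULLS LAST pushes tables autovacuum has never touched to the bottom.
SELECT relname, last_autovacuum, autovacuum_count
FROM pg_stat_user_tables
ORDER BY last_autovacuum DESC NULLS LAST;
```

A table with heavy write traffic but a NULL or very old last_autovacuum is a sign your thresholds need tuning.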


2. The Manual Clean: VACUUM

If you’ve had a massive one-time data purge, you can trigger this manually. It works just like the automatic version but gives you more control over when the heavy lifting happens.
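A minimal sketch (your_table_name is a placeholder for your own table):

```sql
-- Reclaim dead-tuple space inside the table file so new rows can reuse it.
-- VERBOSE prints progress details; ANALYZE refreshes planner statistics,
-- which is useful right after a large purge.
VACUUM (VERBOSE, ANALYZE) your_table_name;
```

Unlike VACUUM FULL, this runs alongside normal reads and writes.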

3. The Nuclear Option: VACUUM FULL

When a table is more “dead” than “live,” you might need VACUUM FULL. This command creates a brand-new table file, copies only the live rows into it, and deletes the old, bloated file.

  • The Catch: It requires an exclusive lock (ACCESS EXCLUSIVE). No one can read or write to that table while it’s running.

| Method | How It Works | Best For… |
| --- | --- | --- |
| Autovacuum | Background worker that marks dead space as “available” for new rows. | Routine maintenance (prevents growth but rarely shrinks files). |
| Manual VACUUM | A user-triggered version of the above. | Post-purge cleanup to prep for new data. |
| VACUUM FULL | Rebuilds the table from scratch, returning space to the OS. | The nuclear option; requires a full table lock (no reads/writes). |
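A hedged sketch of the nuclear option (your_table_name is a placeholder), with a size check so you can see the space actually returned:

```sql
-- Measure on-disk size (table + indexes + TOAST) before the rebuild.
SELECT pg_size_pretty(pg_total_relation_size('your_table_name'));

-- WARNING: takes an ACCESS EXCLUSIVE lock; nothing can read or write the
-- table until the rewrite finishes. Run this only in a maintenance window.
VACUUM FULL your_table_name;

-- Measure again: the difference is the space handed back to the OS.
SELECT pg_size_pretty(pg_total_relation_size('your_table_name'));
```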

Step-by-Step: How to Detect Bloat in Your Tables

You shouldn’t guess if you have bloat; you should measure it. PostgreSQL provides two main ways to see the “ghosts” in your storage.

The Quick Check (Built-in)

You can see a high-level view of “dead tuples” (the rows marked for deletion but still on disk) using the pg_stat_user_tables view.

```sql
SELECT
    schemaname,
    relname AS table_name,
    n_live_tup AS live_rows,
    n_dead_tup AS dead_rows,
    ROUND(100.0 * n_dead_tup / NULLIF(n_live_tup + n_dead_tup, 0), 2) AS bloat_ratio_pct,
    last_autovacuum
FROM pg_stat_user_tables
WHERE n_dead_tup > 0
ORDER BY n_dead_tup DESC;
```

  • When to worry: If your bloat_ratio_pct is consistently over 20% on a large table, your autovacuum settings might be too conservative for your write volume.

The Deep Dive (Using pgstattuple)

For an exact measurement of wasted bytes, enable the pgstattuple extension. It is the most accurate tool for auditing bloat.

```sql
CREATE EXTENSION IF NOT EXISTS pgstattuple;

-- This gives you the physical bytes of "dead" space
SELECT * FROM pgstattuple('your_table_name');
```

Pro-Tip: “Good Bloat” vs. “Bad Bloat”

Believe it or not, you don’t always want 0% bloat.

  • Good Bloat: On tables with frequent updates, having some “Free Space” (often managed via a setting called fillfactor) allows Postgres to perform HOT (Heap Only Tuple) Updates. This keeps the new row version on the same page, preventing index bloat and making updates lightning-fast.
  • Bad Bloat: When dead rows accumulate across thousands of pages, forcing Postgres to scan 1GB of data to find 10MB of actual results. This is what kills your query performance.
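The fillfactor setting mentioned above is a per-table storage parameter. A hedged sketch (your_table_name and the value 80 are illustrative):

```sql
-- Leave ~20% of every page free so updated row versions can stay on the
-- same page, enabling HOT (Heap Only Tuple) updates that skip index writes.
ALTER TABLE your_table_name SET (fillfactor = 80);

-- Note: this only affects pages written from now on; existing pages keep
-- their old layout until the table is rewritten (e.g. by VACUUM FULL).
```

Lower fillfactor trades some disk space for faster updates, so it mainly pays off on update-heavy tables.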

Frequently Asked Questions

Q: Does TRUNCATE cause bloat?

A: No. Unlike DELETE, which marks rows as dead, TRUNCATE is a DDL command that instantly deallocates the data pages and returns the space to the Operating System. Use it for “scratch” or log tables you want to clear entirely.
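For example (app_scratch_log is a hypothetical table name):

```sql
-- Instantly empties the table and returns its pages to the OS.
-- RESTART IDENTITY (optional) also resets any sequences the table owns.
TRUNCATE TABLE app_scratch_log RESTART IDENTITY;
```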

Q: Can I just disable Autovacuum and run it manually at night?

A: In almost every production environment, the answer is no. Disabling it leads to “Transaction ID Wraparound” risks, which can force your entire database into read-only mode to prevent data loss. It is much better to tune autovacuum to be more aggressive than to turn it off.
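Tuning can be done per table via storage parameters. A sketch, assuming one write-heavy table (your_table_name and the values are illustrative):

```sql
-- By default autovacuum fires when ~20% of a table is dead rows
-- (autovacuum_vacuum_scale_factor = 0.2). On a huge, hot table that is
-- far too late; trigger it at ~2% instead.
ALTER TABLE your_table_name SET (
    autovacuum_vacuum_scale_factor = 0.02,
    autovacuum_vacuum_threshold = 1000
);
```

This keeps the safety net in place while making cleanup far more frequent on the tables that need it.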

Q: Why does my Index have more bloat than my Table?

A: Indexes are often more sensitive to updates. Whenever an update places the new row version on a different page (a non-HOT update), every index on that table gets a new entry pointing to it. Over time, indexes become “fragmented.” If an index is twice the size of the table, a REINDEX CONCURRENTLY is usually the best fix.
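A sketch of the fix (your_index_name and your_table_name are placeholders; CONCURRENTLY requires PostgreSQL 12 or later):

```sql
-- Rebuild one bloated index without blocking reads or writes.
REINDEX INDEX CONCURRENTLY your_index_name;

-- Or rebuild every index on a table in one go.
REINDEX TABLE CONCURRENTLY your_table_name;
```

The concurrent variant builds a fresh copy alongside the old index and swaps them at the end, so it needs roughly double the index’s disk space while it runs.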

David Cao

David is a Cloud & DevOps enthusiast with years of experience as a Linux engineer, including work at AMD and EMC. He enjoys Linux, Python, Bash, technical blogging, and contributing to open source.
