For many new PostgreSQL administrators, there is a moment of pure confusion: you’ve just deleted millions of rows to save space, but the disk usage hasn’t budged. In fact, sometimes it keeps growing.
This phenomenon is known as Table Bloat, and it’s a direct consequence of how PostgreSQL handles multiple users at once.
To master Postgres, you have to stop thinking of a table as a static spreadsheet and start seeing it as a continuous log of events.
Key takeaways
| Key Concept | What You Need to Know |
| --- | --- |
| The Cause | MVCC (Multiversion Concurrency Control) ensures reads and writes don’t block each other, but it leaves “dead” row versions on your disk. |
| The “Bloat” Effect | DELETE and UPDATE don’t actually erase data; they just mark it as invisible, meaning your disk usage stays high even after a purge. |
| Autovacuum | The background “Hero” that makes dead space reusable for new rows. Never disable this, or you risk a database-wide “Transaction ID Wraparound” crash. |
Understanding MVCC: The “Ghost” in the Machine
PostgreSQL uses a system called Multiversion Concurrency Control (MVCC). This ensures that if you are running a long report, your data stays consistent even if someone else is updating those same rows mid-query.
Instead of overwriting data, Postgres creates “versions” of rows. Here is how that looks on your disk:
| Action Taken | What Happens on Disk | Resulting Status |
| --- | --- | --- |
| INSERT | A new row version (tuple) is written to a page. | ✅ Live: Visible to all new transactions. |
| DELETE | The row is simply “marked” as deleted (not erased). | 💀 Dead: Occupies space but is invisible to new queries. |
| UPDATE | The old row is marked “deleted” and a new version is written. | 💀 Dead + ✅ Live: You are now using double the space for one row. |
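You can watch this happen through the statistics collector. A minimal sketch, assuming a throwaway table (the name `demo` is just a placeholder; note the stats view can lag behind by a moment):

```sql
-- A throwaway table with one row
CREATE TABLE demo (id int PRIMARY KEY, val text);
INSERT INTO demo VALUES (1, 'original');

-- The UPDATE marks the old version dead and writes a new one
UPDATE demo SET val = 'changed' WHERE id = 1;

-- Expect one live tuple and (after the stats flush) one dead tuple
SELECT n_live_tup, n_dead_tup
FROM pg_stat_user_tables
WHERE relname = 'demo';
```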
The Magic of Non-Blocking Queries
Postgres accepts this “bloat” as the price of performance. By keeping those 💀 Dead rows around for a short time:
- Reads never block Writes: A long-running report doesn’t force inserts or updates to wait.
- Writes never block Reads: You can update a record while other sessions keep reading the previous version, without freezing the UI for everyone else.
How to Reclaim Your Space
Eventually, those dead rows need to be cleared out so the space can be reused.
The Operating System (Windows, Linux, or macOS) still sees the table file as fully in use, so the space can only be managed from inside Postgres, via the VACUUM process.
1. The Silent Hero: AUTOVACUUM
By default, Postgres runs a background worker called autovacuum. It scans your tables for dead rows and marks that space as “available.”
Note: Autovacuum does not usually return space to the OS. It simply clears a “hole” in the table so the next INSERT can fill it, preventing the file from getting even larger.
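If you suspect autovacuum is falling behind, its current thresholds are visible in `pg_settings`. A quick look, assuming nothing beyond a stock install:

```sql
-- The defaults: vacuum a table once dead rows exceed
-- threshold + scale_factor * live rows
SELECT name, setting
FROM pg_settings
WHERE name IN ('autovacuum',
               'autovacuum_vacuum_threshold',
               'autovacuum_vacuum_scale_factor',
               'autovacuum_naptime');
```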
2. The Manual Clean: VACUUM
If you’ve had a massive one-time data purge, you can trigger this manually. It works just like the automatic version but gives you more control over when the heavy lifting happens.
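A typical post-purge invocation looks like this (`your_table_name` is a placeholder). Adding ANALYZE refreshes planner statistics in the same pass, and VERBOSE reports how many dead tuples were removed:

```sql
-- Reclaim dead space and refresh planner statistics in one pass
VACUUM (VERBOSE, ANALYZE) your_table_name;
```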
3. The Nuclear Option: VACUUM FULL
When a table is more “dead” than “live,” you might need VACUUM FULL. This command creates a brand-new table file, copies only the live rows into it, and deletes the old, bloated file.
- The Catch: It requires a “Full Access Lock.” No one can read or write to that table while it’s running.
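A sketch of the full workflow, measuring before and after so you can see the reclaimed space (table name is a placeholder; expect an exclusive lock for the duration of the rebuild):

```sql
-- Size before the rebuild (table plus its indexes)
SELECT pg_size_pretty(pg_total_relation_size('your_table_name'));

-- Rewrites the table file, keeping only live rows
VACUUM FULL your_table_name;

-- Size after: the difference went back to the OS
SELECT pg_size_pretty(pg_total_relation_size('your_table_name'));
```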
| Method | How it Works | Best For… |
| --- | --- | --- |
| Autovacuum | Background worker that marks dead space as “available” for new rows. | Routine maintenance. (Prevents growth but rarely shrinks files). |
| Manual VACUUM | A user-triggered version of the above. | Post-purge cleanup to prep for new data. |
| VACUUM FULL | Rebuilds the table from scratch, returning space to the OS. | The Nuclear Option. Requires a full table lock (no reads/writes). |
Step-by-Step: How to Detect Bloat in Your Tables
You shouldn’t guess if you have bloat; you should measure it. PostgreSQL provides two main ways to see the “ghosts” in your storage.
The Quick Check (Built-in)
You can see a high-level view of “dead tuples” (the rows marked for deletion but still on disk) using the pg_stat_user_tables view.
```sql
SELECT
    schemaname,
    relname AS table_name,
    n_live_tup AS live_rows,
    n_dead_tup AS dead_rows,
    ROUND(100.0 * n_dead_tup / NULLIF(n_live_tup + n_dead_tup, 0), 2) AS bloat_ratio_pct,
    last_autovacuum
FROM pg_stat_user_tables
WHERE n_dead_tup > 0
ORDER BY n_dead_tup DESC;
```
- When to worry: If your bloat_ratio_pct is consistently over 20% on a large table, your autovacuum settings might be too conservative for your write volume.
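One common remedy is to tune autovacuum per table rather than globally. A sketch (table name is a placeholder; 0.05 means “vacuum once ~5% of rows are dead,” versus the 20% default scale factor):

```sql
-- Vacuum this table once ~5% of its rows are dead
ALTER TABLE your_table_name
    SET (autovacuum_vacuum_scale_factor = 0.05);

-- Refresh planner statistics more often for the same table
ALTER TABLE your_table_name
    SET (autovacuum_analyze_scale_factor = 0.02);
```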
The Deep Dive (Using pgstattuple)
For an exact measurement of wasted bytes, enable the pgstattuple extension. This is the “gold standard” for 2026 database auditing.
```sql
CREATE EXTENSION IF NOT EXISTS pgstattuple;

-- This gives you the physical bytes of "dead" space
SELECT * FROM pgstattuple('your_table_name');
```
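Be aware that pgstattuple reads the entire table, which can be slow on large relations. The same extension also ships a sampling-based variant, pgstattuple_approx, that avoids a full scan:

```sql
-- Exact dead-space and free-space percentages (full table scan)
SELECT dead_tuple_percent, free_percent
FROM pgstattuple('your_table_name');

-- Cheaper estimate that skips pages the visibility map marks all-visible
SELECT dead_tuple_percent
FROM pgstattuple_approx('your_table_name');
```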
Pro-Tip: “Good Bloat” vs. “Bad Bloat”
Believe it or not, you don’t always want 0% bloat.
- Good Bloat: On tables with frequent updates, having some free space (often managed via a setting called fillfactor) allows Postgres to perform HOT (Heap Only Tuple) Updates. This keeps the new row version on the same page, preventing index bloat and making updates lightning-fast.
- Bad Bloat: When dead rows accumulate across thousands of pages, forcing Postgres to scan 1GB of data to find 10MB of actual results. This is what kills your query performance.
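Setting fillfactor is a one-liner; note that it only affects newly written pages, so existing data must be rewritten before it takes effect. A sketch with a placeholder table name:

```sql
-- Leave 20% of every page free so updated row versions can stay
-- on the same page (enabling HOT updates)
ALTER TABLE your_table_name SET (fillfactor = 80);

-- Applies only to new pages; rewrite existing ones to repack them
VACUUM FULL your_table_name;
```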
Frequently Asked Questions (The 2026 Edition)
Q: Does TRUNCATE cause bloat?
A: No. Unlike DELETE, which marks rows as dead, TRUNCATE is a DDL command that instantly deallocates the data pages and returns the space to the Operating System. Use it for “scratch” or log tables you want to clear entirely.
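The contrast in one snippet (`app_logs` is a hypothetical scratch table):

```sql
-- Leaves every row behind as a dead tuple; file size is unchanged
DELETE FROM app_logs;

-- Deallocates the data pages outright and returns the space to the OS
TRUNCATE app_logs;
```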
Q: Can I just disable Autovacuum and run it manually at night?
A: In almost every 2026 production environment, the answer is no. Disabling it leads to “Transaction ID Wraparound” risks, which can force your entire database into read-only mode to prevent data loss. It is much better to tune autovacuum to be more aggressive during off-peak hours than to turn it off.
Q: Why does my Index have more bloat than my Table?
A: Indexes are often more sensitive to updates. Every time a row moves to a new page, every index on that table must be updated. Over time, indexes can become “fragmented.” If an index is twice the size of the table, a REINDEX CONCURRENTLY is usually the best fix.
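A sketch of that fix (index name is a placeholder; the CONCURRENTLY form requires PostgreSQL 12 or later, but rebuilds without blocking reads or writes):

```sql
-- Compare the index's size to its table before deciding
SELECT pg_size_pretty(pg_relation_size('your_index_name'));

-- Rebuild the index without taking a blocking lock
REINDEX INDEX CONCURRENTLY your_index_name;
```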