Why Is PostgreSQL Not Deleting WAL Files? A Friendly Guide

Ever had that sinking feeling when you see your server’s disk space slowly vanishing?

You investigate, and the trail leads you to a directory named pg_wal, filled with countless files. If this sounds familiar, you’re in the right place.

Before we dive in, let’s quickly talk about what these WAL files are. WAL stands for Write-Ahead Log. Think of it as a safety journal for your PostgreSQL database.

Every change you make is first written down in this journal before it’s applied to the main database files. This is a brilliant feature that protects your data from crashes and is essential for backups and replication.

But for this journal to be useful, it needs to be managed properly.

When old entries (the WAL files) aren’t cleaned up, they can pile up and cause a major headache. So, let’s put on our detective hats and figure out why your pg_wal directory is overflowing.


Is Your Archiving Actually Working?

One of the most common reasons for WAL files sticking around is a problem with archiving. When you have archive_mode=on, PostgreSQL makes a promise: it won’t remove a WAL file until it has been safely copied to your archive storage.

If the copy command fails for any reason, Postgres will patiently hold onto that file and all the ones after it. This is a safety feature, but it can quickly fill up your disk.
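
Here is what that promise looks like in practice. PostgreSQL runs archive_command once for each completed segment, replacing %p with the segment's path and %f with its file name. It only counts the segment as archived when the command exits with status 0. The PostgreSQL documentation's classic Unix example is test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f. Below is a minimal Python sketch of the same logic. The function name archive_segment and the archive directory are hypothetical, and a real deployment would more likely rely on a dedicated tool such as pgBackRest.

```python
import os
import shutil


def archive_segment(wal_path: str, wal_name: str, archive_dir: str) -> int:
    """Copy one WAL segment into archive_dir and return a shell-style exit code.

    Hypothetical helper mirroring `test ! -f dir/%f && cp %p dir/%f`.
    Any nonzero return tells PostgreSQL the segment is NOT archived,
    so it stays in pg_wal and the archiver retries it later.
    """
    target = os.path.join(archive_dir, wal_name)
    if os.path.exists(target):
        # Refuse to overwrite a file that is already in the archive.
        return 1
    tmp = target + ".tmp"
    try:
        with open(wal_path, "rb") as src, open(tmp, "wb") as dst:
            shutil.copyfileobj(src, dst)
            dst.flush()
            os.fsync(dst.fileno())  # make sure the copy is on disk
        os.rename(tmp, target)      # expose it under its final name
    except OSError:
        return 1                    # copy failed: PostgreSQL keeps the segment
    return 0
```

To wire it up, you would wrap it in a small script whose entry point calls sys.exit(archive_segment(sys.argv[1], sys.argv[2], "/path/to/archive")) and set archive_command to something like 'python3 /path/to/archive_wal.py %p %f'. Every path and name there is a placeholder. Returning 1 when the target already exists follows the documentation's advice that an archive command should refuse to overwrite a pre-existing archive file.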

How to Check:

You can peek into the archiver’s status with a simple SQL query:


SELECT * FROM pg_stat_archiver;

Look at these columns:

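  • failed_count: If this is greater than 0, at least one archive attempt has failed since the statistics were last reset. The counter is cumulative, so it can also reflect an old problem that has since been fixed.
  • last_failed_wal: This tells you exactly which file it's struggling with.
  • last_archived_wal: This shows the last file that was successfully archived.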
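
To tell an ongoing failure from an old one, compare two more columns of the same view: last_archived_time and last_failed_time. If the most recent failure is newer than the most recent success, the archiver's latest attempt failed and files are probably piling up right now. A minimal Python sketch of that rule follows. It assumes you have already fetched the row, for example with SELECT failed_count, last_archived_time, last_failed_time FROM pg_stat_archiver;, into a dict of column name to value using whatever driver you prefer. The function name and status strings are illustrative.

```python
from datetime import datetime
from typing import Optional


def archiving_status(row: dict) -> str:
    """Classify a pg_stat_archiver row given as {column: value}.

    Uses failed_count, last_archived_time and last_failed_time only;
    the times are expected to be datetimes (or None if never set).
    """
    last_ok: Optional[datetime] = row.get("last_archived_time")
    last_fail: Optional[datetime] = row.get("last_failed_time")

    if not row.get("failed_count") or last_fail is None:
        return "no_failures"   # nothing has failed since stats were reset
    if last_ok is None or last_fail > last_ok:
        return "failing"       # the latest attempt failed
    return "recovered"         # failures happened, but a later attempt succeeded
```

A result of "recovered" means failed_count is just history; "failing" is the case worth chasing.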

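If the WAL files piling up are newer than last_archived_wal, they might just be waiting their turn. But if last_failed_wal keeps pointing at the same file, it's time to investigate your archive_command. The PostgreSQL server log records each failed attempt together with the command's exit code, which is usually the fastest way to find out why it fails.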
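
You can also see the archiving backlog directly on disk. For every completed segment that hasn't been archived yet, PostgreSQL keeps a marker file named <segment>.ready in pg_wal/archive_status, and renames it to <segment>.done once archiving succeeds. The Python sketch below counts the waiting segments and reports the oldest one. The function name is hypothetical, and it must run as a user allowed to read pg_wal.

```python
import os
from typing import Optional, Tuple

HEX_DIGITS = set("0123456789ABCDEF")


def _is_segment(name: str) -> bool:
    """True for regular WAL segment names (24 upper-case hex characters)."""
    return len(name) == 24 and set(name) <= HEX_DIGITS


def archive_backlog(pg_wal_dir: str) -> Tuple[int, Optional[str]]:
    """Return (segments awaiting archiving, oldest such segment or None).

    Counts <segment>.ready markers in pg_wal/archive_status. Markers for
    .history and .backup files are ignored. Segment names are fixed-width
    hex, so within one timeline the smallest name is the oldest segment.
    """
    status_dir = os.path.join(pg_wal_dir, "archive_status")
    waiting = sorted(
        name[: -len(".ready")]
        for name in os.listdir(status_dir)
        if name.endswith(".ready") and _is_segment(name[: -len(".ready")])
    )
    return len(waiting), (waiting[0] if waiting else None)
```

A steadily growing count whose oldest entry matches last_failed_wal confirms that archiving, not something else, is holding the files.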


Do You Have a Solid Backup Foundation?

Let’s move on to the next suspect. If you’re using a powerful backup tool like pgBackRest, it plays a big role in managing WAL files. A core concept here is the “full backup.” Without at least one successful full backup, PostgreSQL doesn’t have a safe starting point for recovery.

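Think of it like this: the WAL files are the story of everything that happened after a certain point. If you don't have that starting point (the full backup), the story is incomplete and not very useful for recovery. Keep in mind, though, that PostgreSQL itself doesn't know about your backups, so a missing full backup won't by itself keep files in pg_wal. It's usually a sign that the backup setup is incomplete, and if that same setup also makes archive_command fail (for example, because the pgBackRest stanza was never created), the failing archive is what holds the files back.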

How to Check with pgBackRest:

You can quickly get a report on your backups using this command:

pgbackrest info

This will show you a list of your backups. If you don’t see at least one successful full backup, this could very well be the root of your problem.
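
If you'd rather script this check, pgbackrest info also accepts --output=json. The sketch below assumes the JSON is a list of stanza objects, each with a "name" and a "backup" array whose entries carry a "type" of full, diff or incr. Verify that layout against your pgBackRest version before relying on it. The function name is hypothetical.

```python
import json
from typing import Dict


def full_backup_counts(info_json: str) -> Dict[str, int]:
    """Map each stanza name to its number of full backups.

    Assumed input layout (from `pgbackrest info --output=json`):
    [{"name": ..., "backup": [{"type": "full" | "diff" | "incr", ...}, ...]}, ...]
    """
    counts: Dict[str, int] = {}
    for stanza in json.loads(info_json):
        backups = stanza.get("backup", [])
        counts[stanza["name"]] = sum(1 for b in backups if b.get("type") == "full")
    return counts
```

You could feed it the output of subprocess.run(["pgbackrest", "info", "--output=json"], capture_output=True, text=True).stdout. Any stanza that maps to 0 has no full backup yet.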


Are Old Replication Slots Holding You Back?

Replication slots are a fantastic feature for ensuring that a standby server or a logical decoding client doesn’t miss any data changes. A replication slot tells the primary database, “Hey, I need you to hold onto all the WAL files until I can confirm I’ve received them.”

This is great for reliability, but it has a catch. If the consumer attached to that slot stops working, gets disconnected, or falls behind, the replication slot will dutifully keep all the necessary WAL files, potentially from a long, long time ago. This is one of the most common causes of uncontrolled pg_wal growth.

How to Check:

Run this SQL query to see all your replication slots:

SELECT slot_name, plugin, active, restart_lsn FROM pg_replication_slots;

The key column here is restart_lsn: the oldest position in the WAL stream that the slot's consumer may still need. Compare it with the server's current position, which you can get with SELECT pg_current_wal_lsn();. If restart_lsn is far behind, especially on a slot where active is false, you've likely found your culprit. The WAL file containing that LSN (and all subsequent files) is being preserved just for that slot.
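
To put a number on "very old", you can do the LSN arithmetic yourself. An LSN such as E2/9E000028 is a 64-bit byte position written as two hex halves. The distance between two LSNs is therefore the number of bytes of WAL between them, and the segment holding an LSN follows from the segment size. The sketch below assumes the default 16 MB segment size and timeline 1. On the server, pg_wal_lsn_diff() and pg_walfile_name() offer comparable calculations.

```python
WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # assumed: default wal_segment_size


def parse_lsn(lsn: str) -> int:
    """Convert an LSN like 'E2/9E000028' into an absolute byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def retained_bytes(current_lsn: str, restart_lsn: str) -> int:
    """Bytes of WAL between a slot's restart_lsn and the current position."""
    return parse_lsn(current_lsn) - parse_lsn(restart_lsn)


def wal_file_name(lsn: str, timeline: int = 1,
                  seg_size: int = WAL_SEGMENT_SIZE) -> str:
    """Name of the segment file that contains the byte at `lsn`."""
    segno = parse_lsn(lsn) // seg_size
    segs_per_xlogid = 0x100000000 // seg_size  # segments per 4 GB "log id"
    return "%08X%08X%08X" % (timeline, segno // segs_per_xlogid,
                             segno % segs_per_xlogid)
```

For example, retained_bytes("E2/A0000000", "E2/9E000028") is 33554392 bytes, just under 32 MB of WAL held for the slot. wal_file_name("E2/9E000028") returns "00000001000000E20000009E", the oldest file that slot pins.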


Could Your Backup Tool Be Holding On for a Reason?

Sometimes, the WAL files are being kept for a perfectly valid reason: your backup tool needs them. Tools like pgBackRest have retention policies that you define. For example, you might have configured it to keep enough WAL files to be able to restore your database to any point in time within the last 7 days.

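If a WAL file is part of the history of a backup that is still within your retention window, pgBackRest will keep its archived copy in the repository rather than expiring it. Note that these rules govern the repository, not pg_wal on the database server: PostgreSQL can recycle a segment in pg_wal once it has been archived and is no longer needed locally, however long pgBackRest keeps its copy. So if it's the repository that keeps growing, this is the place to look.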

How to Check with pgBackRest:

Running the info command again is your best bet. It gives you a detailed view of which backups are being retained and the WAL ranges they require.

pgbackrest --stanza=yourstanza info

Check repo-retention-full (set as repo1-retention-full in pgbackrest.conf for the first repository) and the related retention options to make sure they match how far back you actually need to be able to restore.
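
To see how a count-based full-backup retention setting translates into archived WAL, here is a deliberately simplified model. It is not pgBackRest's actual expiration code. With a retention of N, the N most recent full backups are kept, along with every backup taken after the oldest of them. Archived WAL is then needed from that oldest kept backup's first segment onward. The Backup record and its fields are hypothetical inputs you would fill in from pgbackrest info. The model also ignores options such as repo-retention-archive, which can narrow WAL retention further.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Backup:
    label: str      # backup label (hypothetical input)
    kind: str       # "full", "diff" or "incr"
    wal_start: str  # first WAL segment the backup needs


def oldest_wal_needed(backups: List[Backup], retention_full: int) -> Optional[str]:
    """Simplified model of count-based full-backup retention.

    `backups` must be ordered oldest to newest. Returns the first WAL
    segment still required, or None if there is no full backup to
    anchor retention at all.
    """
    if retention_full < 1:
        raise ValueError("retention_full must be at least 1")
    fulls = [i for i, b in enumerate(backups) if b.kind == "full"]
    if not fulls:
        return None
    # Index of the oldest full backup that survives expiration.
    first_kept = fulls[-retention_full] if retention_full <= len(fulls) else fulls[0]
    # Segment names sort chronologically within a timeline.
    return min(b.wal_start for b in backups[first_kept:])
```

Raising the retention count moves the answer further back in time, which is exactly the extra WAL the repository has to keep.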


What’s Inside the WAL File Itself?

If you’re still stumped, it’s time to go straight to the source. You can actually inspect a WAL file to see the timestamps of the transactions it contains. This can give you a concrete idea of just how old it is and help you connect it to the clues you found above.

How to Check:

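The pg_waldump utility is perfect for this. You can use it to peek inside a specific WAL file; the -r Transaction option limits the output to transaction records, such as commits:

pg_waldump -p /path/to/your/pg_wal/ -r Transaction 00000001000000E20000009E | head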

This command lists the records in the file. Commit and abort records carry a timestamp, which gives you a concrete idea of when the file was written. Knowing its age helps you tell whether it is being held by an old replication slot or is simply waiting to be archived.
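
If you want the age range of a whole file rather than eyeballing it, you can pull the timestamps out of pg_waldump's text output. The sketch assumes commit and abort records are printed with a description like desc: COMMIT 2024-05-01 10:15:30.123456 UTC. Treat that as an assumption, because pg_waldump's text output is not a stable interface and may vary between PostgreSQL versions. The time zone suffix is ignored, so the results are naive datetimes in whatever zone pg_waldump printed. The function name is hypothetical.

```python
import re
from datetime import datetime
from typing import Iterable, Optional, Tuple

# Assumed pg_waldump description format for transaction-end records.
_TS_RE = re.compile(
    r"desc: (?:COMMIT|ABORT) (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}(?:\.\d+)?)"
)


def commit_time_range(lines: Iterable[str]) -> Optional[Tuple[datetime, datetime]]:
    """Return (earliest, latest) commit/abort timestamp found, or None."""
    stamps = []
    for line in lines:
        m = _TS_RE.search(line)
        if m:
            text = m.group(1)
            fmt = "%Y-%m-%d %H:%M:%S.%f" if "." in text else "%Y-%m-%d %H:%M:%S"
            stamps.append(datetime.strptime(text, fmt))
    if not stamps:
        return None
    return min(stamps), max(stamps)
```

You could pass it the lines of subprocess.run(["pg_waldump", "-p", "/path/to/your/pg_wal/", "00000001000000E20000009E"], capture_output=True, text=True).stdout.splitlines(). The path and segment name there are placeholders.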

Putting It All Together

Troubleshooting can feel a bit overwhelming, so here’s a simple checklist to follow:

  1. Check the archiver status with pg_stat_archiver. Is it failing?
  2. Look for a full backup using a tool like pgbackrest info.
  3. Inspect your replication slots with pg_replication_slots. Is one lagging far behind?
  4. Analyze the WAL file directly with pg_waldump to understand its age.

By following these steps, you can systematically uncover why your WAL files aren’t being cleaned up and get your disk space back under control.

David Cao

David is a Cloud & DevOps enthusiast with years of experience as a Linux engineer, including work at AMD and EMC. He likes Linux, Python, bash, and more. He is a technical blogger and software engineer who enjoys sharing what he learns and contributing to open source.


