How to Automate Tasks in Postgres Using pg_cron (Step-by-Step Guide)

If you are managing a PostgreSQL database, you likely have routine tasks that need to run on a schedule. Whether it is refreshing materialized views, deleting old session data, archiving logs, or running routine VACUUM operations, doing these things manually is inefficient and prone to human error.

Enter pg_cron.

Developed by Citus Data (now part of Microsoft), pg_cron is a simple, cron-based job scheduler that runs directly inside PostgreSQL as an extension. It uses the same syntax as standard Linux cron, making it incredibly intuitive for system administrators and database engineers.

Here is a comprehensive, step-by-step guide to installing, configuring, and using pg_cron to automate your database tasks.


Step 1: Install the pg_cron Extension

Before you can use pg_cron inside your database, the physical binaries must be installed on your database server.

If you are using a managed database service (like Amazon RDS, Supabase, or Azure Database for PostgreSQL), pg_cron is usually pre-installed. You can skip directly to Step 2.

If you are hosting your own PostgreSQL instance on a Linux server (e.g., Ubuntu/Debian), install it via your package manager. Replace 15 with your specific PostgreSQL version:

Bash

sudo apt-get update
sudo apt-get install postgresql-15-cron

For RHEL/CentOS systems:

Bash

See also: Mastering the Linux Command Line — Your Complete Free Training Guide

sudo yum install pg_cron_15

Step 2: Configure PostgreSQL

Because pg_cron relies on a background worker process to wake up and execute tasks, it must be loaded when PostgreSQL starts.

  1. Open your postgresql.conf file.
  2. Add pg_cron to the shared_preload_libraries parameter.
  3. Define the database where pg_cron will keep its metadata (usually the default postgres database).

Ini, TOML

# Edit postgresql.conf
shared_preload_libraries = 'pg_cron'
cron.database_name = 'postgres'

Once you have saved the configuration file, restart the PostgreSQL service to apply the changes:

Bash

sudo systemctl restart postgresql

Step 3: Enable the Extension

Now that the server is configured, log into your target database (the one you specified in cron.database_name) using psql or your preferred SQL client and create the extension:

SQL

CREATE EXTENSION pg_cron;

Note: You must have superuser privileges to create the extension.


Step 4: Schedule Your First Job

pg_cron uses the cron.schedule() function to create jobs. It takes two primary arguments: the cron schedule string and the SQL command to execute.

The Cron Syntax Refresher

pg_cron uses standard cron syntax, which consists of five fields:

  1. Minute (0-59)
  2. Hour (0-23)
  3. Day of the month (1-31)
  4. Month (1-12)
  5. Day of the week (0-7, where 0 and 7 are Sunday)

Example 1: Refreshing a Materialized View Daily

Let’s say you have a materialized view called daily_sales_report that you want to refresh every night at 2:00 AM.

SQL

SELECT cron.schedule(
    'refresh_sales_view',   -- Optional: A unique job name
    '0 2 * * *',            -- Cron schedule: 2:00 AM every day
    'REFRESH MATERIALIZED VIEW daily_sales_report;' -- The SQL command
);

Example 2: Purging Old Data Hourly

If you have a table storing temporary user sessions and you want to delete rows older than 24 hours at the top of every hour:

SQL

SELECT cron.schedule(
    '0 * * * *', 
    $$DELETE FROM user_sessions WHERE created_at < NOW() - INTERVAL '1 day';$$
);

Tip: Using dollar quoting ($$) is helpful if your SQL command contains single quotes.


Step 5: Monitor and Manage Scheduled Jobs

Once you have scheduled a few jobs, you will want to track them. pg_cron provides a few handy tables to view your jobs and their execution history.

View Active Jobs

To see all currently scheduled jobs, query the cron.job table:

SQL

SELECT jobid, jobname, schedule, command, active 
FROM cron.job;

View Job History and Logs

To check if a job succeeded, failed, or how long it took to run, query the cron.job_run_details table:

SQL

SELECT jobid, runid, status, return_message, start_time, end_time 
FROM cron.job_run_details 
ORDER BY start_time DESC 
LIMIT 10;

If a job fails, the return_message column will typically contain the PostgreSQL error code, making debugging straightforward.


Step 6: Update or Unschedule a Job

If you need to stop a job from running, you can unschedule it using its jobid or jobname.

By Job Name:

SQL

SELECT cron.unschedule('refresh_sales_view');

By Job ID:

(First, find the ID from cron.job, then run the unschedule command).

SQL

SELECT cron.unschedule(42); 

To update an existing job, it is usually easiest to unschedule it and then schedule it again with the new parameters. Alternatively, advanced users can directly UPDATE the cron.job table, though using the provided functions is safer.


Important Best Practices

  • Timezones: By default, pg_cron uses GMT. If you want it to use your server’s local timezone, you need to add cron.timezone = 'America/New_York' (or your respective timezone) to your postgresql.conf file and restart Postgres.
  • Security: By default, jobs run as the user who scheduled them. If you need a job to run as a different user, you can specify this in the cron.job table, provided you have the necessary privileges.
  • Maintenance: The cron.job_run_details table can grow quite large if you have jobs running every minute. It is good practice to schedule a pg_cron job to periodically clean up the pg_cron history!

SQL

SELECT cron.schedule('clean_cron_history', '0 3 * * *', $$DELETE FROM cron.job_run_details WHERE start_time < now() - interval '7 days'$$);
David Cao
David Cao

David is a Cloud & DevOps Enthusiast. He has years of experience as a Linux engineer. He had working experience in AMD, EMC. He likes Linux, Python, bash, and more. He is a technical blogger and a Software Engineer. He enjoys sharing his learning and contributing to open-source.

Articles: 646

Leave a Reply

Your email address will not be published. Required fields are marked *