effective_cache_size in PostgreSQL: A Practical Deep Dive

Core Highlights & Actionable Takeaways

  1. Parameter Essence: effective_cache_size is not a memory allocation setting—it tells the PostgreSQL query optimizer roughly how much memory is available for caching data (PostgreSQL shared buffers plus the operating system's filesystem cache).
  2. Cost Impact: It adjusts the relative cost of index scans. Higher values make index scans cheaper (vs. sequential scans), pushing the optimizer to favor indexing.
  3. No-Effect Scenarios: The parameter has no measurable impact when table data is perfectly correlated with the index (on-disk order matches index order) or when the index already contains everything the query needs (e.g., a single-column table), so no extra table lookups occur.
  4. Tuning Formula: For dedicated PostgreSQL servers, use effective_cache_size = Total RAM × 0.7–0.8 as a starting point.
  5. Key Caveat: It influences execution plan selection only—it does not reserve or allocate memory.

1. What the PostgreSQL Optimizer Does

SQL’s core simplicity hides a complex decision-making process: when a user submits a query, the query optimizer must find the most efficient way to execute it. The output of this process is an execution plan.

The optimizer’s choice boils down to a cost model: it assigns a numerical cost to every possible operation (e.g., index scans, sequential scans, joins). The goal is to select the plan with the lowest total cost.

The magic of query performance tuning lies in how the optimizer calculates these costs—and that is exactly where effective_cache_size comes into play.
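EXPLAIN exposes this cost model directly: it prints the chosen plan along with an estimated cost range (startup cost..total cost), row count, and row width. As a quick illustration against a system catalog (so it runs on any database):

-- Show the optimizer's plan and cost estimates without executing the query
EXPLAIN SELECT relname FROM pg_class;

The numbers are unitless estimates; the optimizer only compares them against one another to pick the cheapest plan.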

2. Understanding effective_cache_size

A critical optimizer decision is choosing between an index scan and a sequential scan. A common misconception is: “Why not always use an index?”

Index traversal is not free, and index scans often require additional lookups to fetch full table rows (a “bookmark lookup”). The optimizer must weigh the overhead of index operations against the time saved by avoiding full-table scans.

A key variable in this cost calculation is I/O cost: reading data from cache is orders of magnitude cheaper than reading from disk. effective_cache_size estimates the total cache available to PostgreSQL—including:

  • PostgreSQL’s shared buffers
  • The operating system’s filesystem cache

This parameter does not allocate memory. Instead, it provides the optimizer with a hint about how much data is likely to be cached, allowing it to make more accurate cost estimates.
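Because it is only a planner hint, the parameter can be inspected and changed per session without restarting the server. A quick way to check the current value (the default is 4GB; pg_settings reports the raw setting in 8 kB pages):

SHOW effective_cache_size;

SELECT name, setting, unit, boot_val, context
FROM pg_settings
WHERE name = 'effective_cache_size';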

3. effective_cache_size in Action (Practical Example)

To see the parameter’s impact, we create two tables with identical data—but one is sorted by the indexed column, and the other is randomized. We then compare execution plans at different effective_cache_size values.

Step 1: Create Test Data

-- Create a randomized table
CREATE TABLE t_random AS
SELECT id, random() AS r
FROM generate_series(1, 1000000) AS id
ORDER BY random();

-- Create a sorted table
CREATE TABLE t_ordered AS
SELECT id, random() AS r
FROM generate_series(1, 1000000) AS id;

-- Add indexes to both tables
CREATE INDEX idx_random ON t_random (id);
CREATE INDEX idx_ordered ON t_ordered (id);

-- Update table statistics for accurate optimization
VACUUM ANALYZE;

Step 2: Disable Bitmap Scans (For Clarity)

PostgreSQL defaults to bitmap index scans for many workloads, but we disable this to isolate the impact of effective_cache_size on standard index scans:

SET enable_bitmapscan TO off;

Step 3: Test with a Small Cache Value

We set effective_cache_size to an unrealistically low value (1 MB), simulating a system with almost no cache:


SET effective_cache_size TO '1 MB';
EXPLAIN SELECT * FROM t_random WHERE id < 1000;

Execution Plan:

Index Scan using idx_random on t_random (cost=0.42..3611.96 rows=909 width=12)
  Index Cond: (id < 1000)

Step 4: Test with a Large Cache Value

We then set the parameter to an extremely high value (1000 GB), simulating a system with abundant cache:

SET effective_cache_size TO '1000 GB';
EXPLAIN SELECT * FROM t_random WHERE id < 1000;

Execution Plan:

Index Scan using idx_random on t_random (cost=0.42..3383.99 rows=909 width=12)
  Index Cond: (id < 1000)

Key Observation

The cost of the index scan decreases when effective_cache_size increases. The absolute cost value does not matter—what matters is its relative cost compared to a sequential scan.

When index scan costs drop relative to sequential scan costs, the optimizer will choose index scans more frequently. This is the core function of effective_cache_size: it biases the optimizer toward indexing when cache is abundant.
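To compare the two alternatives directly, you can temporarily discourage index scans in the session and read off the sequential-scan cost for the same query (a session-local experiment, assuming the test tables from Step 1):

-- Estimated cost of the sequential-scan alternative for the same query
SET enable_indexscan TO off;
EXPLAIN SELECT * FROM t_random WHERE id < 1000;
SET enable_indexscan TO on;

Comparing this seq-scan cost with the index-scan costs above shows exactly how much headroom effective_cache_size gives the index plan.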

4. Limitations and Special Cases

effective_cache_size is not a one-size-fits-all solution—it has no impact in two specific scenarios.

Scenario 1: Perfect Data Correlation

PostgreSQL’s table statistics track physical correlation: how well the table’s on-disk sort order matches the index order. A correlation value of 1 means the data is perfectly sorted.
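The correlation statistic can be read directly from the pg_stats view (values near 1 or -1 mean the on-disk order closely follows the column order; values near 0 mean random placement). For the test tables from Step 1:

SELECT tablename, attname, correlation
FROM pg_stats
WHERE tablename IN ('t_random', 't_ordered')
  AND attname = 'id';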

In this case, index scans already have minimal I/O overhead, so adjusting effective_cache_size does nothing:

-- Small cache value
SET effective_cache_size TO '1 MB';
EXPLAIN SELECT * FROM t_ordered WHERE id < 1000;

-- Large cache value
SET effective_cache_size TO '1000 GB';
EXPLAIN SELECT * FROM t_ordered WHERE id < 1000;

Result: Both plans have an identical cost of 0.42..39.17.

Scenario 2: Single-Column Tables

If a table has only one column (the indexed column), the index already contains all the data the query needs, so index scans require no additional table lookups. The I/O cost is already minimized, so effective_cache_size has no effect:

ALTER TABLE t_random DROP COLUMN r;

-- Test with small and large cache values—costs remain identical
SET effective_cache_size TO '1 MB';
EXPLAIN SELECT * FROM t_random WHERE id < 1000;

SET effective_cache_size TO '1000 GB';
EXPLAIN SELECT * FROM t_random WHERE id < 1000;

5. Tuning effective_cache_size in PostgreSQL

For dedicated PostgreSQL servers, use this simple formula to set a baseline value:

effective_cache_size = Total System RAM × 0.7–0.8

  • Use 0.7 for systems with other memory-intensive processes (e.g., monitoring tools).
  • Use 0.8 for pure PostgreSQL servers with no competing workloads.

Example: For a dedicated server with 64 GB of RAM, set effective_cache_size = 51GB (64 × 0.8 ≈ 51.2, rounded down).

For shared servers (PostgreSQL running alongside other applications), reduce the multiplier to 0.5–0.6 to account for memory used by other processes.
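To make the value persistent (rather than the session-level SET used in the examples above), write it to postgresql.auto.conf with ALTER SYSTEM and reload the configuration; effective_cache_size does not require a restart:

-- Persist the value (example: 51 GB for a dedicated 64 GB server)
ALTER SYSTEM SET effective_cache_size = '51GB';
SELECT pg_reload_conf();

-- Verify the new value
SHOW effective_cache_size;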

Always test changes with your actual workload—monitor execution plans and query latency to validate the setting.

David Cao

David is a Cloud & DevOps enthusiast with years of experience as a Linux engineer, including roles at AMD and EMC. He works with Linux, Python, Bash, and more, and enjoys sharing what he learns and contributing to open source.
