The WHERE Clause Myth: Why Your SQL Conditions Don't Control PostgreSQL Replication

Table of Contents

Introduction

One of the most confusing aspects of PostgreSQL’s logical replication is understanding the relationship between WHERE conditions in your SQL statements and REPLICA IDENTITY configuration. Many developers mistakenly believe that:

“If I use a WHERE clause with specific conditions, PostgreSQL will replicate only those conditions to the subscriber database.”

This is fundamentally incorrect, and it’s the source of many replica identity-related bugs and performance problems.

This article will clarify:

The critical difference between WHERE conditions and REPLICA IDENTITY
How logical replication really works under the hood
Why WHERE conditions don’t matter for replication
How REPLICA IDENTITY actually identifies rows on the subscriber

By the end of this article, you’ll have a complete understanding of how PostgreSQL determines which rows to replicate and why REPLICA IDENTITY is the true determining factor—not your WHERE clause.

Core Concepts

What is a WHERE Clause?

A WHERE clause is a SQL language construct that filters which rows your statement affects in the current database execution:

UPDATE orders
SET status = 'shipped'
WHERE customer_id = 5;

This tells PostgreSQL: “In my local database, find all rows where customer_id = 5, and update them.”

Scope: Local database only. Execution time: Immediate.

What is Replica Identity?

REPLICA IDENTITY is a PostgreSQL configuration setting that tells the logical replication system: “When this row is modified, use these columns to identify it for replication purposes.”

ALTER TABLE orders REPLICA IDENTITY DEFAULT;

This tells PostgreSQL: “When replicating changes to this table, use the primary key to identify which rows were changed.”

Scope: Replication system only. Execution time: When replication occurs (asynchronous).

The Fundamental Difference

WHERE CLAUSE
├─ Purpose: Filter rows for SQL statement execution
├─ Scope: Only affects the current database
├─ Execution:  Immediate, on the local server
└─ Replication Impact: ❌ NONE

REPLICA IDENTITY
├─ Purpose: Identify rows for logical replication
├─ Scope: Affects how changes are sent to subscribers
├─ Execution: When replication occurs (async)
└─ Replication Impact: ✅ CRITICAL

The Critical Distinction

Let me illustrate the most important concept in this article with a real example:

The Common Misconception

-- Publisher ()
UPDATE orders
SET status = 'shipped'
WHERE customer_id = 5;

-- Developer's WRONG assumption:
-- "The subscriber will receive:
--  UPDATE orders SET status = 'shipped' WHERE customer_id = 5;"

The Reality

-- Publisher ()
UPDATE orders
SET status = 'shipped'
WHERE customer_id = 5;

-- What ACTUALLY happens:
-- PostgreSQL finds all rows matching customer_id = 5
-- For EACH matching row, it extracts the REPLICA IDENTITY columns
-- It sends INDIVIDUAL UPDATE statements for each row,
-- using only the REPLICA IDENTITY columns (e.g., primary key)

-- If customer_id = 5 matches orders:  100, 101, 102, then subscriber receives:
UPDATE orders SET status = 'shipped' WHERE order_id = 100;
UPDATE orders SET status = 'shipped' WHERE order_id = 101;
UPDATE orders SET status = 'shipped' WHERE order_id = 102;

-- NOT the original WHERE clause!

This is the core insight you must understand.

How Replica Identity Really Works

Step-by-Step Process

Let me break down exactly what happens when you execute an UPDATE on the publisher:

Step 1: Execute the SQL Statement on Publisher

-- Publisher database
UPDATE orders
SET status = 'shipped'
WHERE customer_id = 5 AND order_date > '2024-01-01';

-- PostgreSQL uses the WHERE clause to identify affected rows:
-- Results:  order_id = 100, 101, 102, 103 (4 rows matched)

Step 2: Extract REPLICA IDENTITY for Each Row

Assuming REPLICA IDENTITY is set to DEFAULT (using primary key):

Row 1: order_id = 100 ← Extract this (primary key)
Row 2: order_id = 101 ← Extract this
Row 3: order_id = 102 ← Extract this
Row 4: order_id = 103 ← Extract this

The original WHERE clause (customer_id = 5 AND order_date > '2024-01-01') is discarded.

Step 3: Write to Write-Ahead Log (WAL)

PostgreSQL writes to the WAL (simplified representation):

{
  "operation": "UPDATE",
  "table": "orders",
  "tuples": [
    {"order_id": 100, "new_status": "shipped"},
    {"order_id": 101, "new_status": "shipped"},
    {"order_id": 102, "new_status": "shipped"},
    {"order_id": 103, "new_status": "shipped"}
  ]
}

Notice: **No WHERE clause is stored! ** Only the REPLICA IDENTITY columns.

Step 4: Logical Replication Reads from WAL

The logical replication system reads the WAL and prepares change messages for subscribers.

Step 5: Subscriber Receives Changes

The subscriber receives (simplified):

UPDATE orders SET status = 'shipped' WHERE order_id = 100;
UPDATE orders SET status = 'shipped' WHERE order_id = 101;
UPDATE orders SET status = 'shipped' WHERE order_id = 102;
UPDATE orders SET status = 'shipped' WHERE order_id = 103;

Step 6: Subscriber Applies Changes

The subscriber executes these UPDATE statements. It doesn’t matter what the original WHERE clause was. The subscriber uses only the REPLICA IDENTITY columns.

Visual Flow Diagram

┌─────────────────────────────────────────────────────────────┐
│ PUBLISHER DATABASE                                          │
│                                                             │
│ UPDATE orders SET status = 'shipped'                        │
│ WHERE customer_id = 5 AND order_date > '2024-01-01';       │
│                                                             │
│ ↓ WHERE clause finds 4 rows (order_id:  100,101,102,103)    │
└─────────────────────────────────────────────────────────────┘
                            ↓
┌─────────────────────────────────────────────────────────────┐
│ EXTRACT REPLICA IDENTITY (Default = Primary Key)            │
│                                                             │
│ For each matched row, extract only REPLICA IDENTITY:        │
│ Row 100 → {order_id: 100}                                  │
│ Row 101 → {order_id: 101}                                  │
│ Row 102 → {order_id: 102}                                  │
│ Row 103 → {order_id: 103}                                  │
│                                                             │
│ ❌ Original WHERE clause is DISCARDED                       │
└─────────────────────────────────────────────────────────────┘
                            ↓
┌─────────────────────────────────────────────────────────────┐
│ WRITE-AHEAD LOG (WAL)                                       │
│                                                             │
│ {operation: UPDATE, table: orders,                          │
│  tuples: [{order_id: 100,... }, {order_id:101,... }, ... ]}   │
│                                                             │
│ ❌ WHERE clause NOT stored                                  │
│ ✅ REPLICA IDENTITY columns stored                          │
└─────────────────────────────────────────────────────────────┘
                            ↓
┌─────────────────────────────────────────────────────────────┐
│ LOGICAL REPLICATION SENDS TO SUBSCRIBER                     │
│                                                             │
│ UPDATE orders SET status = 'shipped' WHERE order_id = 100;  │
│ UPDATE orders SET status = 'shipped' WHERE order_id = 101;  │
│ UPDATE orders SET status = 'shipped' WHERE order_id = 102;  │
│ UPDATE orders SET status = 'shipped' WHERE order_id = 103;  │
└─────────────────────────────────────────────────────────────┘
                            ↓
┌─────────────────────────────────────────────────────────────┐
│ SUBSCRIBER DATABASE                                         │
│                                                             │
│ ✅ Rows with order_id 100,101,102,103 are updated         │
│ ✅ Data is consistent with publisher                       │
└─────────────────────────────────────────────────────────────┘

Practical Examples

Example 1: WHERE Clause Using Non-Primary Key Column

Scenario: E-commerce order management system

-- ============= PUBLISHER SETUP =============

CREATE TABLE orders (
    order_id BIGSERIAL PRIMARY KEY,
    customer_id INTEGER NOT NULL,
    order_date TIMESTAMP,
    status VARCHAR(50),
    total_amount DECIMAL(10, 2)
);

-- Configure REPLICA IDENTITY with primary key
ALTER TABLE orders REPLICA IDENTITY DEFAULT;

-- ============= PUBLISHER EXECUTION =============

-- You execute this UPDATE
UPDATE orders
SET status = 'cancelled'
WHERE customer_id = 42;

-- This matches 5 rows: order_id = 1001, 1002, 1003, 1004, 1005

What gets stored in WAL:

[
  {operation: UPDATE, old_id: 1001, new_status: cancelled},
  {operation:  UPDATE, old_id: 1002, new_status: cancelled},
  {operation: UPDATE, old_id: 1003, new_status: cancelled},
  {operation: UPDATE, old_id:  1004, new_status:  cancelled},
  {operation: UPDATE, old_id: 1005, new_status: cancelled}
]

Notice: customer_id = 42 is NOT in the WAL. Only order_id (REPLICA IDENTITY).

What subscriber receives:

UPDATE orders SET status = 'cancelled' WHERE order_id = 1001;
UPDATE orders SET status = 'cancelled' WHERE order_id = 1002;
UPDATE orders SET status = 'cancelled' WHERE order_id = 1003;
UPDATE orders SET status = 'cancelled' WHERE order_id = 1004;
UPDATE orders SET status = 'cancelled' WHERE order_id = 1005;

Result: ✅ Replication succeeds because REPLICA IDENTITY (primary key) is correctly configured.

Example 2: Complex WHERE Clause (Multiple Conditions)

Scenario: Product price management system

-- ============= PUBLISHER SETUP =============

CREATE TABLE products (
    product_id BIGSERIAL PRIMARY KEY,
    category_id INTEGER,
    supplier_id INTEGER,
    price DECIMAL(10, 2),
    stock_quantity INTEGER,
    last_updated TIMESTAMP
);

ALTER TABLE products REPLICA IDENTITY DEFAULT;

-- ============= PUBLISHER EXECUTION =============

-- Complex WHERE clause with multiple conditions
UPDATE products
SET price = price * 1.1,
    last_updated = CURRENT_TIMESTAMP
WHERE category_id = 5
  AND supplier_id = 10
  AND stock_quantity > 100
  AND price < 100;

-- Matches 12 rows with product_id:  50, 51, 52, .. ., 61

What gets stored in WAL:

[
  {operation: UPDATE, old_id: 50, new_price:  .. ., new_last_updated: ...},
  {operation: UPDATE, old_id:  51, new_price: .. ., new_last_updated: ... },
  {operation: UPDATE, old_id: 52, new_price: ..., new_last_updated: ...},
  ...  (12 rows total)
]

*Complex WHERE clause is completely absent from WAL! **

What subscriber receives:

UPDATE products SET price = .. ., last_updated = ... WHERE product_id = 50;
UPDATE products SET price = ..., last_updated = ... WHERE product_id = 51;
UPDATE products SET price = ..., last_updated = ...  WHERE product_id = 52;
-- ... (12 total)

Result: ✅ Perfect replication. The WHERE clause complexity is irrelevant to replication.

Example 3: DELETE with Complex WHERE Clause

Scenario: Log cleanup system

-- ============= PUBLISHER SETUP =============

CREATE TABLE audit_logs (
    log_id BIGSERIAL PRIMARY KEY,
    user_id INTEGER,
    action VARCHAR(100),
    log_timestamp TIMESTAMP,
    ip_address INET
);

ALTER TABLE audit_logs REPLICA IDENTITY DEFAULT;

-- ============= PUBLISHER EXECUTION =============

-- Delete old logs for specific users
DELETE FROM audit_logs
WHERE user_id IN (SELECT user_id FROM deleted_users)
  AND log_timestamp < '2024-01-01'
  AND action IN ('LOGIN_FAILED', 'UNAUTHORIZED_ACCESS');

-- Matches 1,000 rows

What gets stored in WAL:

[
  {operation: DELETE, old_id: 100},
  {operation: DELETE, old_id: 145},
  {operation: DELETE, old_id: 201},
  ... (1,000 rows total, using only log_id)
]

What subscriber receives:

DELETE FROM audit_logs WHERE log_id = 100;
DELETE FROM audit_logs WHERE log_id = 145;
DELETE FROM audit_logs WHERE log_id = 201;
-- ... (1,000 total)

Result: ✅ All 1,000 rows are correctly deleted on subscriber.

Common Misconceptions

❌ Misconception 1: “I can use any WHERE clause and replication will work”

CREATE TABLE users (
    user_id UUID
    -- NO PRIMARY KEY!
);

ALTER TABLE users REPLICA IDENTITY NOTHING;  -- ❌ WRONG!

UPDATE users SET status = 'active' WHERE user_id = 'abc-123';

Problem: Even though you specified WHERE user_id = 'abc-123', replication fails because REPLICA IDENTITY is NOTHING.

Correct Solution:

ALTER TABLE users ADD PRIMARY KEY (user_id);
ALTER TABLE users REPLICA IDENTITY DEFAULT;  -- ✅ CORRECT

❌ Misconception 2: “WHERE clause becomes the REPLICA IDENTITY”

UPDATE orders SET status = 'shipped' WHERE customer_id = 5;

-- ❌ WRONG ASSUMPTION:
-- "The subscriber will know to update only customer_id = 5"

-- ✅ CORRECT:
// The subscriber will use the REPLICA IDENTITY (e.g., primary key)
// to identify each row, regardless of the WHERE clause

❌ Misconception 3: “Different WHERE clauses on different rows cause issues”

-- Scenario: Two UPDATE statements with different WHERE clauses

UPDATE orders SET status = 'shipped' WHERE customer_id = 5;
UPDATE orders SET status = 'processed' WHERE order_date > '2024-01-01';

// ❌ WRONG ASSUMPTION:
// "The subscriber might get confused about which WHERE clause applies"

// ✅ CORRECT:
// Each row is identified by REPLICA IDENTITY (primary key)
// The WHERE clauses are completely irrelevant to replication

✅ Correct Understanding

-- Publisher
UPDATE orders SET status = 'shipped' WHERE customer_id = 5;

-- This is understood as:
-- Step 1: Find all rows WHERE customer_id = 5 (LOCAL only)
// Step 2: For each matching row, extract its REPLICA IDENTITY
// Step 3: Send to subscriber using REPLICA IDENTITY, NOT the WHERE clause

Detailed Comparison Table

Comparison: WHERE Clause vs. REPLICA IDENTITY

Aspect	WHERE Clause	REPLICA IDENTITY
Purpose	Filter rows for SQL execution	Identify rows for replication
Scope	Local database only	Replication system only
Execution Timing	Immediate	Asynchronous (replication time)
Controls	Which rows are affected locally	How to find rows on subscriber
Affects Replication	❌ NO	✅ YES (critical)
Can Use Any Columns	✅ Yes	⚠️ No (must be unique, not null)
Stored in WAL	❌ No	✅ Yes
Sent to Subscriber	❌ No	✅ Yes
Example	`WHERE customer_id = 5`	Primary key, unique index
Performance Impact	Local query performance	Replication efficiency

REPLICA IDENTITY Options and WHERE Clause Compatibility

REPLICA IDENTITY	Allows WHERE Clause	UPDATE Replicates	DELETE Replicates	Notes
DEFAULT	✅ Any	✅ Yes	✅ Yes	Uses primary key, best performance
FULL	✅ Any	✅ Yes	✅ Yes	Uses all columns, poor performance
USING INDEX	✅ Any	✅ Yes	✅ Yes	Uses specified unique index
NOTHING	✅ Any	❌ No	❌ No	No row identification

Key Insight: The WHERE clause you use does NOT affect whether replication succeeds. Only REPLICA IDENTITY does.

Real-World Scenarios

Scenario 1: Multi-Tenant SaaS Platform

Problem: You need to update order status for a specific tenant, but you might use different WHERE conditions at different times.

-- ============= SETUP =============

CREATE TABLE orders (
    order_id BIGSERIAL PRIMARY KEY,
    tenant_id INTEGER NOT NULL,
    customer_id INTEGER,
    status VARCHAR(50),
    created_at TIMESTAMP
);

ALTER TABLE orders REPLICA IDENTITY DEFAULT;

-- ============= DAY 1: Update by tenant =============

UPDATE orders SET status = 'processed' WHERE tenant_id = 100;
-- Matches 500 rows, all replicate correctly ✅

-- ============= DAY 2: Update by date =============

UPDATE orders SET status = 'archived' WHERE created_at < '2024-01-01';
// Matches 2,000 rows, all replicate correctly ✅

// ============= DAY 3: Update by customer =============

UPDATE orders SET status = 'on_hold' WHERE customer_id = 42;
// Matches 15 rows, all replicate correctly ✅

// ============= WHY DOES IT ALL WORK? =============

// Because REPLICA IDENTITY is set to DEFAULT (primary key)
// PostgreSQL extracts order_id for each matched row
// and sends that to the subscriber
// The WHERE clause variety doesn't matter!

Scenario 2: Inventory Management with Multiple Warehouses

Challenge: Different UPDATE statements with different WHERE clauses

-- ============= SETUP =============

CREATE TABLE inventory (
    inventory_id BIGSERIAL PRIMARY KEY,
    product_id BIGINT,
    warehouse_id INTEGER,
    quantity INTEGER,
    last_checked TIMESTAMP
);

ALTER TABLE inventory REPLICA IDENTITY DEFAULT;

-- ============= MULTIPLE UPDATE PATTERNS =============

-- Update inventory by warehouse
UPDATE inventory SET last_checked = NOW() WHERE warehouse_id = 1;

-- Update inventory for a product
UPDATE inventory SET quantity = quantity - 100 WHERE product_id = 999;

-- Update low stock items
UPDATE inventory SET quantity = 500 WHERE quantity < 50 AND warehouse_id = 2;

-- All of these work perfectly ✅
// Because REPLICA IDENTITY (primary key) is always used
// regardless of WHERE clause complexity

Scenario 3: Financial Transaction Processing

Requirement: Strict data consistency across distributed databases

-- ============= SETUP =============

CREATE TABLE transactions (
    txn_id BIGSERIAL PRIMARY KEY,
    account_id BIGINT NOT NULL,
    amount DECIMAL(19, 4),
    txn_status VARCHAR(50),
    timestamp TIMESTAMP
);

CREATE UNIQUE INDEX idx_account_timestamp
ON transactions(account_id, timestamp);

-- Using composite primary key for REPLICA IDENTITY
ALTER TABLE transactions REPLICA IDENTITY DEFAULT;

-- ============= COMPLEX BUSINESS LOGIC =============

-- Update transaction status by status value
UPDATE transactions SET txn_status = 'cleared'
WHERE txn_status = 'pending' AND timestamp < NOW() - INTERVAL '1 hour';

-- Update recent transactions
UPDATE transactions SET txn_status = 'failed'
WHERE txn_status = 'processing'
  AND timestamp > NOW() - INTERVAL '30 minutes'
  AND account_id IN (SELECT account_id FROM risk_accounts);

-- All replicates correctly!  ✅
// Why?  REPLICA IDENTITY (composite primary key) is used
// for row identification, not the complex WHERE clause

Performance Implications

Why WHERE Clause Type Doesn’t Affect Replication Performance

-- Both statements replicate with the SAME efficiency:

-- Scenario A: WHERE using primary key
UPDATE orders SET status = 'shipped' WHERE order_id = 100;
-- WAL size: Small (only order_id stored)
// Replication traffic: Minimal

-- Scenario B: WHERE using non-indexed column
UPDATE orders SET status = 'shipped'
WHERE customer_name = 'John Smith' AND email = '[email protected]';
// WAL size:  SAME size (still only order_id stored)
// Replication traffic:  SAME as Scenario A

// The WHERE clause complexity is IRRELEVANT to replication!

What DOES Affect Replication Performance: REPLICA IDENTITY Type

-- Scenario 1: REPLICA IDENTITY = DEFAULT (primary key)
ALTER TABLE orders REPLICA IDENTITY DEFAULT;

UPDATE orders SET status = 'shipped' WHERE customer_id = 5;
// WAL size: 50 bytes per row
// For 1,000 rows: ~50 KB
// Replication efficiency: ✅ Excellent

-- Scenario 2: REPLICA IDENTITY = FULL (all columns)
ALTER TABLE orders REPLICA IDENTITY FULL;

UPDATE orders SET status = 'shipped' WHERE customer_id = 5;
// WAL size: 500 bytes per row (all columns stored)
// For 1,000 rows: ~500 KB
// Replication efficiency: ❌ 10x worse

// The WHERE clause is the SAME,
// but REPLICA IDENTITY choice changes efficiency 10-fold!

Key Takeaway: It’s REPLICA IDENTITY configuration, not WHERE clause complexity, that affects replication performance.

Best Practices and Recommendations

✅ DO

Always configure REPLICA IDENTITY explicitly

-- ✅ Good
ALTER TABLE important_table REPLICA IDENTITY DEFAULT;

Use primary keys for efficient replication

-- ✅ Good
CREATE TABLE users (
    user_id BIGSERIAL PRIMARY KEY,
    ...
);

Use FULL only when absolutely necessary

-- ✅ Only if no primary key exists and you must replicate updates/deletes
ALTER TABLE legacy_table REPLICA IDENTITY FULL;

Document your REPLICA IDENTITY choices

-- ✅ Add comments explaining why
CREATE TABLE orders (
    order_id BIGSERIAL PRIMARY KEY,
    ...
);
-- Using DEFAULT replica identity for efficient logical replication

Monitor replication performance

-- ✅ Check WAL size and replication lag
SELECT slot_name, slot_type, confirmed_flush_lsn
FROM pg_replication_slots;

❌ DON’T

Don’t use NOTHING on tables you want to replicate

-- ❌ Bad - UPDATE/DELETE won't replicate
ALTER TABLE orders REPLICA IDENTITY NOTHING;

Don’t assume DEFAULT is set

-- ❌ Don't assume
UPDATE orders SET status = 'shipped' WHERE order_id = 100;

// ✅ Do verify
SELECT relname, relreplident FROM pg_class WHERE relname = 'orders';

Don’t use FULL for high-volume tables

-- ❌ Bad for high traffic
CREATE TABLE events (
    event_id BIGSERIAL,
    ...
);
ALTER TABLE events REPLICA IDENTITY FULL;

// ✅ Good - add primary key
ALTER TABLE events ADD PRIMARY KEY (event_id);
ALTER TABLE events REPLICA IDENTITY DEFAULT;

Don’t think WHERE clause affects replication

-- ❌ Wrong thinking
// "I'll use WHERE to control what gets replicated"

// ✅ Right thinking
// WHERE controls local execution, REPLICA IDENTITY controls replication

Don’t change REPLICA IDENTITY frequently

-- ❌ Bad practice
ALTER TABLE orders REPLICA IDENTITY DEFAULT;
-- ...  later
ALTER TABLE orders REPLICA IDENTITY FULL;
-- ... much later
ALTER TABLE orders REPLICA IDENTITY USING INDEX idx_order_number;

// ✅ Good practice - decide once and document it

Troubleshooting Guide

Problem 1: “UPDATE/DELETE not replicating”

Symptoms: Changes appear on publisher but not on subscriber

Diagnosis:

-- Check REPLICA IDENTITY
SELECT relname, relreplident FROM pg_class WHERE relname = 'your_table';

-- If output shows 'n' (NOTHING), that's your problem

Solution:

-- If table has primary key
ALTER TABLE your_table REPLICA IDENTITY DEFAULT;

-- If no primary key
ALTER TABLE your_table ADD PRIMARY KEY (unique_column);
ALTER TABLE your_table REPLICA IDENTITY DEFAULT;

-- If you can't add primary key
ALTER TABLE your_table REPLICA IDENTITY FULL;

Problem 2: “My WHERE clause works on publisher but not on subscriber”

Symptoms:

Publisher: UPDATE ... WHERE customer_id = 5; affects 100 rows
Subscriber: Still has old data for those rows

Root Cause: REPLICA IDENTITY not properly configured

Solution:

-- Verify WHERE clause matches correctly on BOTH sides
-- The issue is NOT the WHERE clause!
// It's the REPLICA IDENTITY

// Example:
// Publisher: UPDATE orders WHERE customer_id = 5;  // Finds 100 rows
// Problem:  REPLICA IDENTITY = NOTHING (can't replicate)
// Solution: Set REPLICA IDENTITY = DEFAULT or FULL

Problem 3: “WAL size exploding after setting REPLICA IDENTITY FULL”

Symptoms: WAL grows 10-100x larger than before

Root Cause: Using FULL replica identity stores all columns

Solution:

-- Step 1: Find tables using FULL
SELECT relname FROM pg_class WHERE relreplident = 'f';

-- Step 2: For each table, add primary key if missing
ALTER TABLE problem_table ADD COLUMN id BIGSERIAL PRIMARY KEY;

-- Step 3: Switch to DEFAULT
ALTER TABLE problem_table REPLICA IDENTITY DEFAULT;

-- Step 3: Monitor WAL size
SELECT pg_wal_lsn_diff(pg_current_wal_lsn(), '0/0') / (1024*1024) as wal_size_mb;

Problem 4: “Different rows match WHERE clause on publisher and subscriber”

Cause: Data drift between publisher and subscriber

Diagnosis:

-- Compare row counts
-- Publisher
SELECT COUNT(*) FROM orders WHERE customer_id = 5;

-- Subscriber
SELECT COUNT(*) FROM orders WHERE customer_id = 5;

// If different, you have data sync issues

Solution:

-- Resync the subscription
ALTER SUBSCRIPTION my_sub DISABLE;
ALTER SUBSCRIPTION my_sub SET (copy_data = true);
ALTER SUBSCRIPTION my_sub ENABLE;

// Then re-verify REPLICA IDENTITY configuration
SELECT relname, relreplident FROM pg_class WHERE relname = 'orders';

Advanced Scenarios

Scenario 1: Replicating with REPLICA IDENTITY USING INDEX

Sometimes the primary key isn’t the best choice for identifying rows. You can use a different unique index:

-- ============= SETUP =============

CREATE TABLE products (
    product_id BIGSERIAL PRIMARY KEY,
    sku VARCHAR(50) UNIQUE NOT NULL,
    name VARCHAR(200),
    price DECIMAL(10, 2)
);

-- Create unique index on SKU
CREATE UNIQUE INDEX idx_sku ON products(sku);

// Use SKU for REPLICA IDENTITY instead of product_id
ALTER TABLE products REPLICA IDENTITY USING INDEX idx_sku;

-- ============= EXECUTION =============

UPDATE products SET price = 99.99 WHERE category_id = 'ELECTRONICS';

-- What gets sent to subscriber:
// WHERE sku = 'SKU-001'
// WHERE sku = 'SKU-002'
// ...  etc

// ✅ Works perfectly!  Uses SKU to identify rows.

Scenario 2: Composite Primary Key

For tables with composite primary keys, REPLICA IDENTITY uses all key columns:

-- ============= SETUP =============

CREATE TABLE order_items (
    order_id BIGINT,
    line_number INTEGER,
    product_id BIGINT,
    quantity INTEGER,
    PRIMARY KEY (order_id, line_number)
);

ALTER TABLE order_items REPLICA IDENTITY DEFAULT;

// ============= EXECUTION =============

UPDATE order_items SET quantity = 5 WHERE product_id = 100;

// What gets sent to subscriber:
// WHERE order_id = 1001 AND line_number = 1
// WHERE order_id = 1001 AND line_number = 2
// WHERE order_id = 1002 AND line_number = 1
// ...  etc

// Uses BOTH composite key columns for identification

Scenario 3: Handling Tables Without Primary Keys

If you have legacy tables without primary keys:

-- ============= SETUP =============

CREATE TABLE legacy_logs (
    timestamp TIMESTAMP,
    event_type VARCHAR(100),
    details TEXT
    -- NO PRIMARY KEY!
);

// ============= OPTION 1: Add a primary key =============
ALTER TABLE legacy_logs ADD COLUMN log_id BIGSERIAL PRIMARY KEY;
ALTER TABLE legacy_logs REPLICA IDENTITY DEFAULT;

// ============= OPTION 2: Create unique index =============
CREATE UNIQUE INDEX idx_unique_log
ON legacy_logs(timestamp, event_type, md5(details));

ALTER TABLE legacy_logs REPLICA IDENTITY USING INDEX idx_unique_log;

// ============= OPTION 3: Use FULL (⚠️ performance cost) =============
ALTER TABLE legacy_logs REPLICA IDENTITY FULL;

// ============= OPTION 4: No replication for updates/deletes =============
ALTER TABLE legacy_logs REPLICA IDENTITY NOTHING;
// (Only INSERT will replicate)

Summary Table: WHERE Clause vs. REPLICA IDENTITY

SCENARIO:  UPDATE orders SET status = 'shipped' WHERE customer_id = 5;

┌─────────────────────────────────────────────────────────────┐
│ WHERE CLAUSE (customer_id = 5)                              │
│ ├─ Used:  ON PUBLISHER DATABASE                             │
│ ├─ Purpose: Find which rows to update locally              │
│ ├─ Example:  Finds rows with order_id 100,101,102,103       │
│ └─ Sent to subscriber: ❌ NO                               │
└─────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────┐
│ REPLICA IDENTITY (e.g., PRIMARY KEY = order_id)            │
│ ├─ Used: BY REPLICATION SYSTEM                             │
│ ├─ Purpose: Identify rows on SUBSCRIBER                    │
│ ├─ Example: Extracts order_id 100,101,102,103              │
│ └─ Sent to subscriber: ✅ YES                              │
└─────────────────────────────────────────────────────────────┘

RESULT:  Subscriber receives
UPDATE orders SET status = 'shipped' WHERE order_id = 100;
UPDATE orders SET status = 'shipped' WHERE order_id = 101;
UPDATE orders SET status = 'shipped' WHERE order_id = 102;
UPDATE orders SET status = 'shipped' WHERE order_id = 103;

Conclusion

The Key Insight

WHERE clauses and REPLICA IDENTITY are completely independent concerns in PostgreSQL logical replication.

WHERE clause: Controls which rows are affected on the publisher database
REPLICA IDENTITY: Controls how rows are identified on the subscriber database

The Most Important Takeaway

╔════════════════════════════════════════════════════════════╗
║                                                            ║
║  The WHERE condition you use in your SQL statement         ║
║  has ZERO impact on logical replication success.            ║
║                                                            ║
║  REPLICA IDENTITY is the ONLY thing that matters          ║
║  for determining if UPDATE/DELETE operations replicate.   ║
║                                                            ║
╚════════════════════════════════════════════════════════════╝

Action Items for Your Database

Audit all replicated tables: SELECT relname, relreplident FROM pg_class WHERE relkind = 'r' AND relreplident != 'd';
For each table without DEFAULT replica identity, decide:
- Add a primary key → Use DEFAULT
- OR create a unique index → Use USING INDEX
- OR accept the performance cost → Use FULL
Document your REPLICA IDENTITY choices with comments in your DDL
Test your replication with various WHERE clause patterns to build confidence
Monitor replication lag and WAL size to ensure efficiency

References

PostgreSQL Official Documentation: Logical Replication
PostgreSQL Write-Ahead Log (WAL)
[PostgreSQL Replica Identity Configuration](https://www.postgresql.org/docs/current/sql-altertable. html)
Understanding Logical Decoding
Performance Impact of REPLICA IDENTITY FULL

Author’s Note: This article addresses one of the most commonly misunderstood concepts in PostgreSQL. If you have colleagues working with logical replication, sharing this article could save them from subtle replication bugs that are difficult to debug. The distinction between WHERE clauses and REPLICA IDENTITY is fundamental to getting replication right.

Introduction

Core Concepts

What is a WHERE Clause?

What is Replica Identity?

The Fundamental Difference

The Critical Distinction

The Common Misconception

The Reality

How Replica Identity Really Works

Step-by-Step Process

Step 1: Execute the SQL Statement on Publisher

Step 2: Extract REPLICA IDENTITY for Each Row

Step 3: Write to Write-Ahead Log (WAL)

Step 4: Logical Replication Reads from WAL

Step 5: Subscriber Receives Changes

Step 6: Subscriber Applies Changes

Visual Flow Diagram

Practical Examples

Example 1: WHERE Clause Using Non-Primary Key Column

Example 2: Complex WHERE Clause (Multiple Conditions)

Example 3: DELETE with Complex WHERE Clause

Common Misconceptions

❌ Misconception 1: “I can use any WHERE clause and replication will work”

❌ Misconception 2: “WHERE clause becomes the REPLICA IDENTITY”

❌ Misconception 3: “Different WHERE clauses on different rows cause issues”

✅ Correct Understanding

Detailed Comparison Table

Comparison: WHERE Clause vs. REPLICA IDENTITY

REPLICA IDENTITY Options and WHERE Clause Compatibility

Real-World Scenarios

Scenario 1: Multi-Tenant SaaS Platform

Scenario 2: Inventory Management with Multiple Warehouses

Scenario 3: Financial Transaction Processing

Performance Implications

Why WHERE Clause Type Doesn’t Affect Replication Performance

What DOES Affect Replication Performance: REPLICA IDENTITY Type

Best Practices and Recommendations

✅ DO

❌ DON’T

Troubleshooting Guide

Problem 1: “UPDATE/DELETE not replicating”

Problem 2: “My WHERE clause works on publisher but not on subscriber”

Problem 3: “WAL size exploding after setting REPLICA IDENTITY FULL”

Problem 4: “Different rows match WHERE clause on publisher and subscriber”

Advanced Scenarios

Scenario 1: Replicating with REPLICA IDENTITY USING INDEX

Scenario 2: Composite Primary Key

Scenario 3: Handling Tables Without Primary Keys

Summary Table: WHERE Clause vs. REPLICA IDENTITY

Conclusion

The Key Insight

The Most Important Takeaway

Action Items for Your Database

References

David Cao

Related Posts

How to Create a Partition Table in PostgreSQL

3 Ways to count the number of rows in a table in PostgreSQL

Fixing “psql: error: connection to server on socket “/tmp/.s.PGSQL.5432″ failed: No such file or directory”

Leave a ReplyCancel Reply