My Journey Understanding an Inherited PostgreSQL System
Why Your Database Ignores Indexes: A Journey Through Planner Mysteries and Design Choices
The Puzzle That Started It All
In a recent project, I was puzzled by what looked like an inconsistent indexing setup in a system I had inherited. It bothered me enough that I decided to dig deeper. What I discovered taught me that "wrong" isn't always wrong: it's often about trade-offs I hadn't considered.
The Starting Point: A Confusing Primary Key
The first thing that caught my attention was the primary key design:
CREATE TABLE locations (
location_id TEXT NOT NULL, -- e.g., 'A01020304' (hierarchical location code)
warehouse_id TEXT NOT NULL, -- e.g., '25'
-- ... other columns
CONSTRAINT pk_locations_location_id_warehouse_id
PRIMARY KEY (location_id, warehouse_id)
);
This struck me as backwards. In most multi-tenant systems I've worked with, you'd cluster by warehouse_id first for tenant isolation. But here, location_id comes first. I wondered if this was a mistake or if there was reasoning I wasn't seeing.
Understanding the Location ID Structure
As I dug deeper, I discovered that these location IDs weren't just random strings. They followed a specific hierarchical structure that was key to understanding the whole system. Each warehouse had thousands (sometimes millions) of storage locations, and every location was encoded into a structured string ID.
For example, a location ID like A01020304 breaks down as:
Aisle: A01 (an uppercase letter + 2 digits)
Bay: 02 (2 digits)
Level: 03 (2 digits)
Position: 04 (2 digits)
When someone searches for A01%, they're looking for all locations in aisle A01 across any warehouse. When they search for H1%, they want all locations in aisles starting with H1. This spatial organization is crucial for warehouse operations: workers need to find adjacent locations, analyze storage density by aisle, or audit entire sections of the warehouse.
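To make the structure concrete, here is a minimal sketch of how such an ID could be decomposed in SQL, assuming the fixed-width format described above (this is illustrative, not code from the system itself):
-- Decompose the hierarchical location ID, assuming the fixed-width layout above
SELECT
  location_id,
  substring(location_id FROM 1 FOR 3) AS aisle,      -- e.g., 'A01'
  substring(location_id FROM 4 FOR 2) AS bay,        -- e.g., '02'
  substring(location_id FROM 6 FOR 2) AS "level",    -- e.g., '03'
  substring(location_id FROM 8 FOR 2) AS "position"  -- e.g., '04'
FROM locations
WHERE location_id LIKE 'A01%';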
My First Breakthrough: The Migration History
Curious about the primary key design, I dug through the database migration history. What I found surprised me:
Original Design (October 2023)
PRIMARY KEY (warehouse_id, location_id) -- Warehouse first
Changed to (February 2025)
PRIMARY KEY (location_id, warehouse_id) -- Location first
So it wasn't a mistake! The previous team had deliberately switched the primary key ordering. This made me even more curious. What drove that decision? And more importantly, I started noticing something concerning in my testing: PostgreSQL kept avoiding this index, even when it seemed like it would be faster.
My Testing Adventure: When the Planner Surprised Me
This is where my investigation got really interesting. I decided to systematically test different query patterns to understand when PostgreSQL was making good choices versus poor ones. What I found challenged everything I thought I knew about query optimization.
Test 1: Location Pattern Query
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT location_id, warehouse_id
FROM locations
WHERE location_id LIKE 'H1%';
Result: 477ms with parallel sequential scan for 92,689 rows (~4% of table)
PostgreSQL chose sequential scan, which makes sense for large result sets.
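To sanity-check that selectivity figure, a one-off count is enough. This is an illustrative query I would run, using PostgreSQL's standard FILTER clause; it scans the whole table, so it's for analysis only:
-- Compare matching rows to total rows to estimate selectivity
SELECT
  count(*) FILTER (WHERE location_id LIKE 'H1%') AS matching_rows,
  count(*) AS total_rows
FROM locations;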
Test 2: Warehouse-Specific Query
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT location_id, warehouse_id
FROM locations
WHERE warehouse_id = '25';
Natural planner choice: 567ms (sequential scan)
Forced index usage: 282ms (50% faster!)
SET enable_seqscan = false;
-- Same query runs in 282ms with bitmap index scan
RESET enable_seqscan;
Test 3: Combined Query
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT location_id, warehouse_id
FROM locations
WHERE warehouse_id = '25' AND location_id LIKE 'H1000%';
Natural planner choice: 501ms (sequential scan)
Forced index usage: 354ms (29% faster)
Test 4: LIMIT with OFFSET Queries
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT location_id, warehouse_id
FROM locations
WHERE location_id LIKE 'H1%' AND warehouse_id = '09'
LIMIT 100 OFFSET 100;
Natural planner choice: 436ms (sequential scan)
Forced index usage: 204ms (53% faster)
-- Natural planner result (sequential scan)
Limit (cost=6560.89..12121.78 rows=100 width=12) (actual time=412.845..436.060 rows=0 loops=1)
-> Gather
-> Parallel Seq Scan on locations
Filter: (((location_id)::text ~~ 'H1%'::text) AND ((warehouse_id)::text = '09'::text))
Rows Removed by Filter: 746758
-- Forced index result
Limit (cost=9743.66..19486.90 rows=100 width=12) (actual time=204.298..204.300 rows=0 loops=1)
-> Index Only Scan using pk_locations_location_id_warehouse_id on locations
Index Cond: (warehouse_id = '09'::text)
Filter: ((location_id)::text ~~ 'H1%'::text)
Rows Removed by Filter: 30265
Key insight: Even with LIMIT + OFFSET, the index scan won significantly. The index could filter by warehouse_id first (30k rows examined) vs. sequential scan examining 746k rows across workers.
Test 5: Simple LIMIT Queries (When the Planner Gets It Right)
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT location_id, warehouse_id
FROM locations
WHERE location_id LIKE 'H1%'
LIMIT 100;
Natural planner choice: 0.579ms (sequential scan)
Forced index usage: 321ms (550x slower!)
-- Sequential scan result
Limit (cost=0.00..91.49 rows=100 width=12) (actual time=0.035..0.553 rows=100 loops=1)
-> Seq Scan on locations
Filter: ((location_id)::text ~~ 'H1%'::text)
Rows Removed by Filter: 3060
-- Forced index scan result
Limit (cost=0.43..224.78 rows=100 width=12) (actual time=321.386..321.501 rows=100 loops=1)
-> Index Only Scan using pk_locations_location_id_warehouse_id on locations
Filter: ((location_id)::text ~~ 'H1%'::text)
Rows Removed by Filter: 433527
Here, the planner was absolutely correct! Sequential scan found 100 results after examining only 3,160 rows, while the index scan had to examine 433,627 rows to find the same 100 results.
Why LIMIT changes everything:
Sequential scan can stop as soon as it finds enough rows
H1% patterns are distributed throughout the data
Index scan must traverse scattered index pages to find matches
For small result sets, "first 100 found" beats "optimally ordered access"
Summary: When Indexes Help vs. Hurt
The Pattern:
Large scans: Sequential wins or is reasonable
Medium selectivity with filtering: Index wins significantly (even with LIMIT + OFFSET)
Simple LIMIT queries: Sequential wins dramatically (can stop early)
Understanding Why PostgreSQL Made These Choices
The more I tested, the more I realized there was a pattern to the planner's behavior. PostgreSQL's cost-based optimizer sometimes chose suboptimal plans, but sometimes it was exactly right. I needed to understand the logic behind these decisions.
1. The Cost Model Mystery
I discovered that PostgreSQL uses abstract "cost units" to compare execution strategies. Looking at my system's configuration, I found these parameters:
-- My system's cost parameters (defaults)
SHOW random_page_cost; -- 4.0 (random reads cost 4x sequential!)
SHOW effective_cache_size; -- 1502984kB (~1.5GB in my case)
SHOW seq_page_cost; -- 1.0 (cost to read a page sequentially)
SHOW cpu_tuple_cost; -- 0.01 (cost to process one row)
For my warehouse query (warehouse_id = '25' with 6.5% selectivity), the planner calculated:
Sequential scan cost: ~67,000 units (read entire table sequentially, low per-page cost)
Index scan cost: ~130,000 units (random index + heap access, 4x page cost penalty)
This explained why PostgreSQL kept choosing sequential scans! The planner assumed random I/O was 4x more expensive than sequential I/O. But when I forced the index usage, it was actually 50% faster. The cost model seemed to be wrong about modern storage characteristics.
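As a rough sanity check on those numbers, the sequential-scan side can be approximated straight from the table's statistics. This is a simplified sketch of the planner's formula, not the real thing (it ignores cpu_operator_cost, filter conditions, and parallel-worker adjustments):
-- Approximate sequential scan cost: pages * seq_page_cost + rows * cpu_tuple_cost
SELECT relpages * current_setting('seq_page_cost')::float8
     + reltuples * current_setting('cpu_tuple_cost')::float8 AS approx_seq_scan_cost
FROM pg_class
WHERE relname = 'locations';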
2. The Index Scattering Problem
I realized that with the (location_id, warehouse_id) primary key structure I inherited:
All warehouse '25' locations were scattered throughout the index
To find them, the index scan had to jump between many different location_id ranges (A01..., B01..., H1...)
This created the "random" access pattern that the cost model was heavily penalizing (the column-statistics check below makes the scattering visible)
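One rough way to quantify that scattering is the correlation statistic PostgreSQL keeps per column: values near 1.0 mean the column's values follow the physical row order, values near 0 mean they are spread throughout the table. This is an illustrative check, not something from the original migration work:
-- Inspect per-column ordering statistics (assumes the table lives in the public schema)
SELECT attname, n_distinct, correlation
FROM pg_stats
WHERE schemaname = 'public'
  AND tablename = 'locations'
  AND attname IN ('warehouse_id', 'location_id');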
3. Statistics vs. Reality
Interestingly, PostgreSQL's row count estimates were quite accurate (144k estimated vs 145k actual rows). The problem wasn't with statistics - it was with cost estimation. The planner didn't account for:
How efficiently modern SSDs handle the "random" access patterns
How much of the data was already cached in the 1.5GB buffer pool (a quick way to gauge this is sketched after this list)
How well bitmap heap scans actually perform on this hardware
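For the caching point, the cumulative buffer-hit ratio for the table gives at least a rough signal. This is an illustrative query, not part of the original investigation; the pg_buffercache extension would give a more precise view:
-- Share of heap block reads served from shared buffers since the stats were last reset
SELECT heap_blks_hit,
       heap_blks_read,
       round(heap_blks_hit::numeric
             / nullif(heap_blks_hit + heap_blks_read, 0), 3) AS heap_hit_ratio
FROM pg_statio_user_tables
WHERE relname = 'locations';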
My Analysis: Was This Bad Design?
The big question that kept bugging me: When the planner consistently chooses slower execution plans, does that mean the database design is wrong?
My Conclusion: It's All About Trade-offs
As I analyzed the system deeper, I realized the primary key design created a fundamental trade-off:
Location-first ordering (location_id, warehouse_id) optimized for:
Global operations spanning all warehouses
Spatial analysis across the entire location hierarchy
But it penalized warehouse-specific queries because:
Warehouse data became scattered throughout the index
This forced "random" access patterns that the cost model heavily penalized
Understanding the Previous Team's Logic
Looking back at the migration history, I realized this was a completely deliberate decision:
October 2023: PRIMARY KEY (warehouse_id, location_id) (conventional multi-tenant design)
February 2025: PRIMARY KEY (location_id, warehouse_id) (location-first optimization)
The team had switched from the "obvious" warehouse-first design to location-first. This told me:
Cross-warehouse location queries were more critical to the business than warehouse isolation
They had probably measured the performance impact and decided the trade-off was worth it
The 50% penalty for warehouse queries was acceptable given their workload priorities
Issue with the query planner: Outdated cost assumptions
The more I studied this, the more I realized the problem wasn't the index design - it was PostgreSQL's cost model being outdated for modern hardware:
The cost model assumed random I/O was 4x more expensive than sequential
Modern SSDs and the 1.5GB buffer pool made this assumption much less relevant
Bitmap heap scans performed better than the planner estimated on this hardware
What I Learned and How I'd Optimize This System
1. Query Planner "Mistakes" Are Often Normal
My investigation taught me that query planners make suboptimal choices more often than I expected, especially:
On modern hardware (SSDs, large RAM)
With index designs that optimize for specific business patterns
When cost models haven't kept up with hardware evolution
2. The Power of Testing Assumptions
I developed a methodical approach to test when the planner might be wrong, but learned to be careful with LIMIT queries:
-- My testing approach for the warehouse queries
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM locations WHERE warehouse_id = '25';
-- Force index usage to compare
SET enable_seqscan = false;
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM locations WHERE warehouse_id = '25';
RESET enable_seqscan;
-- But I learned LIMIT queries often favor sequential scans
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM locations WHERE location_id LIKE 'H1%' LIMIT 100;
3. How I'd Optimize: Supplemental Indexes
Based on my analysis, if I were optimizing this system, I'd add a supporting index to get the best of both worlds:
-- Add this to optimize warehouse-specific queries
CREATE INDEX CONCURRENTLY idx_locations_warehouse_location
ON locations (warehouse_id, location_id);
This would preserve the benefits of the location-first primary key while eliminating the 50% performance penalty for warehouse queries.
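Before relying on it, I'd verify that the planner actually picks the new index for the warehouse-specific queries; the expected plan shape below is an assumption until measured on the real system:
-- Re-check the warehouse query once the index exists
EXPLAIN (ANALYZE, BUFFERS)
SELECT location_id, warehouse_id
FROM locations
WHERE warehouse_id = '25';
-- Expected: an Index Only Scan or Bitmap Index Scan on idx_locations_warehouse_location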
4. Cost Model Tuning Experiments I'd Try
I realized this system would benefit from tuning PostgreSQL's cost model to reflect modern hardware realities:
-- Current settings that explain the planner behavior
SHOW random_page_cost; -- 4.0 (assumes spinning disks!)
SHOW effective_cache_size; -- 1502984kB (~1.5GB in our case)
-- Experiments I'd try:
SET random_page_cost = 1.1; -- Reflect SSD performance characteristics
SET effective_cache_size = '4GB'; -- If more memory is available
Lowering random_page_cost would make index scans more attractive to the planner, since it would reduce the penalty for the "random" I/O operations that modern SSDs handle efficiently.
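If the session-level experiments held up under the real workload, the setting could be persisted cluster-wide or per database. A sketch (the database name is a placeholder):
-- Cluster-wide (requires superuser), picked up after a configuration reload
ALTER SYSTEM SET random_page_cost = 1.1;
SELECT pg_reload_conf();
-- Or scoped to a single database ('warehouse_db' is a placeholder name)
ALTER DATABASE warehouse_db SET random_page_cost = 1.1;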
References supporting these tuning approaches:
PostgreSQL Wiki: Tuning Your PostgreSQL Server - Recommends random_page_cost = 1.1 for SSDs
PostgreSQL Documentation: Runtime Configuration - Official guidance on cost parameters
PostgreSQL Performance Tuning Guide - SSD optimization recommendations
My Key Discoveries
Query planners aren't perfect, but they're not always wrong - PostgreSQL made poor choices for most of my tests but excellent choices for simple LIMIT queries
Context matters enormously - Simple LIMIT queries favoured sequential scans (0.5ms vs 321ms), but LIMIT + OFFSET + filtering favoured indexes (204ms vs 436ms)
Index design creates trade-offs - the (location_id, warehouse_id) design optimized cross-warehouse patterns but penalized warehouse isolation
Cost models lag hardware - PostgreSQL's assumptions about I/O costs didn't match modern SSD and RAM characteristics in my testing
Testing assumptions is crucial - when queries seemed slow, testing alternative execution strategies with SET enable_seqscan = false revealed significant performance gaps
LIMIT queries favour sequential scans - when I only needed the first N results, sequential access often won dramatically
My Bottom Line: This inherited system wasn't "wrong" - PostgreSQL's query planner was being conservative based on outdated I/O cost assumptions. The primary key design reflected deliberate business optimization choices, and the performance gaps I observed could be addressed with supplemental indexes if needed.
Testing performed on PostgreSQL 15+ with ~2.24M rows. Results may vary based on hardware, configuration, and data distribution.



