Query planner instability on large append-only table with heavy same-day inserts

Started by Giriraj Sharma · 6 months ago · 3 messages · general
#1 Giriraj Sharma
giriraj.sharma27@gmail.com

Environment

PostgreSQL 14 / 15 tested.
RDS / Aurora Postgres in production.
Primary key index (effective_date, idempotency_id).
Autovacuum enabled.
SQLC (Go) for query generation.
________________________________
Context

We are running into query planner instability on an append-only table that
records daily commit information.
The workload is heavily skewed toward the current day’s data, and query
performance deteriorates periodically until we run ANALYZE.

________________________________
Table Definition

CREATE TABLE sample_table (
    idempotency_id transactions.string NOT NULL,
    effective_date date NOT NULL,
    created_at timestamptz DEFAULT now() NOT NULL,
    CONSTRAINT commits_v3_pk PRIMARY KEY (effective_date, idempotency_id)
);

________________________________
Query Pattern

All our lookups are parameterized (SQLC, Go):

SELECT idempotency_id
FROM sample_table
WHERE effective_date = @effective_date
  AND idempotency_id IN (
      SELECT unnest(@idempotency_ids::transactions.string[])
  );
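For reference, the same lookup can also be written with = ANY on the array parameter directly. This is a sketch, not our production query; the unnest subquery and = ANY can plan differently (a semi-join versus an index scan with an array condition), so it may be worth comparing, and whether sqlc accepts this form depends on the engine and driver in use:

```sql
-- Equivalent membership test written with = ANY instead of
-- IN (SELECT unnest(...)); plans for the two forms can differ.
SELECT idempotency_id
FROM sample_table
WHERE effective_date = @effective_date
  AND idempotency_id = ANY (@idempotency_ids::transactions.string[]);
```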

Typical characteristics:

@effective_date almost always = current date.
@idempotency_ids varies between 2–50 items off-peak and 800–1000 items at
peak hours.
Query is latency-sensitive (expected 5–10 ms).

________________________________
Workload Characteristics

~10–12 million inserts per weekday, almost entirely for the current date.

Practically append-only — no updates or deletes.
No weekend inserts.
Occasional rare inserts for past or future dates (late or early trades).
Retention: ~3 years of data (~1000 days × 10 M = 10+ billion rows).
PostgreSQL 14+.

________________________________
Observed Behavior

Immediately after ANALYZE, this query uses an Index Only Scan on the
primary key (effective_date, idempotency_id) with stable latency around
5–10 ms.

After several days (5–7 typically), the query planner flips to a Sequential
Scan, and latency jumps to 2–30 seconds.

Running ANALYZE transactions.commits_v3; restores performance instantly.

We currently run manual ANALYZE twice a day via pg_cron, which helps but
doesn’t fully guarantee stability (especially in test environments where
insert patterns are more random).
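When the flip happens, capturing both plans makes the comparison concrete. A diagnostic sketch with parameter values inlined for illustration (the array contents are placeholders):

```sql
-- EXPLAIN with ANALYZE and BUFFERS shows the chosen plan, actual vs.
-- estimated row counts, and how many pages a Seq Scan really touches.
EXPLAIN (ANALYZE, BUFFERS)
SELECT idempotency_id
FROM transactions.commits_v3
WHERE effective_date = current_date
  AND idempotency_id IN (
      SELECT unnest(ARRAY['a1', 'a2']::transactions.string[])
  );
```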

________________________________
What We’ve Tried

Manual and scheduled ANALYZE runs (twice a day, each taking ~30 s) improve
performance but are not sustainable long-term.
Verified query execution plans before and after ANALYZE; the planner
switches from an Index Only Scan to a Seq Scan as statistics become stale.
Confirmed the table’s autovacuum is enabled (the last run was 15 days ago),
but its frequency isn’t sufficient to keep stats current during high-insert
periods. Could this be an issue at all, given that we already run ANALYZE
at least twice a day?
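One way to quantify how stale the statistics get between runs is to poll the cumulative statistics view for the table; a diagnostic sketch:

```sql
-- n_mod_since_analyze counts rows inserted/updated/deleted since the
-- last ANALYZE (manual or automatic); high values mean stale stats.
SELECT relname,
       n_mod_since_analyze,
       last_analyze,      -- last manual ANALYZE
       last_autoanalyze   -- last autovacuum-driven ANALYZE
FROM pg_stat_user_tables
WHERE relname = 'commits_v3';
```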

_______________________________
Problem Summary

The planner’s row-count estimates for effective_date and idempotency_id
become inaccurate as we continuously append to “today’s” date.
The result is plan instability (index scan ↔ sequential scan) until
statistics are refreshed.
We’re looking for a solution that keeps plans stable without manual ANALYZE
as data volume scales.

________________________________

Questions for the Community

1. Partitioning

Would daily range partitioning by effective_date (≈ 1000 partitions for 3
years) be the right long-term approach here?

Given that inserts and queries almost always target today’s partition, will
partition pruning and per-partition statistics fully eliminate the
stale-statistics problem?

Are there known performance implications of maintaining ~1000 daily
partitions at this scale (10 M rows/day)?

We occasionally receive backdated or future-dated inserts; can such
out-of-range values be handled efficiently (e.g., by creating partitions on
the fly)?
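For concreteness, a sketch of what daily range partitioning with a safety-net DEFAULT partition could look like (table and partition names are illustrative, not from our schema):

```sql
CREATE TABLE commits_part (
    idempotency_id transactions.string NOT NULL,
    effective_date date NOT NULL,
    created_at timestamptz DEFAULT now() NOT NULL,
    -- on a partitioned table the PK must include the partition key,
    -- which (effective_date, idempotency_id) does
    PRIMARY KEY (effective_date, idempotency_id)
) PARTITION BY RANGE (effective_date);

-- One partition per day, typically pre-created by a scheduled job.
CREATE TABLE commits_part_2025_10_28 PARTITION OF commits_part
    FOR VALUES FROM ('2025-10-28') TO ('2025-10-29');

-- Catches the occasional backdated or future-dated insert.
CREATE TABLE commits_part_default PARTITION OF commits_part DEFAULT;
```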

2. Autovacuum / Analyze Tuning

If we stay with a single table, what are practical per-table autovacuum
settings to ensure frequent ANALYZE even as total row count grows into
billions?

Would it make sense to use:

ALTER TABLE transactions.commits_v3
SET (
    autovacuum_analyze_scale_factor = 0,
    autovacuum_analyze_threshold = 50000,
    autovacuum_vacuum_scale_factor = 0.05,
    autovacuum_vacuum_threshold = 10000
);

to decouple analyze frequency from table size?
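For reference, autoanalyze fires when the row-modification count exceeds autovacuum_analyze_threshold + autovacuum_analyze_scale_factor × reltuples. With the settings above that becomes a flat 50 000 rows independent of table size, so at ~10 M inserts per weekday the table would qualify for autoanalyze roughly 200 times a day. The applied overrides can be verified with a sketch like:

```sql
-- Per-table autovacuum overrides are stored as reloptions on pg_class.
SELECT reloptions
FROM pg_class
WHERE oid = 'transactions.commits_v3'::regclass;
```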

Should we also experiment with raising the statistics target for
effective_date from 100 to 1000 using:

ALTER TABLE transactions.commits_v3
    ALTER COLUMN effective_date SET STATISTICS 1000;

to improve the planner’s histogram accuracy for the date distribution?
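Whether the larger target helps can be checked after the next ANALYZE, since pg_stats exposes the per-column statistics the planner actually uses; a sketch:

```sql
-- If today's date is in most_common_vals, the planner can estimate the
-- same-day predicate directly rather than interpolating the histogram.
SELECT n_distinct,
       most_common_vals,
       histogram_bounds
FROM pg_stats
WHERE schemaname = 'transactions'
  AND tablename = 'commits_v3'
  AND attname = 'effective_date';
```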

3. Best Practices

Are there best practices or proven patterns for append-only,
time-series–like workloads that insert heavily into one day and read from
the same day?

Is there a known best way to make Postgres’s planner more resilient to
temporary statistic drift for parameterized queries like ours?

________________________________

Goal

We’d like a “set it and forget it” architecture, either through
partitioning or robust autovacuum tuning, where:

The planner always chooses the index scan for same-day queries.
We no longer need the manual ANALYZE that currently runs via pg_cron.
Query latency remains in the 5–10 ms range, even as total data volume grows
into billions of rows.

________________________________

Any recommendations, benchmark references, or production-proven strategies
for this workload would be highly appreciated.

--

Giriraj Sharma
about.me/girirajsharma

#2 Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Giriraj Sharma (#1)
Re: Query planner instability on large append-only table with heavy same-day inserts

On Tue, 2025-10-28 at 12:06 +0530, Giriraj Sharma wrote:

> We are running into query planner instability on an append-only table
> that records daily commit information. The workload is heavily skewed
> toward the current day’s data, and query performance deteriorates
> periodically until we run ANALYZE.

The best approach is to run autoanalyze often enough, either by reducing
autovacuum_analyze_scale_factor for the table or (as you suggested) by
setting the parameter to 0 and using only autovacuum_analyze_threshold.

There is no way to force a certain plan in PostgreSQL, unless you are
using the pg_hint_plan extension.

Yours,
Laurenz Albe

#3 Greg Sabino Mullane
greg@turnstep.com
In reply to: Giriraj Sharma (#1)
Re: Query planner instability on large append-only table with heavy same-day inserts

On Tue, Oct 28, 2025 at 2:36 AM Giriraj Sharma <giriraj.sharma27@gmail.com>
wrote:

> Would daily range partitioning by effective_date (≈ 1000 partitions for 3
> years) be the right long-term approach here?

If you are querying based on dates, yes. Does not need to be daily, could
do monthly for example. A retention policy of 3 years is a great candidate
for partitioning.

> Given that inserts and queries almost always target today’s partition,
> will partition pruning and per-partition statistics fully eliminate the
> stale-statistics problem? Are there known performance implications of
> maintaining ~1000 daily partitions at this scale (10 M/day)? We
> occasionally receive backdated or future-dated inserts — can such
> out-of-range values be handled efficiently (e.g., by creating partitions
> on the fly)?

It will help; I don't know that it will fully eliminate it. I would not do
daily unless it is really needed, that's a lot of partitions. Monthly to
start with. You cannot create partitions on the fly, but you can have them
go to a default partition and sort them out later. Or pre-create a bunch of
partitions.
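A minimal sketch of the default-partition pattern described here, assuming commits_v3 were recreated as a table partitioned by range on effective_date (partition names are illustrative):

```sql
-- Monthly partition covering October 2025.
CREATE TABLE transactions.commits_v3_2025_10
    PARTITION OF transactions.commits_v3
    FOR VALUES FROM ('2025-10-01') TO ('2025-11-01');

-- Backdated or future-dated rows land here instead of erroring out,
-- and can be moved to a proper partition later.
CREATE TABLE transactions.commits_v3_default
    PARTITION OF transactions.commits_v3 DEFAULT;
```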

> If we stay with a single table, what are practical per-table autovacuum
> settings to ensure frequent ANALYZE even as total row count grows into
> billions?

Turn off autovacuum for that table. Use cron to run vacuum and analyze
hourly (or some frequency that ensures good plans). If analyze is taking
too long, it can be done per-column as well, although that won't help much
if your sample table is representative. But if you have a lot of other
columns with stable values, you could analyze those less often. Measure and
see.
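The per-column analyze mentioned here uses ANALYZE's standard column-list form; a sketch:

```sql
-- Re-sample statistics only for the columns the hot query's estimates
-- depend on; cheaper than analyzing every column of a wide table.
ANALYZE transactions.commits_v3 (effective_date, idempotency_id);
```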

> Should we also experiment with raising the statistics target for
> effective_date from 100 to 1000 using:

I don't think that will matter if your analyze is already giving you
index-only scans.

> Are there best practices or proven patterns for append-only,
> time-series–like workloads that insert heavily into one day and read from
> the same day?

Partitioning. Ensure fillfactor is 100%. Minimal indexes.

> Is there a known best way to make Postgres’s planner more resilient to
> temporary statistic drift for parameterized queries like ours?

Not really, other than giving Postgres updated stats via frequent analyze.

> PostgreSQL 14 / 15 tested.

Moving to a newer version is always helpful too.

Cheers,
Greg

--
Crunchy Data - https://www.crunchydata.com
Enterprise Postgres Software Products & Tech Support