BUG #19076: Generic query plan is extremely slow

Started by PG Bug reporting form · 6 months ago · 6 messages · bugs
#1 PG Bug reporting form
noreply@postgresql.org

The following bug has been logged on the website:

Bug reference: 19076
Logged by: Premal Patel
Email address: premal.patel@veeva.com
PostgreSQL version: 17.6
Operating system: macOS 15.6.1 (also observed on Linux-based OSes)
Description:

I have an application which uses a database driver that creates prepared
statements. I am noticing that for some queries, the first 5 executions
after the database connection is created are performant. After this, the
prepared statement uses the generic query plan and is incredibly slow. This
is even if I do not change the parameters and simply re-run the same query
several times with the same arguments. I have tried re-running ANALYZE on
the tables and setting STATISTICS to 1000 on the columns in question, with no
luck. Here is what my query looks like:

SELECT accounts_contacts.account_id, count(contacts.id) AS count
FROM accounts_contacts JOIN contacts ON contacts.id =
accounts_contacts.contact_id
WHERE accounts_contacts.account_id IN (...) AND contacts.tenant_id =
$1::UUID AND contacts.status = $2::VARCHAR
GROUP BY accounts_contacts.account_id

Here is the EXPLAIN output from the first 5 executions using the custom
query plan:

Finalize HashAggregate  (cost=47034.52..47131.24 rows=9672 width=24) (actual time=103.173..103.822 rows=1000 loops=1)
  Group Key: accounts_contacts.account_id
  Batches: 1  Memory Usage: 465kB
  ->  Gather  (cost=44906.68..46937.80 rows=19344 width=24) (actual time=102.801..103.506 rows=2326 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        ->  Partial HashAggregate  (cost=43906.68..44003.40 rows=9672 width=24) (actual time=93.682..93.757 rows=775 loops=3)
              Group Key: accounts_contacts.account_id
              Batches: 1  Memory Usage: 465kB
              Worker 0:  Batches: 1  Memory Usage: 465kB
              Worker 1:  Batches: 1  Memory Usage: 465kB
              ->  Hash Join  (cost=6066.79..43819.26 rows=17484 width=32) (actual time=12.902..92.674 rows=11509 loops=3)
                    Hash Cond: (contacts.id = accounts_contacts.contact_id)
                    ->  Parallel Seq Scan on contacts  (cost=0.00..36177.50 rows=373366 width=16) (actual time=0.435..55.023 rows=288122 loops=3)
                          Filter: ((tenant_id = '0199729c-3f1e-712f-8b71-f6545a96b225'::uuid) AND ((status)::text = 'ACTIVE'::text))
                          Rows Removed by Filter: 162518
                    ->  Hash  (cost=5275.46..5275.46 rows=63307 width=32) (actual time=12.053..12.054 rows=37282 loops=3)
                          Buckets: 65536  Batches: 1  Memory Usage: 2843kB
                          ->  Index Only Scan using uq_accounts_contacts_account_id_contact_id on accounts_contacts  (cost=0.42..5275.46 rows=63307 width=32) (actual time=0.335..5.220 rows=37282 loops=3)
                                Index Cond: (account_id = ANY (...

Here is the EXPLAIN output from executions 6+ using the generic query plan:
GroupAggregate  (cost=5821.21..5821.56 rows=20 width=24) (actual time=7178.328..7179.934 rows=1000 loops=1)
  Group Key: accounts_contacts.account_id
  ->  Sort  (cost=5821.21..5821.26 rows=20 width=32) (actual time=7178.314..7178.943 rows=34526 loops=1)
        Sort Key: accounts_contacts.account_id
        Sort Method: quicksort  Memory: 2984kB
        ->  Nested Loop  (cost=10.26..5820.77 rows=20 width=32) (actual time=34.187..7174.937 rows=34526 loops=1)
              ->  Bitmap Heap Scan on contacts  (cost=9.83..2490.09 rows=354 width=16) (actual time=34.071..193.216 rows=864365 loops=1)
                    Recheck Cond: (tenant_id = $1)
                    Filter: ((status)::text = ($2)::text)
                    Rows Removed by Filter: 102519
                    Heap Blocks: exact=26858
                    ->  Bitmap Index Scan on ix_contacts_tenant_id  (cost=0.00..9.74 rows=709 width=0) (actual time=32.114..32.114 rows=966884 loops=1)
                          Index Cond: (tenant_id = $1)
              ->  Index Scan using ix_accounts_contacts_contact_id on accounts_contacts  (cost=0.42..9.40 rows=1 width=32) (actual time=0.008..0.008 rows=0 loops=864365)
                    Index Cond: (contact_id = contacts.id)
                    Filter: (account_id = ANY (ARRAY[...]))
                    Rows Removed by Filter: 1

(please note I omitted the account_ids list, but it is of length 1000 in
both cases)

I have seen others report the same issue, as seen in this forum:
https://dba.stackexchange.com/questions/289130/postgresql-changes-same-query-plan-at-runtime
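For anyone trying to reproduce this outside a driver, the switch from custom to generic plans can be observed directly in psql. This is a sketch using the query shape from the report; the statement name is made up, the IN list stays elided as in the original, and the parameter values are the ones visible in the custom-plan EXPLAIN output:

PREPARE contact_counts(uuid, varchar) AS
SELECT accounts_contacts.account_id, count(contacts.id) AS count
FROM accounts_contacts
JOIN contacts ON contacts.id = accounts_contacts.contact_id
WHERE accounts_contacts.account_id IN (...)  -- elided list, as in the report
  AND contacts.tenant_id = $1
  AND contacts.status = $2
GROUP BY accounts_contacts.account_id;

-- Executions 1-5 are planned with the actual parameter values (custom plans).
-- From execution 6 onward the planner may switch to the cached generic plan.
EXPLAIN (ANALYZE)
EXECUTE contact_counts('0199729c-3f1e-712f-8b71-f6545a96b225', 'ACTIVE');

-- On PostgreSQL 14+, pg_prepared_statements shows which plan type was used:
SELECT name, generic_plans, custom_plans FROM pg_prepared_statements;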

#2 Laurenz Albe
laurenz.albe@cybertec.at
In reply to: PG Bug reporting form (#1)
Re: BUG #19076: Generic query plan is extremely slow

On Tue, 2025-10-07 at 20:32 +0000, PG Bug reporting form wrote:

PostgreSQL version: 17.6
Operating system: macOS 15.6.1 (also observed on Linux-based OSes)

I have an application which uses a database driver that creates prepared
statements. I am noticing that for some queries, the first 5 executions
after the database connection is created are performant. After this, the
prepared statement uses the generic query plan and is incredibly slow. This
is even if I do not change the parameters and simply re-run the same query
several times with the same arguments. I have tried re-running ANALYZE on
the tables and setting STATISTICS to 1000 on the columns in question, with no
luck. Here is what my query looks like:

SELECT accounts_contacts.account_id, count(contacts.id) AS count
FROM accounts_contacts JOIN contacts ON contacts.id =
accounts_contacts.contact_id
WHERE accounts_contacts.account_id IN (...) AND contacts.tenant_id =
$1::UUID AND contacts.status = $2::VARCHAR
GROUP BY accounts_contacts.account_id

Here is the EXPLAIN output from the first 5 executions using the custom
query plan:

[hash join]

Here is the EXPLAIN output from executions 6+ using the generic query plan:

[nested loop join with over 800000 iterations]

I don't think that is a bug. The logic that decides whether to use a
generic plan or not is a heuristic, and heuristics are bound to get it
wrong occasionally.

I'd guess that the first five executions happened to use values that
had few "contacts".

Either don't use a prepared statement for this statement, or make sure
that "plan_cache_mode" is set to "force_custom_plan" for that single
prepared statement.
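For reference, since plan_cache_mode is an ordinary configuration parameter, scoping it to a single statement means setting it around that statement's executions. A sketch (the prepared-statement name and values are illustrative):

-- Per session: force custom planning for everything on this connection.
SET plan_cache_mode = force_custom_plan;

-- Or scoped to one transaction, so other statements keep cached plans:
BEGIN;
SET LOCAL plan_cache_mode = force_custom_plan;
EXECUTE contact_counts('0199729c-3f1e-712f-8b71-f6545a96b225', 'ACTIVE');
COMMIT;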

Yours,
Laurenz Albe

#3 David Rowley
dgrowleyml@gmail.com
In reply to: Laurenz Albe (#2)
Re: BUG #19076: Generic query plan is extremely slow

On Wed, 8 Oct 2025 at 21:21, Laurenz Albe <laurenz.albe@cybertec.at> wrote:

Either don't use a prepared statement for this statement, or make sure
that "plan_cache_mode" is set to "force_custom_plan" for that single
prepared statement.

That might be the best idea. Going by the following fragment, I'm
suspecting that it's a multi-tenanted system and the query is running
on a larger than average tenant. At best when planning the generic
plan the planner can only assume there's going to be <estimated rows
in table> divided by <number of distinct tenant_ids in table> matching
"tenant_id = $1".

->  Bitmap Index Scan on ix_contacts_tenant_id  (cost=0.00..9.74 rows=709 width=0) (actual time=32.114..32.114 rows=966884 loops=1)
      Index Cond: (tenant_id = $1)

In this case that estimates to be 709, but in reality, for this
tenant, it's 966884. That results in the parameterized nested loop
having to do over 1000x more inner scans than planned.

Another option would be to check if pg_stats reports that n_distinct
is roughly accurate for this table. (see if it's close to select
count(distinct tenant_id) from contacts;). If that's set too high then
lowering it might help. Artificially lowering it could also be an
option, but that could lead to bad plans in other areas, so the
plan_cache_mode idea might be better.
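Concretely, the check and the override described above might look like the following sketch (the value 50 is purely illustrative; pick something close to the true distinct count):

-- Compare the planner's statistic with reality:
SELECT n_distinct FROM pg_stats
WHERE tablename = 'contacts' AND attname = 'tenant_id';

SELECT count(DISTINCT tenant_id) FROM contacts;

-- The generic plan's estimate for "tenant_id = $1" is roughly
-- reltuples / n_distinct, which is why a much-larger-than-average
-- tenant gets badly underestimated.

-- Override the statistic if ANALYZE keeps getting it wrong:
ALTER TABLE contacts ALTER COLUMN tenant_id SET (n_distinct = 50);
ANALYZE contacts;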

David

#4 Premal Patel
premal.patel@veeva.com
In reply to: David Rowley (#3)
Re: BUG #19076: Generic query plan is extremely slow

I see, thank you both. This gives me a better understanding of where this originates from.

I just have one follow up question: I have noticed similar issues in a few other queries made in my application. Would it be unsafe to set “plan_cache_mode” to “force_custom_plan” for the entirety of the connection?



#5 Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Premal Patel (#4)
Re: BUG #19076: Generic query plan is extremely slow

On Wed, 2025-10-08 at 07:17 -0400, Premal Patel wrote:

Would it be unsafe to set “plan_cache_mode” to “force_custom_plan”
for the entirety of the connection?

No, that's not unsafe at all, but you'd lose the performance advantage
of reusing plans, which can be substantial for short statements.
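If a connection-wide setting is acceptable, it can also be applied without touching application code, for instance per role or per database (names here are illustrative):

-- For every future session of this role:
ALTER ROLE app_user SET plan_cache_mode = force_custom_plan;

-- Or for every future session on this database:
ALTER DATABASE app_db SET plan_cache_mode = force_custom_plan;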

Yours,
Laurenz Albe

#6 Andrei Lepikhov
lepihov@gmail.com
In reply to: David Rowley (#3)
Re: BUG #19076: Generic query plan is extremely slow

On 8/10/2025 11:13, David Rowley wrote:

On Wed, 8 Oct 2025 at 21:21, Laurenz Albe <laurenz.albe@cybertec.at> wrote:

Either don't use a prepared statement for this statement, or make sure
that "plan_cache_mode" is set to "force_custom_plan" for that single
prepared statement.

We already have a couple of proposals that would alleviate this
recurring issue.

The first one [1] introduces 'referenced' planning for generic plans.
There, the optimiser may use current values for incoming parameters to
predict estimations, rather than using magic constants like 0.3 and
0.005, which seem too conservative in many cases. To my knowledge, SQL
Server, Oracle, and some other systems employ this approach. Although
not the ideal solution, it is optional and may help people choose a
proper planning strategy.

The second feature [2] is an addition to the first one. It slightly
changes the behaviour of plan_cache_mode: it prioritises the strategy
declared in the CachedPlanSource::cursor_options over plan_cache_mode.
It provides users with the option to select a specific query and set a
generic/custom/auto strategy to mitigate severe performance degradation.

With later changes in the master branch, these features open the door
for extension developers to introduce 'execution-based' metrics and
choose a plan type for a specific plan cache entry.

[1]: /messages/by-id/19919494-92a8-4905-a250-6cf17b89f7c3@gmail.com
[2]: /messages/by-id/458ace73-4827-43e1-8a30-734a93d4720f@gmail.com

--
regards,
Andrei Lepikhov,
pgEdge