Postgres: Queries are too slow after upgrading to PG17 from PG15

Started by Sajith Prabhakar Shetty11 months ago54 messagesbugs
Jump to latest
#1Sajith Prabhakar Shetty
ssajith@blackduck.com

Hi,

Most of the queries got slower after upgrading our postgres from version 15 to 17 using pg_upgrade. I reconfirmed that "vacuum full, analyze" were all taken care.

To debug, instead of upgrade, I installed two instances one with postgres 15 and another postgres 17 with the same application dump restored.

Now surprisingly one of the query i took from application which used to execute in 2s in PG15, is now taking 1min+ in PG17. I also observed that some of the operations involving DML operations slowed down too in PG17.

Explain plan of the two queries almost same, all the joins and paths used are exactly same.

Could anybody please provide some insights here?

PG15 Plan:

https://explain.depesz.com/s/5PGX

________________________________

PG17 Plan:

https://explain.depesz.com/s/27vD

Update: I installed PG16 with same dump, and verified that everything seems normal here in fact better than PG15, so i just want rule out possibility of PG16 impact here.

PG16 plan: https://explain.depesz.com/s/v0Gc

Sajith P Shetty
Principal Engineer
Black Duck
M +91 9448389989<tel:+919448389989>| ssajith@blackduck.com<mailto:ssajith@blackduck.com>
[signature_778616162]

Attachments:

image001.jpgimage/jpeg; name=image001.jpgDownload
#2Andrei Lepikhov
lepihov@gmail.com
In reply to: Sajith Prabhakar Shetty (#1)
Re: Postgres: Queries are too slow after upgrading to PG17 from PG15

On 15/5/2025 07:33, Sajith Prabhakar Shetty wrote:

Hi,

Most of the queries got slower after upgrading our postgres from version
15 to 17 using pg_upgrade. I reconfirmed that "vacuum full, analyze"
were all taken care.

To debug, instead of upgrade, I installed two instances one with
postgres 15 and another postgres 17 with the same application dump restored.

Now surprisingly one of the query i took from application which used to
execute in 2s in PG15, is now taking 1min+ in PG17. I also observed that
some of the operations involving DML operations slowed down too in PG17.

Explain plan of the two queries almost same, all the joins and paths
used are exactly same.

Could anybody please provide some insights here?

Curious, the difference in Index Only Scan node:

-> Index Scan using stream_file_pkey on stream_file sf
(cost=1.63..1.86 rows=1 width=8)
(actual time=0.006..0.006 rows=1 loops=598916)
Index Cond: (id = sdo.stream_file_id)
Filter: (component_id = ANY

-> Index Only Scan using ui_stream_file_id_component on stream_file sf
(cost=0.43..0.51 rows=1 width=8)
(actual time=0.014..0.014 rows=1 loops=598916)
Index Cond: ((id = sdo.stream_file_id) AND (component_id = ANY

Each time the index scan is 2.5 times slower on PG17. But:

PG 15:
Buffers: shared hit=2338397 read=57267
I/O Timings: shared read=3384.286

PG 17:
Buffers: shared hit=1909772 read=9933
I/O Timings: shared read=686.506

If I'm not mistaken, it seems like an insight.

--
regards, Andrei Lepikhov

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Sajith Prabhakar Shetty (#1)
Re: Postgres: Queries are too slow after upgrading to PG17 from PG15

Sajith Prabhakar Shetty <ssajith@blackduck.com> writes:

Now surprisingly one of the query i took from application which used to execute in 2s in PG15, is now taking 1min+ in PG17.

The data you've actually showed us does not back up that complaint.
But I do see a 2x slowdown.

Explain plan of the two queries almost same, all the joins and paths used are exactly same.

Yeah, that is odd. The only simple explanation I can think of is that you
are using a debug build of v17.

regards, tom lane

#4Sajith Prabhakar Shetty
ssajith@blackduck.com
In reply to: Tom Lane (#3)
Re: Postgres: Queries are too slow after upgrading to PG17 from PG15

Hi Tom,

Thanks for the response.

In my first test, I used pg_upgrade to upgrade the instance from 15 to 17, and in my second test, I created PG15 and PG17 separately from brew repo, I don’t think I am using debug build here.

Sajith P Shetty
Principal Engineer
Black Duck
M +91 9448389989<tel:+919448389989>| ssajith@blackduck.com<mailto:ssajith@blackduck.com>
[signature_778616162]

From: Tom Lane <tgl@sss.pgh.pa.us>
Date: Thursday, 15 May 2025 at 9:16 PM
To: Sajith Prabhakar Shetty <ssajith@blackduck.com>
Cc: pgsql-bugs@lists.postgresql.org <pgsql-bugs@lists.postgresql.org>
Subject: Re: Postgres: Queries are too slow after upgrading to PG17 from PG15
Sajith Prabhakar Shetty <ssajith@blackduck.com> writes:

Now surprisingly one of the query i took from application which used to execute in 2s in PG15, is now taking 1min+ in PG17.

The data you've actually showed us does not back up that complaint.
But I do see a 2x slowdown.

Explain plan of the two queries almost same, all the joins and paths used are exactly same.

Yeah, that is odd. The only simple explanation I can think of is that you
are using a debug build of v17.

regards, tom lane

Attachments:

image001.jpgimage/jpeg; name=image001.jpgDownload
#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Sajith Prabhakar Shetty (#4)
Re: Postgres: Queries are too slow after upgrading to PG17 from PG15

Sajith Prabhakar Shetty <ssajith@blackduck.com> writes:

In my first test, I used pg_upgrade to upgrade the instance from 15 to 17, and in my second test, I created PG15 and PG17 separately from brew repo, I don’t think I am using debug build here.

Might be worth checking the brew recipes... it's common for packagers to
set the debug option for beta versions, and I could believe that whoever
packages PG for HomeBrew forgot to take that out again for 17.0.

But, assuming it's not that, could you put together a self-contained
test case that demos what you're seeing?

regards, tom lane

#6Sajith Prabhakar Shetty
ssajith@blackduck.com
In reply to: Andrei Lepikhov (#2)
Re: Postgres: Queries are too slow after upgrading to PG17 from PG15

Hi Andrei,

Thank you and sorry for the delay in response.

We have found the slowness in multiple other products in our portfolio and we are blocked to proceed further.

In regards to your point below, it is true that the index only scan is 2.5 times slower on PG17 and primary difference is in that index-only scan of ui_stream_file_id_component at line 14. It takes 6 microseconds per row in PG 15, 2 microseconds per row in 16, and 14 microseconds in 17

I found out something which might be linked to, in release notes of PG17 (http://postgresql.org/docs/release/17.0/):

“ Allow btree<https://www.postgresql.org/docs/17/btree.html&gt; indexes to more efficiently find a set of values, such as those supplied by IN clauses using constants.”

@Tom Lane<mailto:tgl@sss.pgh.pa.us>, Sorry I could not get you reproducer yet but please do comment on this insight meanwhile I am working on it.

Thanks.

Sajith P Shetty
Principal Engineer
Black Duck
M +91 9448389989<tel:+919448389989>| ssajith@blackduck.com<mailto:ssajith@blackduck.com>
[signature_778616162]

From: Andrei Lepikhov <lepihov@gmail.com>
Date: Thursday, 15 May 2025 at 7:56 PM
To: Sajith Prabhakar Shetty <ssajith@blackduck.com>, pgsql-bugs@lists.postgresql.org <pgsql-bugs@lists.postgresql.org>
Cc: Peter Geoghegan <pg@bowt.ie>
Subject: Re: Postgres: Queries are too slow after upgrading to PG17 from PG15
On 15/5/2025 07:33, Sajith Prabhakar Shetty wrote:

Hi,

Most of the queries got slower after upgrading our postgres from version
15 to 17 using pg_upgrade. I reconfirmed that "vacuum full, analyze"
were all taken care.

To debug, instead of upgrade, I installed two instances one with
postgres 15 and another postgres 17 with the same application dump restored.

Now surprisingly one of the query i took from application which used to
execute in 2s in PG15, is now taking 1min+ in PG17. I also observed that
some of the operations involving DML operations slowed down too in PG17.

Explain plan of the two queries almost same, all the joins and paths
used are exactly same.

Could anybody please provide some insights here?

Curious, the difference in Index Only Scan node:

-> Index Scan using stream_file_pkey on stream_file sf
(cost=1.63..1.86 rows=1 width=8)
(actual time=0.006..0.006 rows=1 loops=598916)
Index Cond: (id = sdo.stream_file_id)
Filter: (component_id = ANY

-> Index Only Scan using ui_stream_file_id_component on stream_file sf
(cost=0.43..0.51 rows=1 width=8)
(actual time=0.014..0.014 rows=1 loops=598916)
Index Cond: ((id = sdo.stream_file_id) AND (component_id = ANY

Each time the index scan is 2.5 times slower on PG17. But:

PG 15:
Buffers: shared hit=2338397 read=57267
I/O Timings: shared read=3384.286

PG 17:
Buffers: shared hit=1909772 read=9933
I/O Timings: shared read=686.506

If I'm not mistaken, it seems like an insight.

--
regards, Andrei Lepikhov

Attachments:

image001.jpgimage/jpeg; name=image001.jpgDownload
In reply to: Sajith Prabhakar Shetty (#6)
Re: Postgres: Queries are too slow after upgrading to PG17 from PG15

On Thu, Jul 17, 2025 at 5:58 AM Sajith Prabhakar Shetty
<ssajith@blackduck.com> wrote:

In regards to your point below, it is true that the index only scan is 2.5 times slower on PG17 and primary difference is in that index-only scan of ui_stream_file_id_component at line 14. It takes 6 microseconds per row in PG 15, 2 microseconds per row in 16, and 14 microseconds in 17

The important difference is the choice of index for the outermost
nestloop join's inner index scan. A different index is used on
Postgres 17:

On Postgres 15, you're using the likely-single-column stream_file_pkey
index, which uses filter quals for the ScalarArrayOp/= ANY condition.
Whereas on Postgres 17, you're using the ui_stream_file_id_component
index instead (a multicolumn index), which uses a true index qual for
the "id = sdo.stream_file_id" as well as for the ScalarArrayOp/= ANY
condition.

--
Peter Geoghegan

#8Sajith Prabhakar Shetty
ssajith@blackduck.com
In reply to: Peter Geoghegan (#7)
Re: Postgres: Queries are too slow after upgrading to PG17 from PG15

Hi Peter,

Thanks for the response, but I don’t understand when you meant “you are using different index”, by any chance did you mean the optimizer?
Because I have used exactly the same data dump for all PG15,16 and 17 for my tests with no difference in data nor schema structure.

Sajith P Shetty
Principal Engineer
Black Duck
M +91 9448389989<tel:+919448389989>| ssajith@blackduck.com<mailto:ssajith@blackduck.com>
[signature_778616162]

From: Peter Geoghegan <pg@bowt.ie>
Date: Thursday, 17 July 2025 at 8:35 PM
To: Sajith Prabhakar Shetty <ssajith@blackduck.com>
Cc: Andrei Lepikhov <lepihov@gmail.com>, pgsql-bugs@lists.postgresql.org <pgsql-bugs@lists.postgresql.org>, Tom Lane <tgl@sss.pgh.pa.us>
Subject: Re: Postgres: Queries are too slow after upgrading to PG17 from PG15
On Thu, Jul 17, 2025 at 5:58 AM Sajith Prabhakar Shetty
<ssajith@blackduck.com> wrote:

In regards to your point below, it is true that the index only scan is 2.5 times slower on PG17 and primary difference is in that index-only scan of ui_stream_file_id_component at line 14. It takes 6 microseconds per row in PG 15, 2 microseconds per row in 16, and 14 microseconds in 17

The important difference is the choice of index for the outermost
nestloop join's inner index scan. A different index is used on
Postgres 17:

On Postgres 15, you're using the likely-single-column stream_file_pkey
index, which uses filter quals for the ScalarArrayOp/= ANY condition.
Whereas on Postgres 17, you're using the ui_stream_file_id_component
index instead (a multicolumn index), which uses a true index qual for
the "id = sdo.stream_file_id" as well as for the ScalarArrayOp/= ANY
condition.

--
Peter Geoghegan

Attachments:

image001.jpgimage/jpeg; name=image001.jpgDownload
In reply to: Sajith Prabhakar Shetty (#8)
Re: Postgres: Queries are too slow after upgrading to PG17 from PG15

On Thu, Jul 17, 2025 at 11:49 AM Sajith Prabhakar Shetty
<ssajith@blackduck.com> wrote:

Thanks for the response, but I don’t understand when you meant “you are using different index”, by any chance did you mean the optimizer?
Because I have used exactly the same data dump for all PG15,16 and 17 for my tests with no difference in data nor schema structure.

I simply mean that the plan is substantially different, in that there
is an index scan node on 17 that uses a completely different index to
the corresponding index scan node on 15. While the plan looks almost
the same, this one detail is huge.

In other words, I disagree with your summary of the plan, when you
said "Explain plan of the two queries almost same, all the joins and
paths used are exactly same". The paths are not the same.

--
Peter Geoghegan

#10Sajith Prabhakar Shetty
ssajith@blackduck.com
In reply to: Peter Geoghegan (#9)
Re: Postgres: Queries are too slow after upgrading to PG17 from PG15

Hello,

We are able to get you a self-contained reproducer, please find attached dump, sql script and read me files.
Sorry for the delay, since our initial reproducer from the product was too large and sensitive to share.

This can clearly demonstrate performance degradation from postgres 17 versus postrges 15.

NOTE: This degradation has blocked our PG upgrade on multiple products in our portfolio.

Thanks.

Sajith P Shetty
Principal Engineer
Black Duck
M +91 9448389989<tel:+919448389989>| ssajith@blackduck.com<mailto:ssajith@blackduck.com>
[signature_778616162]

From: Peter Geoghegan <pg@bowt.ie>
Date: Thursday, 17 July 2025 at 9:25 PM
To: Sajith Prabhakar Shetty <ssajith@blackduck.com>
Cc: Andrei Lepikhov <lepihov@gmail.com>, pgsql-bugs@lists.postgresql.org <pgsql-bugs@lists.postgresql.org>, Tom Lane <tgl@sss.pgh.pa.us>
Subject: Re: Postgres: Queries are too slow after upgrading to PG17 from PG15
On Thu, Jul 17, 2025 at 11:49 AM Sajith Prabhakar Shetty
<ssajith@blackduck.com> wrote:

Thanks for the response, but I don’t understand when you meant “you are using different index”, by any chance did you mean the optimizer?
Because I have used exactly the same data dump for all PG15,16 and 17 for my tests with no difference in data nor schema structure.

I simply mean that the plan is substantially different, in that there
is an index scan node on 17 that uses a completely different index to
the corresponding index scan node on 15. While the plan looks almost
the same, this one detail is huge.

In other words, I disagree with your summary of the plan, when you
said "Explain plan of the two queries almost same, all the joins and
paths used are exactly same". The paths are not the same.

--
Peter Geoghegan

Attachments:

image001.jpgimage/jpeg; name=image001.jpgDownload
simple_reproducer_setup_sql.pgsqlapplication/octet-stream; name=simple_reproducer_setup_sql.pgsqlDownload
simple_reproducer.dumpapplication/octet-stream; name=simple_reproducer.dumpDownload+1-3
simple_reproducer_message_readme.txttext/plain; name=simple_reproducer_message_readme.txtDownload
In reply to: Sajith Prabhakar Shetty (#10)
Re: Postgres: Queries are too slow after upgrading to PG17 from PG15

On Mon, Jul 28, 2025 at 2:20 AM Sajith Prabhakar Shetty
<ssajith@blackduck.com> wrote:

We are able to get you a self-contained reproducer, please find attached dump, sql script and read me files.

I find that your test case spends a great deal of time on nbtree
preprocessing, which happens once per execution of the inner index
scan on "zsf". According to "perf top", most cycles on spent on these:

32.02% postgres [.] FunctionCall2Coll
22.01% postgres [.] qsort_arg
18.64% postgres [.] _bt_compare_array_elements
8.20% postgres [.] btint8cmp
3.97% postgres [.] _bt_preprocess_keys
...

The query takes ~1550ms on my local workstation. If I just comment out
the relevant qsort, it'll take only ~190 ms. That qsort might not be
the only problem here, but it is the immediate problem. Note that
commenting out the qsort should produce the same answer, at least for
this one query, since the constants that appear in the query are
already sorted (the EXPLAIN row counts match what they show with the
qsort in place).

In principle, we could limit the use of the qsort to the first inner
index scan, and safely skip each subsequent qsort -- at least in cases
where the array was a constant (which includes this case). Obviously,
we *do* need a qsort (what if the constants aren't exactly in sorted
order?), but we generally don't need to do it once per inner index
scan.

There is a separate question as to whether or not the planner should
pick this plan in the first place. I find that I can get a faster plan
(without commenting out anything) by tricking the planner into using
the single column "zsf_pkey", rather than the multi-column
"zsf_id_fpi_cid_key". Even then, I can only get a merge join with the
"zsf_pkey" index on the inner side -- not a nested loop join with the
"zsf_pkey" index on the inner side, as expected. The merge join plan
brings the execution time down to ~90ms, which is better, but
certainly still less than ideal.

--
Peter Geoghegan

#12Sajith Prabhakar Shetty
ssajith@blackduck.com
In reply to: Peter Geoghegan (#11)
Re: Postgres: Queries are too slow after upgrading to PG17 from PG15

Thanks Peter for detailed response. We really appreciate your time and effort in this regard.
Please help me on the next step of actions required from our end. I understand you have found out the root cause, can you confirm if any fix would be approved for this case in upcoming patches and any ETA is really helpful.

Also note that this has affected many of our SQL queries, and we cannot afford to modify our code to accommodate the PG17 planner changes.

Looping in the email, my colleague Todd, who helped me get the reproducer steps.

Sajith P Shetty
Principal Engineer
Black Duck
M +91 9448389989<tel:+919448389989>| ssajith@blackduck.com<mailto:ssajith@blackduck.com>
[signature_778616162]

From: Peter Geoghegan <pg@bowt.ie>
Date: Tuesday, 29 July 2025 at 2:12 AM
To: Sajith Prabhakar Shetty <ssajith@blackduck.com>
Cc: Andrei Lepikhov <lepihov@gmail.com>, pgsql-bugs@lists.postgresql.org <pgsql-bugs@lists.postgresql.org>, Tom Lane <tgl@sss.pgh.pa.us>
Subject: Re: Postgres: Queries are too slow after upgrading to PG17 from PG15
On Mon, Jul 28, 2025 at 2:20 AM Sajith Prabhakar Shetty
<ssajith@blackduck.com> wrote:

We are able to get you a self-contained reproducer, please find attached dump, sql script and read me files.

I find that your test case spends a great deal of time on nbtree
preprocessing, which happens once per execution of the inner index
scan on "zsf". According to "perf top", most cycles on spent on these:

32.02% postgres [.] FunctionCall2Coll
22.01% postgres [.] qsort_arg
18.64% postgres [.] _bt_compare_array_elements
8.20% postgres [.] btint8cmp
3.97% postgres [.] _bt_preprocess_keys
...

The query takes ~1550ms on my local workstation. If I just comment out
the relevant qsort, it'll take only ~190 ms. That qsort might not be
the only problem here, but it is the immediate problem. Note that
commenting out the qsort should produce the same answer, at least for
this one query, since the constants that appear in the query are
already sorted (the EXPLAIN row counts match what they show with the
qsort in place).

In principle, we could limit the use of the qsort to the first inner
index scan, and safely skip each subsequent qsort -- at least in cases
where the array was a constant (which includes this case). Obviously,
we *do* need a qsort (what if the constants aren't exactly in sorted
order?), but we generally don't need to do it once per inner index
scan.

There is a separate question as to whether or not the planner should
pick this plan in the first place. I find that I can get a faster plan
(without commenting out anything) by tricking the planner into using
the single column "zsf_pkey", rather than the multi-column
"zsf_id_fpi_cid_key". Even then, I can only get a merge join with the
"zsf_pkey" index on the inner side -- not a nested loop join with the
"zsf_pkey" index on the inner side, as expected. The merge join plan
brings the execution time down to ~90ms, which is better, but
certainly still less than ideal.

--
Peter Geoghegan

Attachments:

image001.jpgimage/jpeg; name=image001.jpgDownload
In reply to: Peter Geoghegan (#11)
Re: Postgres: Queries are too slow after upgrading to PG17 from PG15

On Mon, Jul 28, 2025 at 4:41 PM Peter Geoghegan <pg@bowt.ie> wrote:

The query takes ~1550ms on my local workstation. If I just comment out
the relevant qsort, it'll take only ~190 ms. That qsort might not be
the only problem here, but it is the immediate problem. Note that
commenting out the qsort should produce the same answer, at least for
this one query, since the constants that appear in the query are
already sorted (the EXPLAIN row counts match what they show with the
qsort in place).

Actually, that isn't quite true -- the constants weren't in sorted order.

I find that if I presort the elements within the query text itself,
the runtime goes down to only ~410ms. That's still not great, but it
is a vast improvement.

--
Peter Geoghegan

#14Merlin Moncure
mmoncure@gmail.com
In reply to: Peter Geoghegan (#11)
Re: Postgres: Queries are too slow after upgrading to PG17 from PG15

On Mon, Jul 28, 2025 at 2:42 PM Peter Geoghegan <pg@bowt.ie> wrote:

On Mon, Jul 28, 2025 at 2:20 AM Sajith Prabhakar Shetty
<ssajith@blackduck.com> wrote:

We are able to get you a self-contained reproducer, please find attached dump, sql script and read me files.

I find that your test case spends a great deal of time on nbtree
preprocessing, which happens once per execution of the inner index
scan on "zsf". According to "perf top", most cycles on spent on these:

32.02% postgres [.] FunctionCall2Coll
22.01% postgres [.] qsort_arg
18.64% postgres [.] _bt_compare_array_elements
8.20% postgres [.] btint8cmp
3.97% postgres [.] _bt_preprocess_keys
...

The query takes ~1550ms on my local workstation. If I just comment out
the relevant qsort, it'll take only ~190 ms. That qsort might not be

side question: if 50% of time (per perf top) is spent in qsort and
subroutines, how come query time goes down ~85%? is this a limitation
of perf or some other issue?

side question #2: 32% time spent on FunctionCall2Coll seems like a lot
-- is this inclusive of the routine under the function pointer?

just curious on both of these

merlin

In reply to: Sajith Prabhakar Shetty (#12)
Re: Postgres: Queries are too slow after upgrading to PG17 from PG15

On Tue, Jul 29, 2025 at 1:49 AM Sajith Prabhakar Shetty
<ssajith@blackduck.com> wrote:

I understand you have found out the root cause

I wouldn't say that -- it isn't clear that this issue with qsorting
during preprocessing is the root cause.

As I said, it is (at a minimum) the immediate problem with your query.
But the underlying code path didn't change all that much in Postgres
17 -- it just started to be hit a lot more with this particular query.
This is more or less a consequence of the fact that the new-to-17
query plan has an inner index scan with an inconvenient combination of
2 things: it is very selective and fast (fast per individual
execution), even though it has a couple of SAOPs (a SAOP is an = ANY()
condition) that each have several hundred array elements. That's what
allows these startup costs (i.e. array preprocessing that does these 2
qsorts on each execution) to dominate so much.

My Postgres 17 commit 5bf748b8 made the planner stop generating
distinct index paths that make lower-order SAOPs into "Filter:"
conditions, which are executed outside of the core B-Tree code, using
a completely different code path. Those index paths are used by the
Postgres 15 plan. I am very hesitant to add anything like that back,
though, because they're very unlikely to be faster than an index path
that makes the executor push down the SAOP condition into the B-Tree
code (your counterexample notwithstanding).

Perhaps Tom can weigh-in here. I removed code that generated these
alternative index paths from the planner because its original
justification (see bugfix commit a4523c5a, a follow-up to bugfix
commit 807a40c5) no longer applied. Perhaps this should be revisited
now, or perhaps the issue should be ameliorated on the nbtree side. Or
maybe we should just do nothing -- the issue can be worked around in
the application itself.

--
Peter Geoghegan

#16Todd Cook
cookt@blackduck.com
In reply to: Peter Geoghegan (#15)
Re: Postgres: Queries are too slow after upgrading to PG17 from PG15

On 7/30/25, 3:17 PM, "Peter Geoghegan" <pg@bowt.ie <mailto:pg@bowt.ie>> wrote:
Perhaps Tom can weigh-in here. I removed code that generated these
alternative index paths from the planner because its original
justification (see bugfix commit a4523c5a, a follow-up to bugfix
commit 807a40c5) no longer applied. Perhaps this should be revisited
now, or perhaps the issue should be ameliorated on the nbtree side. Or
maybe we should just do nothing -- the issue can be worked around in
the application itself.

I work at the same company as Sajith, but on a different product. The reproducer he
provided is just a sample; it's not the only problem. Load testing in my team shows
that PG 17 is about 4x slower than PG 15 across the board. It's bordering on unusable
for production deployments.

Unfortunately, the load testing setup doesn't really help isolate individual, regressing
queries. However, I'm more than willing to help support any further investigation if
needed or helpful.

-- todd

In reply to: Merlin Moncure (#14)
Re: Postgres: Queries are too slow after upgrading to PG17 from PG15

On Wed, Jul 30, 2025 at 2:59 PM Merlin Moncure <mmoncure@gmail.com> wrote:

side question: if 50% of time (per perf top) is spent in qsort and
subroutines, how come query time goes down ~85%? is this a limitation
of perf or some other issue?

I used perf's default event type for this, which is "cycles". It's
well known that the relationship between cycles and wallclock time is
very complicated. In my experience "cycles" tends to be useful for a
first order pass, to get a very general sense of what's going on,
before considering possible solutions. That's all I needed here -- I
wasn't trying to be precise.

Separately, there's bound to be complicated nonlinear effects in play.
That's one of the reasons why it is so hard to apply profiling
information effectively, at least when optimizing a given piece of
code that is already reasonably well understood.

side question #2: 32% time spent on FunctionCall2Coll seems like a lot
-- is this inclusive of the routine under the function pointer?

just curious on both of these

I ran perf in a way that counted FunctionCall2Coll separately from
btint8cmp. It's not that surprising that FunctionCall2Coll dominated,
since btint8cmp is so simple.

--
Peter Geoghegan

In reply to: Todd Cook (#16)
Re: Postgres: Queries are too slow after upgrading to PG17 from PG15

On Wed, Jul 30, 2025 at 3:48 PM Todd Cook <cookt@blackduck.com> wrote:

I work at the same company as Sajith, but on a different product. The reproducer he
provided is just a sample; it's not the only problem. Load testing in my team shows
that PG 17 is about 4x slower than PG 15 across the board. It's bordering on unusable
for production deployments.

I suggest that you rewrite affected queries to make them join against
a VALUES() with the same constants as those currently used in the
larger IN() list. If you're not sure whether the set of constants from
the application will be reliably unique, you can use DISTINCT to make
sure.

--
Peter Geoghegan

#19David Rowley
dgrowleyml@gmail.com
In reply to: Peter Geoghegan (#18)
Re: Postgres: Queries are too slow after upgrading to PG17 from PG15

On Thu, 31 Jul 2025 at 08:14, Peter Geoghegan <pg@bowt.ie> wrote:

I suggest that you rewrite affected queries to make them join against
a VALUES() with the same constants as those currently used in the
larger IN() list. If you're not sure whether the set of constants from
the application will be reliably unique, you can use DISTINCT to make
sure.

Even just presorting the IN list constants would make it better. If I
manually adjust the recreator query to sort the items in both IN
lists, I get:

Execution Time: 263.365 ms

vs:

Execution Time: 804.377 ms

Of course, the qsort_arg() call still happens, it'll hit qsort_arg's
presorted short-circuit case and will do very little.

I've not looked in great detail, but I did wonder if it's worth
adjusting ExecIndexBuildScanKeys() to sort the array in a
ScalarArrayOpExpr when it's Const beforehand. That might be a bit of
wasted effort if there's just one scan, however.

David

Attachments:

sorted_in_lists.sqlapplication/octet-stream; name=sorted_in_lists.sqlDownload
#20David Rowley
dgrowleyml@gmail.com
In reply to: Todd Cook (#16)
Re: Postgres: Queries are too slow after upgrading to PG17 from PG15

On Thu, 31 Jul 2025 at 07:49, Todd Cook <cookt@blackduck.com> wrote:

I work at the same company as Sajith, but on a different product. The reproducer he
provided is just a sample; it's not the only problem. Load testing in my team shows
that PG 17 is about 4x slower than PG 15 across the board. It's bordering on unusable
for production deployments.

Unfortunately, the load testing setup doesn't really help isolate individual, regressing
queries. However, I'm more than willing to help support any further investigation if
needed or helpful.

Unfortunately, we can't really work with that much information. It's
not like we have a list of things we know are slower in newer versions
vs older versions. Changes generally go through a large amount of
testing to help ensure these regressions don't happen, so if you
report one, unless someone else beat you to it, there's a decent
chance we didn't know about it. There's nothing we can really do to
help you based on this much information. There's just no chance we'd
have shipped PG17 if it was known to be x4 slower than some previous
version.

You may be able to narrow down what's slower using pg_stat_statements.
If you can, then use EXPLAIN and compare the plans. Did PG17 choose a
different plan? Does EXPLAIN ANALYZE reveal any inaccurate statistics?
Are both instances configured the same way?

Once you find a specific query that's causing the issue, then a report
similar to what Sajith has done is a good way to get help.

David

In reply to: David Rowley (#19)
#22Todd Cook
cookt@blackduck.com
In reply to: David Rowley (#20)
#23Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Geoghegan (#15)
In reply to: Tom Lane (#23)
#25Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Geoghegan (#24)
In reply to: Tom Lane (#25)
#27Todd Cook
cookt@blackduck.com
In reply to: Tom Lane (#23)
In reply to: Todd Cook (#27)
In reply to: Peter Geoghegan (#26)
#30Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Geoghegan (#29)
#31Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#30)
In reply to: Tom Lane (#30)
In reply to: Tom Lane (#31)
#34Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Geoghegan (#33)
In reply to: Tom Lane (#34)
#36Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Geoghegan (#35)
#37Sajith Prabhakar Shetty
ssajith@blackduck.com
In reply to: Tom Lane (#36)
In reply to: Sajith Prabhakar Shetty (#37)
#39Todd Cook
cookt@blackduck.com
In reply to: Peter Geoghegan (#33)
#40Merlin Moncure
mmoncure@gmail.com
In reply to: Todd Cook (#39)
#41Sajith Prabhakar Shetty
ssajith@blackduck.com
In reply to: Merlin Moncure (#40)
In reply to: Sajith Prabhakar Shetty (#41)
#43Sajith Prabhakar Shetty
ssajith@blackduck.com
In reply to: Peter Geoghegan (#42)
#44Sajith Prabhakar Shetty
ssajith@blackduck.com
In reply to: Sajith Prabhakar Shetty (#43)
In reply to: Sajith Prabhakar Shetty (#44)
#46Merlin Moncure
mmoncure@gmail.com
In reply to: Peter Geoghegan (#45)
#47Todd Cook
cookt@blackduck.com
In reply to: Merlin Moncure (#40)
#48Todd Cook
cookt@blackduck.com
In reply to: Todd Cook (#47)
In reply to: Todd Cook (#48)
#50Todd Cook
cookt@blackduck.com
In reply to: Peter Geoghegan (#49)
In reply to: Todd Cook (#50)
#52Todd Cook
cookt@blackduck.com
In reply to: Peter Geoghegan (#51)
In reply to: Todd Cook (#52)
#54Sajith Prabhakar Shetty
ssajith@blackduck.com
In reply to: Peter Geoghegan (#53)