BUG #17652: Performance degradation after migrating from v13.4 to v14.4

Started by PG Bug reporting form · 13 messages · bugs
#1 PG Bug reporting form
noreply@postgresql.org

The following bug has been logged on the website:

Bug reference: 17652
Logged by: Michael Guissine
Email address: mguissine@gmail.com
PostgreSQL version: 14.4
Operating system: AWS RDS
Description:

Hello,

We recently upgraded our AWS RDS instances to PG 14.4 and are seeing weird
performance degradation. The issue seems to be isolated to one specific use
case: querying a partitioned table with a large number of elements in the
IN clause. In our setup, we observed that Postgres switches from an Index
Scan to a Bitmap Index Scan/Bitmap Heap Scan if the number of elements in
the IN clause is above 60; below 60, the Index Scan is performed on all
partitions.

The query and query plan are attached below. We also tried turning off
enable_bitmapscan and observed that the query performs well, which was our
experience in the previous version (pg 13.4).

https://explain.depesz.com/s/F587

Thank you!
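For readers without access to the depesz link, the query shape and the experiment look roughly like this (all table and column names are hypothetical stand-ins for the obfuscated ones):

```sql
-- Hypothetical reconstruction of the problem query; real names are obfuscated.
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM partitioned_table
WHERE partition_key IN (1, 2, 3 /* ... more than ~60 values flips the plan */)
  AND text = 'some-exact-value';

-- Disabling bitmap scans for the session restores the v13-style plan:
SET enable_bitmapscan = off;
```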

#2 Jeff Janes
jeff.janes@gmail.com
In reply to: PG Bug reporting form (#1)
Re: BUG #17652: Performance degradation after migrating from v13.4 to v14.4

On Wed, Oct 19, 2022 at 12:21 PM PG Bug reporting form <
noreply@postgresql.org> wrote:

> We recently upgraded our AWS RDS instances to PG 14.4 and are seeing weird
> performance degradation.

First things first, did you do an ANALYZE on the database after the
upgrade? (Maybe RDS does it for you, I don't know. But it isn't worth
looking into anything else until you make sure that that has been done)
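For anyone following along, this can be confirmed with the standard statistics views (nothing here is RDS-specific):

```sql
-- Re-gather planner statistics after a major-version upgrade:
ANALYZE;

-- Check when each table's statistics were last refreshed:
SELECT relname, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
ORDER BY relname;
```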

Cheers,

Jeff

#3 Michael Guissine
mguissine@gmail.com
In reply to: Jeff Janes (#2)
Re: BUG #17652: Performance degradation after migrating from v13.4 to v14.4

Hi Jeff,

Sorry, I should've mentioned that. Yes, we did ANALYZE all the tables after
the upgrade; we even went ahead and REINDEXed all indexes and then also ran
VACUUM ANALYZE, but we are still facing the same performance issue.


#4 Andres Freund
andres@anarazel.de
In reply to: PG Bug reporting form (#1)
Re: BUG #17652: Performance degradation after migrating from v13.4 to v14.4

Hi,

On 2022-10-19 16:11:15 +0000, PG Bug reporting form wrote:

> We recently upgraded our AWS RDS instances to PG 14.4 and are seeing weird
> performance degradation.

Are they actually configured the same? Of particular interest would be
shared_buffers, effective_cache_size, random_page_cost, seq_page_cost,
cpu_index_tuple_cost, cpu_operator_cost, cpu_tuple_cost, work_mem.

Your explain shows that most of the time is spent doing IO:

    Append (cost=0.43..93144.17 rows=121 width=1121) (actual time=225005.053..1160202.658 rows=7 loops=1)
      Buffers: shared hit=379846 read=2382552 dirtied=57320 written=231982
      I/O Timings: read=1113819.977 write=2850.752

Is there a chance at least part of this is due to a cold cache after
upgrading?

What is the definition of the indexes on the partitions?

Based on the names of the indexes I'd guess that the reason the bitmap scans
are bad is that the index includes the partition_key column as a leading
column, but skips over that column and just filters by "text".

I'm a bit confused by your plan showing:
Rows Removed by Index Recheck: 62841
and
Heap Blocks: exact=46111
in the same node.

I haven't rechecked the code, but I don't think that should happen for btree
indexes. It could however for gin, bloom, hash etc.

> Query and query plan are attached below. We also tried to turn off
> enable_bitmapscan and observed that query is performing well which was our
> experience in the previous version (pg 13.4)
>
> https://explain.depesz.com/s/F587

Do you know what the plan was in 13?

Greetings,

Andres Freund

#5 Jeff Janes
jeff.janes@gmail.com
In reply to: Michael Guissine (#3)
Re: BUG #17652: Performance degradation after migrating from v13.4 to v14.4

On Wed, Oct 19, 2022 at 1:30 PM Michael Guissine <mguissine@gmail.com>
wrote:

> Hi Jeff,
>
> Sorry I should've mentioned that. Yes we did ANALYZE all the tables after
> upgrade, we even went ahead and REINDEX all indexes and then also run
> VACUUM ANALYZE but still facing same performance issue

I'm guessing that paritioned_table_hash_p3_partition_key_text_idx is a
pg_trgm index, and that you are a victim of the change "Allow GiST/GIN
pg_trgm indexes to do equality lookups". It is not surprising that this
change would bite someone. If you can redo the upgrade but this time
refrain from upgrading pg_trgm from 1.5 to 1.6, that should fix the issue
(assuming RDS even gives you enough control to do that). Or drop the
extension and recreate it specifically as version 1.5 (which unfortunately
would cascade to dropping all the indexes, and they would need to be
recreated afterwards--even though the index structures themselves would
require no change).
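On stock PostgreSQL, the check-and-pin Jeff describes would look something like the following (whether RDS lets you run it is another question):

```sql
-- See which pg_trgm version is currently installed:
SELECT extname, extversion FROM pg_extension WHERE extname = 'pg_trgm';

-- Recreate the extension pinned at 1.5. CASCADE also drops the dependent
-- trigram indexes, which then have to be rebuilt:
DROP EXTENSION pg_trgm CASCADE;
CREATE EXTENSION pg_trgm VERSION '1.5';
```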

I have a file which would let you downgrade the extension without needing
to drop all the pg_trgm indexes, but there is no way for you to deploy that
to RDS.

Cheers,

Jeff

Attachments:

pg_trgm--1.6--1.5.sql (application/octet-stream)
#6 Michael Guissine
mguissine@gmail.com
In reply to: Andres Freund (#4)
Re: BUG #17652: Performance degradation after migrating from v13.4 to v14.4

Thank you Andres,

The column names in the plan and queries are obfuscated. A cold cache
should not be an issue: we upgraded 3 days ago and have since
reindexed/re-analyzed everything, plus we have an active application
running against this database, so the data should be cached.

Here is an obfuscated table definition. There are multiple indexes on those
columns to support different query patterns; note that the
partitioned_table_partition_key_start_time_text index was added after the
upgrade, as we thought it might help the optimizer come up with a better
plan, but it didn't help.

Partitioned table "public.partitioned_table"

    Column     |  Type   | Collation | Nullable |                    Default
---------------+---------+-----------+----------+-----------------------------------------------
 id            | bigint  |           | not null | nextval('partitioned_table_id_seq'::regclass)
 partition_key | integer |           | not null |
 text          | text    |           |          |
 description   | text    |           |          |
 ... more fields ...

Partition key: HASH (partition_key)
Indexes:
    "partitioned_table_pkey" PRIMARY KEY, btree (partition_key, id)
    "partitioned_table_partition_key_start_time_index" btree (partition_key, start_time)
    "partitioned_table_partition_key_start_time_text" btree (partition_key, start_time, text)
    "partitioned_table_partition_key_text_idx" gin (partition_key, text gin_trgm_ops)
    "partitioned_table_text_tsv_index" gin (to_tsvector('simple'::regconfig, text)) WHERE text IS NOT NULL
Number of partitions: 128 (Use \d+ to list them.)
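For reference, that \d output corresponds roughly to DDL of this shape (obfuscated names retained; the start_time type is a guess, and the multicolumn GIN index assumes the btree_gin extension so the integer column has a GIN operator class):

```sql
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE EXTENSION IF NOT EXISTS btree_gin;  -- assumed, for integer in a GIN index

CREATE TABLE partitioned_table (
    id            bigserial,
    partition_key integer NOT NULL,
    text          text,
    description   text,
    start_time    timestamptz,  -- type assumed
    -- ... more fields ...
    PRIMARY KEY (partition_key, id)
) PARTITION BY HASH (partition_key);

CREATE INDEX partitioned_table_partition_key_text_idx
    ON partitioned_table USING gin (partition_key, text gin_trgm_ops);
```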


#7 Michael Guissine
mguissine@gmail.com
In reply to: Jeff Janes (#5)
Re: BUG #17652: Performance degradation after migrating from v13.4 to v14.4

paritioned_table_hash_p3_partition_key_text_idx is in fact a pg_trgm index,
you are right!
Interesting: why would it use that index at all on some partitions but not
on others? The query is a simple match on 3 columns, and we have other
indexes to cover it.


#8 Andres Freund
andres@anarazel.de
In reply to: Michael Guissine (#6)
Re: BUG #17652: Performance degradation after migrating from v13.4 to v14.4

Hi,

On 2022-10-19 15:32:17 -0400, Michael Guissine wrote:

> The column names in the plan and queries are obfuscated

I guessed so :)

> Indexes:
>     "partitioned_table_pkey" PRIMARY KEY, btree (partition_key, id)
>     "partitioned_table_partition_key_start_time_index" btree (partition_key, start_time)
>     "partitioned_table_partition_key_start_time_text" btree (partition_key, start_time, text)
>     "partitioned_table_partition_key_text_idx" gin (partition_key, text gin_trgm_ops)
>     "partitioned_table_text_tsv_index" gin (to_tsvector('simple'::regconfig, text)) WHERE text IS NOT NULL
> Number of partitions: 128 (Use \d+ to list them.)

Ok, that explains why we see the massive rechecks. For the bitmap scans we end
up using a gin index instead of the btree index.

The reason you're seeing this in 14 is likely the following entry in the
release notes:

Allow GiST/GIN pg_trgm indexes to do equality lookups (Julien Rouhaud)
This is similar to LIKE except no wildcards are honored.

CCed author/committer of that change. I suspect this would need proper
costing to avoid being used inadvertently.

I assume you need the pg_trgm index?

In theory you could likely work around the issue by reverting the relevant
catalog change:

+ALTER OPERATOR FAMILY gin_trgm_ops USING gin ADD
+        OPERATOR        11       pg_catalog.= (text, text);

by issuing

ALTER OPERATOR FAMILY gin_trgm_ops USING gin DROP OPERATOR 11 (text, text);

but I suspect you won't be allowed to do that in RDS.
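To verify whether the equality operator is present (and whether the DROP above took effect), the operator family can be inspected directly; this uses only standard catalogs:

```sql
-- List GIN strategy-11 (equality) entries in the gin_trgm_ops family;
-- an empty result means the workaround is in effect:
SELECT amopstrategy, amopopr::regoperator
FROM pg_amop
WHERE amopfamily = (SELECT oid FROM pg_opfamily
                    WHERE opfname = 'gin_trgm_ops')
  AND amopstrategy = 11;
```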

Greetings,

Andres Freund

#9 Peter Geoghegan
In reply to: Andres Freund (#8)
Re: BUG #17652: Performance degradation after migrating from v13.4 to v14.4

On Wed, Oct 19, 2022 at 1:12 PM Andres Freund <andres@anarazel.de> wrote:

> The reason you're seeing this in 14 is likely the following entry in the
> release notes:
>
>     Allow GiST/GIN pg_trgm indexes to do equality lookups (Julien Rouhaud)
>     This is similar to LIKE except no wildcards are honored.
>
> CCed author/committer of that change. I suspect this would need proper costing
> to not be inadvertently be used.

I am reminded of this recent thread:

/messages/by-id/3188617.44csPzL39Z@aivenronan

(Perhaps you were thinking of the same thing already, but thought I'd
mention it, just in case.)
--
Peter Geoghegan

#10 Andres Freund
andres@anarazel.de
In reply to: Peter Geoghegan (#9)
Re: BUG #17652: Performance degradation after migrating from v13.4 to v14.4

Hi,

On 2022-10-19 13:20:37 -0700, Peter Geoghegan wrote:

> On Wed, Oct 19, 2022 at 1:12 PM Andres Freund <andres@anarazel.de> wrote:
> > The reason you're seeing this in 14 is likely the following entry in the
> > release notes:
> >
> >     Allow GiST/GIN pg_trgm indexes to do equality lookups (Julien Rouhaud)
> >     This is similar to LIKE except no wildcards are honored.
> >
> > CCed author/committer of that change. I suspect this would need proper costing
> > to not be inadvertently be used.
>
> I am reminded of this recent thread:
>
> /messages/by-id/3188617.44csPzL39Z@aivenronan
>
> (Perhaps you were thinking of the same thing already, but thought I'd
> mention it, just in case.)

No, I hadn't seen that one. That indeed looks quite relevant...

Greetings,

Andres Freund

#11 Michael Guissine
mguissine@gmail.com
In reply to: Andres Freund (#10)
Re: BUG #17652: Performance degradation after migrating from v13.4 to v14.4

Thank you all for such super quick responses. We are considering removing
the pg_trgm index and hope that will help!


#12 Jeff Janes
jeff.janes@gmail.com
In reply to: Michael Guissine (#7)
Re: BUG #17652: Performance degradation after migrating from v13.4 to v14.4

On Wed, Oct 19, 2022 at 3:44 PM Michael Guissine <mguissine@gmail.com>
wrote:

> paritioned_table_hash_p3_partition_key_text_idx is in fact pg_trgm index,
> you are right!
> interesting, why would it use that index at all on some partitions but not
> on others? the query is a simple match on 3 columns and we have other
> indexes to cover it

The two index choices are (falsely) similar in cost, so small differences
in the selectivity estimates are enough to drive one over the other from
partition to partition.

There are (at least) two planner deficiencies at work here.

For the actually faster index, the planner thinks it will need to re-descend
the index once for every value in the IN-list on the leading column, with
each of those re-descents ending up on a different leaf page and charged
about one random_page_cost. But it counts a re-descent for every item in
the IN-list, for every partition, when it only needs to re-descend for the
IN-list items that also happen to match the remainder for that hash
partition. This incorrectly inflates the cost of the faster index. (You
could argue that if the executor really does do the re-descents, then it is
correct for the planner to account for them. But I suspect that many of the
hopeless re-descents end up on leaf pages already in memory. In any case,
fixing the planner and fixing the executor would probably go hand in hand.)
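As a toy numeric sketch of that overcount (all constants are made up for illustration, not taken from the actual plan):

```python
# Toy model of the btree re-descent overcount described above.
# Constants are illustrative, not from the real plan.
RANDOM_PAGE_COST = 4.0
N_PARTITIONS = 128
IN_LIST_SIZE = 121

# What the planner charges: one re-descent per IN-list item, per partition.
charged = N_PARTITIONS * IN_LIST_SIZE * RANDOM_PAGE_COST

# What is actually needed: each key hashes to exactly one partition,
# so it causes only a single useful re-descent overall.
needed = IN_LIST_SIZE * RANDOM_PAGE_COST

print(f"cost inflated {charged / needed:.0f}x")
```

With 128 hash partitions the charged cost comes out roughly 128 times the useful work, which is the inflation being described.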

For the slower index, the gin cost code doesn't estimate how many rechecks
there will be, only how many end results there will be. But the rechecks
are where most of the true cost is, so this one ends up with an incorrectly
low cost. The overestimation of one and the underestimation of the other
bring the two estimates close enough together that random variations in the
stats can drive the choice between them. Now you could say "all the row
estimates are 1, so how could there be any variation?" But notice that for
the bitmap heap scans, there is a filter step which removes around 1000
rows from each partition. We know how many rows were actually removed from
each partition, but we are not told how many rows the planner thought it
would remove. Variations in this unseen estimated filter count would lead
to variations in the cost estimate.

While fixing the planner would be nice, I think the best current solution
for you is to change the order of columns in the 3-column index to
(text, start_time, partition_key). Since the condition on "text" is
extremely selective, by the time the scan gets to the partition_key column
it can be checked with a single probe rather than a bunch of costly
re-descents. This should be estimated to be faster than either of the
existing two indexes, and should also actually be faster.
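Concretely, the suggested reordering would be something along these lines (the index name is illustrative):

```sql
-- Lead with the highly selective "text" column so equality on it narrows
-- the scan before partition_key is ever consulted:
CREATE INDEX partitioned_table_text_start_time_partition_key_idx
    ON partitioned_table (text, start_time, partition_key);
```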

Cheers,

Jeff

#13 Michael Guissine
mguissine@gmail.com
In reply to: Jeff Janes (#12)
Re: BUG #17652: Performance degradation after migrating from v13.4 to v14.4

Hi Jeff, this is super useful; thanks for spending your time explaining the
internals. In our case, we ended up dropping the trgm index and everything
is back to normal, but I like your idea of reversing the column order and
will give it a try!

~ Michael
