BUG #17330: EXPLAIN hangs and very long query plans
The following bug has been logged on the website:
Bug reference: 17330
Logged by: Strahinja Kustudic
Email address: strahinjak@nordeus.com
PostgreSQL version: 10.19
Operating system: CentOS 7
Description:
We had an issue with one of our production databases running Postgres 10.19
on CentOS 7. One of our most frequently executed queries started having very
long planning times of 3000ms+, while the execution itself took only 1-3ms.
When everything is working normally, planning takes around 1ms or less. On
the replica, EXPLAIN (without ANALYZE, just plain EXPLAIN!) wouldn't even
finish - it would just hang forever. To be precise, we were running 10.10 at
the time, but upgrading to 10.19 didn't help. We tried running ANALYZE on
the whole database, but that didn't help either. In the end, what helped was
running pg_repack on the whole DB. This was strange, because I thought the
query planner uses table statistics and the index schema to choose a plan;
it shouldn't need to read table/index data to make a plan. But I don't know
the PG internals, so I might be wrong.
Because we have backups created by WAL-G, I restored the base backup with
the WALs from that time and managed to recreate the issue in Docker on my
own laptop. In Docker, EXPLAIN runs forever and never finishes, like the
issue we had on the replica. This means I have a 100% repro of the issue,
and I can do whatever you tell me to do to find out what caused it. I
already tried enabling `log_error_verbosity = verbose`, but the server
doesn't print anything while EXPLAIN is running, and it never finishes. I
haven't tried leaving it to run for hours (but I might).
Is this a known bug in PG 10.x? Or would you like me to share more details
about the table definitions, the exact query, or anything else you need?
On Thu, Dec 9, 2021 at 10:28 AM PG Bug reporting form <noreply@postgresql.org> wrote:
> The issue is that in Docker EXPLAIN runs forever, it never finishes, like
> the issue we had on the replica. This means I have a 100% repro of the
> issue and I can do whatever you tell me to do to find out what caused it.
Is this 100% CPU bound? `perf top -p <pid>` would probably be my first
tool to use here.
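For context, a sketch of what that could look like in practice (the pid here is hypothetical, and `perf` usually needs root):

```shell
# In a second session, find the backend pid running the stuck EXPLAIN:
psql -c "SELECT pid, state, query FROM pg_stat_activity WHERE query ILIKE 'explain%';"

# Attach perf to that backend while it is spinning:
perf top -p 12345   # replace 12345 with the pid returned above
```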
Also, it would be nice to know if an upgrade to a newer major version fixed
the problem.
Cheers,
Jeff
On Thu, Dec 9, 2021 at 7:28 AM PG Bug reporting form <noreply@postgresql.org> wrote:
> In the end, what helped is running pg_repack on the whole DB. This was
> strange because I thought that the query planner is using table statistics
> and the index schema to determine what plan to run, it shouldn't need
> table/index data to make a plan, but I don't know PG internals, so I might
> be wrong.
Sounds like a problem with get_actual_variable_range(), which can scan
indexes at plan time to determine minimum or maximum values.
This actually has been improved quite a bit since Postgres 10. So as
Jeff said, seems like you might benefit from upgrading to a newer
major version. v11 has improved things in this exact area.
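To illustrate the failure mode described above - purely a hypothetical sketch, with made-up table and column names:

```sql
-- Simulate the production pattern: a big table, then a mass delete at one
-- end of an indexed column's range.
CREATE TABLE events (id bigserial PRIMARY KEY, payload text);
INSERT INTO events (payload)
SELECT 'x' FROM generate_series(1, 5000000);
ANALYZE events;

-- Delete the oldest rows; their index entries remain until VACUUM runs.
DELETE FROM events WHERE id < 3000000;

-- Before VACUUM, planning a range query over id can become very slow:
-- get_actual_variable_range() walks the index past millions of dead
-- entries to find the actual minimum live value for the range estimate.
EXPLAIN SELECT * FROM events WHERE id > 100;
```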
--
Peter Geoghegan
On Fri, Dec 10, 2021 at 2:31 AM Jeff Janes <jeff.janes@gmail.com> wrote:
> Is this 100% CPU bound? 'perf top -p <pid>' would probably be my first
> tool to use here.
I forgot to mention this important detail. When it happened on the
production DB, it was CPU bound: about 70% user time and 30% system time.
There was no iowait, because most of the DB was in the page cache.
Check out the screenshot of the CPU usage on the production DB in the
attachment. Here is the timeline:
- 11:00 we started deleting a few million rows
- 11:34 the delete finished and the CPU instantly went to ~70% user and
~30% system time
- 12:55 we configured PgBouncer to allow only a few queries in parallel;
after that, system time went down, but user time was still at 100%
On my Docker instance, when I execute EXPLAIN it starts reading a lot of
data. The indexes of the biggest table the query reads total 50GB, so my
guess is that it is reading those indexes.
I also allowed EXPLAIN in Docker to finish, and it took almost 500 seconds.
Can index corruption cause this?
Regards,
Strahinja
Attachment: Screenshot-20211210-100111.jpg (image/jpeg)
On Fri, Dec 10, 2021 at 2:36 AM Peter Geoghegan <pg@bowt.ie> wrote:
> Sounds like a problem with get_actual_variable_range(), which can scan
> indexes at plan time to determine minimum or maximum values.
>
> This actually has been improved quite a bit since Postgres 10. So as
> Jeff said, seems like you might benefit from upgrading to a newer
> major version. v11 has improved things in this exact area.
On my Docker instance, when I execute EXPLAIN it starts reading a lot of
data. The indexes of the biggest table the query reads total 50GB, so my
guess is that it is reading those indexes.
I allowed EXPLAIN in Docker to finish, and it took almost 500 seconds; it
was reading data the whole time. After I reindexed the biggest table,
EXPLAIN finished instantly. Can index corruption cause this?
Note that this started happening in production after we deleted a few
million rows from the biggest table.
Regards,
Strahinja
Hi

On Fri, Dec 10, 2021 at 10:25 AM Strahinja Kustudić <strahinjak@nordeus.com> wrote:
> I allowed EXPLAIN in Docker to finish and it took almost 500 seconds and
> it was reading data all the time. After I reindexed the biggest table,
> EXPLAIN finished instantly. Can the index corruption cause this?

The index can be bloated.

> Note that this started happening in production after we deleted a few
> million rows from the biggest table.

It may be the same issue as this one:
/messages/by-id/9ADAF560-F331-4AF8-A28A-F005070C3187@yandex.ru

Regards
Pavel
On Fri, Dec 10, 2021 at 10:53 AM Pavel Stehule <pavel.stehule@gmail.com> wrote:
> The index can be bloated.

Is bloat alone enough to make it this slow, or does the index need to be
corrupted?

> it is maybe same issue
> /messages/by-id/9ADAF560-F331-4AF8-A28A-F005070C3187@yandex.ru

It looks similar, because in that case query planning also went up to a few
seconds after deletes.

I just upgraded Postgres to 12 with pg_upgrade and ran ANALYZE on the whole
DB. The issue is similar, but EXPLAIN now finishes in 19 seconds instead of
500 seconds, which means the issue is still there. Here is the whole
EXPLAIN ANALYZE (I added ANALYZE so that it displays the planning duration):
Sort (cost=1946.04..1946.16 rows=46 width=642) (actual time=10.929..10.932
rows=20 loops=1)
Sort Key: m.delivery_start DESC
Sort Method: quicksort Memory: 44kB
-> Nested Loop Left Join (cost=10.19..1944.77 rows=46 width=642)
(actual time=2.305..10.401 rows=20 loops=1)
Join Filter: ((c.game_name = csu.game_name) AND (um.user_id =
csu.user_id) AND (c.id = csu.campaign_id))
Filter: ((c.id IS NULL) OR (c.active AND (now() >=
c.campaign_start) AND ((now() <= c.campaign_end) OR (c.campaign_end IS
NULL)) AND ((c.segment_ids <@
'{5632,-2,2,5634,9731,5640,5641,9738,13,3599,5648,5655,13343,5
664,13344,13345,13346,13347,13348,13350,13351,13352,5673,13353,13354,1067,13355,13356,13357,13358,13359,5680,13360,6705,13361,13362,13363,13364,13365,13366,5687,13367,13368,13369,20538,13370,13371,13372,13373,6718,13374,190
07,19008,13378,5699,13379,13380,19013,3142,3143,19015,13385,5707,6731,20555,19019,3149,19021,3150,591,19023,592,19024,6737,19027,19028,5717,6743,17496,19033,13402,19035,6750,17502,19039,17504,19041,19043,19044,17508,6757,17
510,19047,104,19048,17512,17514,7788,6764,621,19053,623,19055,7792,625,19059,7796,1141,19061,19063,7800,19064,5241,19067,19068,3713,19073,1667,19075,1668,19079,19081,19083,19084,5262,19087,19088,8338,8339,7828,19093,20631,1
9095,20632,20633,12955,19099,19101,19103,19104,19107,19108,19113,19115,8876,19119,3760,19121,19123,19124,3785,4814,16605,4324,16622,16623,21253,21254,4359,21258,5901,5915,4900,22832,22835,22836,9013,3895,3902,1865,1866,1379
,1380,4978,7027,13685,14203,2428,2429,21887,21888,4485,22932,22933,21911,21417,432,434,20418,968,969,470,7642,991,992,10209,1512,1513,23024,9201,9207,23032,9208,9214}'::integer[])
OR (csu.user_id = '-XXXXXXXXXXXXXXXXXX'::bigint))))
Rows Removed by Filter: 11
-> Nested Loop (cost=10.03..1910.89 rows=163 width=696) (actual
time=2.029..10.092 rows=31 loops=1)
-> Hash Left Join (cost=9.46..695.80 rows=61 width=684)
(actual time=0.729..5.991 rows=55 loops=1)
Hash Cond: ((m.game_name = c.game_name) AND
(m.campaign_id = c.id))
-> Index Scan using mail_pkey on mail m
(cost=0.27..686.29 rows=61 width=552) (actual time=0.297..5.524 rows=55
loops=1)
Filter: (active AND (game_name =
'topeleven'::game_name) AND (now() >= delivery_start) AND ((now() <=
delivery_end) OR (delivery_end IS NULL)))
Rows Removed by Filter: 97
-> Hash (cost=7.83..7.83 rows=91 width=136) (actual
time=0.411..0.412 rows=91 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 26kB
-> Seq Scan on campaign c (cost=0.00..7.83
rows=91 width=136) (actual time=0.265..0.371 rows=91 loops=1)
Filter: (game_name =
'topeleven'::game_name)
Rows Removed by Filter: 55
-> Index Scan using user_mail_pkey on user_mail um
(cost=0.57..19.88 rows=4 width=24) (actual time=0.074..0.074 rows=1
loops=55)
Index Cond: ((user_id = '-XXXXXXXXXXXXXXXXXX'::bigint)
AND (game_name = 'topeleven'::game_name) AND (mail_id = m.id))
Filter: ((m.mail_type <> 'question'::mail_type) OR
(mail_status <> 'completed'::mail_status))
-> Materialize (cost=0.15..28.18 rows=1 width=16) (actual
time=0.009..0.009 rows=0 loops=31)
-> Index Only Scan using campaign_static_users_pkey on
campaign_static_users csu (cost=0.15..28.17 rows=1 width=16) (actual
time=0.266..0.266 rows=0 loops=1)
Index Cond: ((user_id = '-XXXXXXXXXXXXXXXXXX'::bigint)
AND (game_name = 'topeleven'::game_name))
Heap Fetches: 0
Planning Time: 19408.862 ms
Execution Time: 10.978 ms
(27 rows)
Just to mention again: reindexing the biggest table fixes the issue, but
I'm more curious why this happened and how we can prevent it from happening
again. Or is it already fixed so that it cannot happen in the future?
Regards,
Strahinja
On Fri, Dec 10, 2021 at 4:52 PM Strahinja Kustudić <strahinjak@nordeus.com> wrote:
> Is bloat just enough to make it slower, or the index needs to be corrupted?
Corruption and bloat are two very different things. A bloated index is
slower; a corrupted index returns bad results.
> Just to mention again, reindexing the biggest table fixes the issue, but
> I'm more curious why did this happen and how can we prevent it from
> happening again? Or is it already fixed so that it cannot happen in the
> future?
Maybe VACUUM can solve this issue too.
A massive delete is a common issue - you can try to use partitioning and
then, instead of a massive delete, use DROP on the old partition.
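As a sketch of that idea, with hypothetical table and column names (declarative partitioning is available from PG 10 on):

```sql
-- Partition by a date range so old data can be dropped instead of deleted
-- row by row.
CREATE TABLE user_mail_part (
    user_id bigint,
    sent_on date
) PARTITION BY RANGE (sent_on);

CREATE TABLE user_mail_2021_11 PARTITION OF user_mail_part
    FOR VALUES FROM ('2021-11-01') TO ('2021-12-01');
CREATE TABLE user_mail_2021_12 PARTITION OF user_mail_part
    FOR VALUES FROM ('2021-12-01') TO ('2022-01-01');

-- Instead of: DELETE FROM user_mail_part WHERE sent_on < '2021-12-01';
DROP TABLE user_mail_2021_11;  -- near-instant, leaves no dead index entries
```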
Regards
Pavel
Strahinja Kustudić <strahinjak@nordeus.com> writes:
> On Fri, Dec 10, 2021 at 10:53 AM Pavel Stehule <pavel.stehule@gmail.com> wrote:
>> The index can be bloated.
> Is bloat just enough to make it slower, or the index needs to be corrupted?
bloat != corruption. The problem you're dealing with here is a whole lot
of dead index entries at the very end of the index range (could be either
the lowest or highest entries). The planner is wasting time scanning
through those to find the extremal non-dead value for a range estimate.
REINDEX got rid of the dead entries; a VACUUM would have done as well.
As noted upthread, more recent PG versions are a bit smarter about
such cases.
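To see whether a table has accumulated dead entries like this, and to clean them up, something along these lines should work (the `pg_stat_user_tables` view is standard; the table name is just the one from this thread):

```sql
-- Check how many dead tuples have piled up and when vacuum last ran:
SELECT relname, n_dead_tup, n_live_tup, last_vacuum, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;

-- A plain VACUUM (not VACUUM FULL) removes the dead index entries the
-- planner was wading through; no exclusive lock is needed:
VACUUM (VERBOSE) user_mail;
```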
regards, tom lane
On Fri, Dec 10, 2021 at 5:39 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> bloat != corruption. The problem you're dealing with here is a whole lot
> of dead index entries at the very end of the index range (could be either
> the lowest or highest entries). The planner is wasting time scanning
> through those to find the extremal non-dead value for a range estimate.
> REINDEX got rid of the dead entries; a VACUUM would have done as well.
I completely understand that bloat is different from corruption. It was
just really strange that some mass deletes caused this now, because we run
mass deletes regularly on this DB and it has been running for a few years
without any issues. You're right about VACUUM also fixing it: I just tried
it inside Docker, and after VACUUM finished, the issue was fixed instantly.
The downside is that VACUUM took a long time to finish.
> As noted upthread, more recent PG versions are a bit smarter about
> such cases.
As far as I understand, there is not much we can do except upgrade to newer
major versions, because on this DB we cannot partition the data in a way
that would let us drop old partitions.
Thanks again for your help :)
Regards,
Strahinja