Slow planning time for simple query

Started by Jeremy Finzelalmost 8 years ago9 messagesgeneral
Jump to latest
#1Jeremy Finzel
finzelj@gmail.com

Hello -

We have an odd scenario on one of our OLTP systems, which behaves the same
way on a streamer, of a 700-1000ms planning time for a query like this:

SELECT *
FROM table1
WHERE source_id IN (SELECT id FROM table2 WHERE customer_id = $1);

The actual execution time is sub-ms.

We initially thought maybe catalog bloat? But we were able to reindex all
of the highly churned catalog tables, and I even did VACUUM FULL on
pg_attribute and pg_statistic, to no avail.

There are no custom settings for pg_attribute for the given tables either.

Interestingly, the problem goes away on a SAN snapshot of the target system.

Any ideas of what else we could try? A PL function that caches the query
plan works, but that is just a workaround.

Thanks!
Jeremy

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jeremy Finzel (#1)
Re: Slow planning time for simple query

Jeremy Finzel <finzelj@gmail.com> writes:

We have an odd scenario on one of our OLTP systems, which behaves the same
way on a streamer, of a 700-1000ms planning time for a query like this:

SELECT *
FROM table1
WHERE source_id IN (SELECT id FROM table2 WHERE customer_id = $1);

Hm. Is this the first query executed in a session? If so maybe it's
got something to do with populating caches and other session spin-up
overhead.

Another theory is it's some sort of locking issue. Turning on
log_lock_waits, and setting deadlock_timeout to say 100ms, would help
in investigating that.

regards, tom lane

#3Jeremy Finzel
finzelj@gmail.com
In reply to: Tom Lane (#2)
Re: Slow planning time for simple query

On Wed, Jun 6, 2018 at 12:12 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Jeremy Finzel <finzelj@gmail.com> writes:

We have an odd scenario on one of our OLTP systems, which behaves the

same

way on a streamer, of a 700-1000ms planning time for a query like this:

SELECT *
FROM table1
WHERE source_id IN (SELECT id FROM table2 WHERE customer_id = $1);

Hm. Is this the first query executed in a session? If so maybe it's
got something to do with populating caches and other session spin-up
overhead.

Another theory is it's some sort of locking issue. Turning on
log_lock_waits, and setting deadlock_timeout to say 100ms, would help
in investigating that.

regards, tom lane

I have run it over and over with no improvement in the planning time, so I
don't thing it's first in session-related. I can only make it faster with
a pl function so far.

We have log_lock_waits on and nothing shows, and turning down
deadlock_timeout also doesn't do anything.

Thanks,
Jeremy

#4Jeremy Finzel
finzelj@gmail.com
In reply to: Jeremy Finzel (#3)
Re: Slow planning time for simple query

On Wed, Jun 6, 2018 at 1:13 PM, Jeremy Finzel <finzelj@gmail.com> wrote:

On Wed, Jun 6, 2018 at 12:12 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Jeremy Finzel <finzelj@gmail.com> writes:

We have an odd scenario on one of our OLTP systems, which behaves the

same

way on a streamer, of a 700-1000ms planning time for a query like this:

SELECT *
FROM table1
WHERE source_id IN (SELECT id FROM table2 WHERE customer_id = $1);

Hm. Is this the first query executed in a session? If so maybe it's
got something to do with populating caches and other session spin-up
overhead.

Another theory is it's some sort of locking issue. Turning on
log_lock_waits, and setting deadlock_timeout to say 100ms, would help
in investigating that.

regards, tom lane

I have run it over and over with no improvement in the planning time, so I
don't thing it's first in session-related. I can only make it faster with
a pl function so far.

We have log_lock_waits on and nothing shows, and turning down
deadlock_timeout also doesn't do anything.

Thanks,
Jeremy

Forgot to mention, this is running on this version:
PostgreSQL 9.6.8 on x86_64-pc-linux-gnu (Ubuntu 9.6.8-1.pgdg16.04+1),
compiled by gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609, 64-bit

Thanks,
Jeremy

#5Pavel Stehule
pavel.stehule@gmail.com
In reply to: Jeremy Finzel (#1)
Re: Slow planning time for simple query

2018-06-06 18:59 GMT+02:00 Jeremy Finzel <finzelj@gmail.com>:

Hello -

We have an odd scenario on one of our OLTP systems, which behaves the same
way on a streamer, of a 700-1000ms planning time for a query like this:

SELECT *
FROM table1
WHERE source_id IN (SELECT id FROM table2 WHERE customer_id = $1);

The actual execution time is sub-ms.

We initially thought maybe catalog bloat? But we were able to reindex all
of the highly churned catalog tables, and I even did VACUUM FULL on
pg_attribute and pg_statistic, to no avail.

There are no custom settings for pg_attribute for the given tables either.

Interestingly, the problem goes away on a SAN snapshot of the target
system.

Any ideas of what else we could try? A PL function that caches the query
plan works, but that is just a workaround.

more times I seen similar issue based on bloated indexes on table - pg in
planning time detect min max from possible indexes

Regards

Pavel

Show quoted text

Thanks!
Jeremy

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Pavel Stehule (#5)
Re: Slow planning time for simple query

Pavel Stehule <pavel.stehule@gmail.com> writes:

2018-06-06 18:59 GMT+02:00 Jeremy Finzel <finzelj@gmail.com>:

We have an odd scenario on one of our OLTP systems, which behaves the same
way on a streamer, of a 700-1000ms planning time for a query like this:
SELECT *
FROM table1
WHERE source_id IN (SELECT id FROM table2 WHERE customer_id = $1);

more times I seen similar issue based on bloated indexes on table - pg in
planning time detect min max from possible indexes

Oh, hmm, yeah it could be ye olde get_actual_variable_range() issue.
When this happens, are there perhaps a lot of recently-dead rows at either
extreme of the range of table1.source_id or table2.id?

We made a fix last year to improve that:
https://git.postgresql.org/gitweb/?p=postgresql.git&amp;a=commitdiff&amp;h=3ca930fc3
but it wasn't back-patched.

regards, tom lane

#7Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Jeremy Finzel (#1)
Re: Slow planning time for simple query

On 06/06/2018 09:59 AM, Jeremy Finzel wrote:

Hello -

We have an odd scenario on one of our OLTP systems, which behaves the
same way on a streamer, of a 700-1000ms planning time for a query like this:

SELECT *
FROM table1
WHERE  source_id IN (SELECT id FROM table2 WHERE customer_id = $1);

The actual execution time is sub-ms.

I am trying to sort out the various references to times so could you:

Show the EXPLAIN ANALYZE?

and ?

psql> \timing

psql> SELECT *
FROM table1
WHERE source_id IN (SELECT id FROM table2 WHERE customer_id = $1);

We initially thought maybe catalog bloat?  But we were able to reindex
all of the highly churned catalog tables, and I even did VACUUM FULL on
pg_attribute and pg_statistic, to no avail.

There are no custom settings for pg_attribute for the given tables either.

Interestingly, the problem goes away on a SAN snapshot of the target system.

Any ideas of what else we could try?  A PL function that caches the
query plan works, but that is just a workaround.

Thanks!
Jeremy

--
Adrian Klaver
adrian.klaver@aklaver.com

#8Jerry Sievers
gsievers19@comcast.net
In reply to: Tom Lane (#6)
Re: Slow planning time for simple query

Tom Lane <tgl@sss.pgh.pa.us> writes:

Pavel Stehule <pavel.stehule@gmail.com> writes:

2018-06-06 18:59 GMT+02:00 Jeremy Finzel <finzelj@gmail.com>:

We have an odd scenario on one of our OLTP systems, which behaves the same
way on a streamer, of a 700-1000ms planning time for a query like this:
SELECT *
FROM table1
WHERE source_id IN (SELECT id FROM table2 WHERE customer_id = $1);

more times I seen similar issue based on bloated indexes on table - pg in
planning time detect min max from possible indexes

Oh, hmm, yeah it could be ye olde get_actual_variable_range() issue.
When this happens, are there perhaps a lot of recently-dead rows at either
extreme of the range of table1.source_id or table2.id?

We noticed the cluster of interest had a rogue physical rep slot holding
71k WAL segments.

Dropping same slot seemed to correlate with the problem going away.

Does that sound like a plausible explanation for the observed slow
planning times?

Thx

We made a fix last year to improve that:
https://git.postgresql.org/gitweb/?p=postgresql.git&amp;a=commitdiff&amp;h=3ca930fc3
but it wasn't back-patched.

regards, tom lane

--
Jerry Sievers
Postgres DBA/Development Consulting
e: postgres.consulting@comcast.net
p: 312.241.7800

#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jerry Sievers (#8)
Re: Slow planning time for simple query

Jerry Sievers <gsievers19@comcast.net> writes:

Tom Lane <tgl@sss.pgh.pa.us> writes:

Oh, hmm, yeah it could be ye olde get_actual_variable_range() issue.
When this happens, are there perhaps a lot of recently-dead rows at either
extreme of the range of table1.source_id or table2.id?

We noticed the cluster of interest had a rogue physical rep slot holding
71k WAL segments.
Dropping same slot seemed to correlate with the problem going away.
Does that sound like a plausible explanation for the observed slow
planning times?

I believe the slot would hold back global xmin and thereby prevent
"recently-dead" rows from becoming just plain "dead", so yeah, this
observation does seem to square with the get_actual_variable_range
theory. You'd still need to posit that something had recently deleted
a lot of rows at the end of the range of one of those columns, though.

regards, tom lane