Avoiding out of date statistics / planner

Started by Tim Kane · about 6 years ago · 5 messages · general
#1 Tim Kane
tim.kane@gmail.com

Every now and again, I will encounter an unexplained long-running query.

It’s a head-scratcher moment, because a query that is still running after
20 minutes (not blocking) can be run independently in about 500ms.

I can only assume that the problem query ran against the table(s) at a time
when they were perhaps in need of a vacuum analyze... I’m guessing here that
the table had seen some amount of change and simply had out-of-date
statistics.

How can I avoid this?
The auto-vacuum daemon is doing its thing, but there is always going to be
an opportunity for a query to sneak in against a table that has recently
seen a large change but has not yet been analysed.

On the application side, we can explicitly issue a VACUUM ANALYZE after
each bulk operation - and often that is precisely what happens.
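
Something like this (a sketch - the table and the load step are placeholders):

    -- bulk operation
    COPY my_table FROM '/path/to/data.csv' WITH (FORMAT csv);

    -- refresh statistics straight away (must run outside a transaction block)
    VACUUM ANALYZE my_table;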

But - I am keenly aware that a VACUUM ANALYZE cannot be performed within a
transaction. That means there is always a (small) window in which a query
can still be planned against stale statistics.

Are there any other best practices that can mitigate this kind of problem?

It’s rare, sure - but I don’t like sweeping these under the rug.

I’m on PG 9.6... perhaps there are planner improvements since then that
might reduce the incidence of these (rare) issues.

Any advice appreciated, thanks.

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tim Kane (#1)
Re: Avoiding out of date statistics / planner

Tim Kane <tim.kane@gmail.com> writes:

> Every now and again, I will encounter an unexplained long-running query.
> It’s a head-scratcher moment, because a query that is still running after
> 20 minutes (not blocking) can be run independently in about 500ms.

Without some kind of context (like, have you been doing something to
the table(s) involved that would drastically change their statistics)
it's hard to comment on this. It's not obvious from the info
provided that this is a bad-plan issue rather than something else.

> On the application side, we can explicitly issue a VACUUM ANALYZE after
> each bulk operation - and often that is precisely what happens.
> But - I am keenly aware that this cannot be performed within a transaction.

Plain ANALYZE can be, and that's all you need if the problem is to
update stats.
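
For example (a sketch - the table name and load step are placeholders):

    BEGIN;
    COPY my_table FROM '/path/to/data.csv' WITH (FORMAT csv);
    ANALYZE my_table;  -- allowed inside a transaction, unlike VACUUM
    COMMIT;

Since the statistics are updated in the same transaction, they should become
visible to other sessions together with the bulk change at commit.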

regards, tom lane

#3 Michael Lewis
mlewis@entrata.com
In reply to: Tom Lane (#2)
Re: Avoiding out of date statistics / planner

It may also be worth noting that it is possible to make autovacuum/analyze
more aggressive, perhaps only on the tables that see large changes in data
that might result in a statistics issue. If you could share a query, the
EXPLAIN ANALYZE output, and pseudocode or at least a description of what
sort of bulk operations are being done, then more insight could be offered.
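
For instance, autovacuum can be tuned per table via storage parameters (the
values here are illustrative only, not recommendations):

    ALTER TABLE my_table SET (
        -- auto-analyze fires after threshold + scale_factor * reltuples
        -- row modifications, so lower values mean fresher statistics
        autovacuum_analyze_scale_factor = 0.02,
        autovacuum_analyze_threshold    = 1000
    );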

#4 Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Michael Lewis (#3)
Re: Avoiding out of date statistics / planner

On Wed, Feb 12, 2020 at 10:23:22AM -0700, Michael Lewis wrote:

> It may also be worth noting that it is possible to make autovacuum/analyze
> more aggressive, perhaps only on the tables that see large changes in data
> that might result in a statistics issue. If you could share a query, the
> EXPLAIN ANALYZE output, and pseudocode or at least a description of what
> sort of bulk operations are being done, then more insight could be offered.

Another thing you can do is deploy auto_explain and log the explain plan
for long-running queries. That won't fix the root cause, but it will help
you confirm it - you'll see the actual query plan, which should give you
enough context.
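
A minimal setup in postgresql.conf might look like this (the threshold is
just an example - pick something that separates "normal" from "slow"):

    # load the module at server start
    shared_preload_libraries = 'auto_explain'

    # log the plan of any statement running longer than 10s
    auto_explain.log_min_duration = '10s'

    # include actual row counts and timings (adds some overhead)
    auto_explain.log_analyze = on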

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#5 Tim Kane
tim.kane@gmail.com
In reply to: Tomas Vondra (#4)
Re: Avoiding out of date statistics / planner

That looks very useful indeed. Thanks, Tomas.
