WIP: explain analyze with 'rows' but not timing

Started by Tomas Vondraover 14 years ago12 messageshackers
Jump to latest
#1Tomas Vondra
tomas.vondra@2ndquadrant.com

Hi all,

most of the time I use auto_explain, all I need is duration of the query
and the plan with estimates and actual row counts. And it would be handy
to be able to catch long running queries with estimates that are
significantly off (say 100x lower or higher compared to actual row numbers).

The gettimeofday() calls are not exactly cheap in some cases, so why to
pay that price when all you need is the number of rows?

The patch attached does this:

1) adds INSTRUMENT_ROWS, a new InstrumentOption

- counts rows without timing (no gettimeofday() callse)
- if you want timing info, use INSTRUMENT_TIMER

2) adds new option "TIMING" to EXPLAIN, i.e.

EXPLAIN (ANALYZE ON, TIMING ON) SELECT ...

3) adds auto_explain.log_rows_only (false by default)

- if you set this to 'true', then the instrumentation will just
count rows, without calling gettimeofday()

It works quite well, except one tiny issue - when the log_rows_only is
set to false (so that auto_explain requires timing), it silently
overrides the EXPLAIN option. So that even when the user explicitly
disables timing (TIMING OFF), it's overwritten and the explain collects
the timing data.

I could probably hide the timing info, but that'd make the issue even
worse (the user would not notice that the timing was actually enabled).

Maybe the right thing would be to explicitly disable timing for queries
executed with "EXPLAIN (TIMING OFF)". Any other ideas how to make this
work reasonably?

The patch does not implement any checks (how far is the estimate from
the reality) yet, that'll be round two.

regards
Tomas

Attachments:

explain-analyze-rows-only.difftext/plain; name=explain-analyze-rows-only.diffDownload+69-21
#2Pavel Stehule
pavel.stehule@gmail.com
In reply to: Tomas Vondra (#1)
Re: WIP: explain analyze with 'rows' but not timing

Hello

2011/12/23 Tomas Vondra <tv@fuzzy.cz>:

Hi all,

most of the time I use auto_explain, all I need is duration of the query
and the plan with estimates and actual row counts. And it would be handy
to be able to catch long running queries with estimates that are
significantly off (say 100x lower or higher compared to actual row numbers).

The gettimeofday() calls are not exactly cheap in some cases, so why to
pay that price when all you need is the number of rows?

The patch attached does this:

1) adds INSTRUMENT_ROWS, a new InstrumentOption

  - counts rows without timing (no gettimeofday() callse)
  - if you want timing info, use INSTRUMENT_TIMER

2) adds new option "TIMING" to EXPLAIN, i.e.

   EXPLAIN (ANALYZE ON, TIMING ON) SELECT ...

3) adds auto_explain.log_rows_only (false by default)

  - if you set this to 'true', then the instrumentation will just
    count rows, without calling gettimeofday()

It works quite well, except one tiny issue - when the log_rows_only is
set to false (so that auto_explain requires timing), it silently
overrides the EXPLAIN option. So that even when the user explicitly
disables timing (TIMING OFF), it's overwritten and the explain collects
the timing data.

I could probably hide the timing info, but that'd make the issue even
worse (the user would not notice that the timing was actually enabled).

Maybe the right thing would be to explicitly disable timing for queries
executed with "EXPLAIN (TIMING OFF)". Any other ideas how to make this
work reasonably?

The patch does not implement any checks (how far is the estimate from
the reality) yet, that'll be round two.

It is interesting idea - but maybe we can have a have a different
metric than time - this is very unstable quantity - mainly on
production overloaded servers.

It is good idea - we need a tool for bad statistic searching that is
relative cheap.

Regards

Pavel

Show quoted text

regards
Tomas

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Robert Haas
robertmhaas@gmail.com
In reply to: Tomas Vondra (#1)
Re: WIP: explain analyze with 'rows' but not timing

2011/12/22 Tomas Vondra <tv@fuzzy.cz>:

The gettimeofday() calls are not exactly cheap in some cases, so why to
pay that price when all you need is the number of rows?

Fair point.

The patch attached does this:

1) adds INSTRUMENT_ROWS, a new InstrumentOption

  - counts rows without timing (no gettimeofday() callse)
  - if you want timing info, use INSTRUMENT_TIMER

2) adds new option "TIMING" to EXPLAIN, i.e.

   EXPLAIN (ANALYZE ON, TIMING ON) SELECT ...

3) adds auto_explain.log_rows_only (false by default)

  - if you set this to 'true', then the instrumentation will just
    count rows, without calling gettimeofday()

This seems like an unnecessarily confusing interface, because you've
named the auto_explain option differently from the EXPLAIN option and
given it (almost) the opposite sense: timing=off means the same thing
as log_rows_only=on.

I think the EXPLAIN (TIMING) option is good the way you have it, but
then just have auto_explain.log_timing, with a default value of on.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#4Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Robert Haas (#3)
Re: WIP: explain analyze with 'rows' but not timing

On 23.12.2011 14:57, Robert Haas wrote:

2011/12/22 Tomas Vondra <tv@fuzzy.cz>:

The gettimeofday() calls are not exactly cheap in some cases, so why to
pay that price when all you need is the number of rows?

Fair point.

The patch attached does this:

1) adds INSTRUMENT_ROWS, a new InstrumentOption

- counts rows without timing (no gettimeofday() callse)
- if you want timing info, use INSTRUMENT_TIMER

2) adds new option "TIMING" to EXPLAIN, i.e.

EXPLAIN (ANALYZE ON, TIMING ON) SELECT ...

3) adds auto_explain.log_rows_only (false by default)

- if you set this to 'true', then the instrumentation will just
count rows, without calling gettimeofday()

This seems like an unnecessarily confusing interface, because you've
named the auto_explain option differently from the EXPLAIN option and
given it (almost) the opposite sense: timing=off means the same thing
as log_rows_only=on.

I think the EXPLAIN (TIMING) option is good the way you have it, but
then just have auto_explain.log_timing, with a default value of on.

Makes sense. I've updated the patch to reflect this, so the option is
called auto_explain.log_timing and is true by default. I'll add the
patch to the next commit fest.

One thing I'm wondering about is that the InstrumentOptions are not
exclusive - INSTRUMENT_TIMER means 'collect timing and row counts' while
INSTRUMENT_ROWS means 'collect row counts'. Wouldn't it be better to
redefine the INSTRUMENT_TIMER so that it collects just timing info. I.e.
to get the current behaviour, you'd have to do this

instrument_options |= (INSTRUMENT_TIMER | INSTRUMENT_ROWS)

It's quite trivial change in explain.c, the problem I have with that is
that it might break extensions.

Tomas

Attachments:

explain-analyze-rows-only-v2.difftext/plain; name=explain-analyze-rows-only-v2.diffDownload+91-22
#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tomas Vondra (#4)
Re: WIP: explain analyze with 'rows' but not timing

Tomas Vondra <tv@fuzzy.cz> writes:

One thing I'm wondering about is that the InstrumentOptions are not
exclusive - INSTRUMENT_TIMER means 'collect timing and row counts' while
INSTRUMENT_ROWS means 'collect row counts'. Wouldn't it be better to
redefine the INSTRUMENT_TIMER so that it collects just timing info. I.e.
to get the current behaviour, you'd have to do this

instrument_options |= (INSTRUMENT_TIMER | INSTRUMENT_ROWS)

It's quite trivial change in explain.c, the problem I have with that is
that it might break extensions.

I'm not especially concerned by that angle --- we make bigger API
changes all the time. But you could change the name, eg

instrument_options |= (INSTRUMENT_TIMING | INSTRUMENT_ROWS)

and then #define INSTRUMENT_TIMER as the OR of the two real bits
for backward compatibility.

regards, tom lane

#6Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tomas Vondra (#4)
Re: WIP: explain analyze with 'rows' but not timing

Excerpts from Tomas Vondra's message of vie dic 23 11:45:40 -0300 2011:

One thing I'm wondering about is that the InstrumentOptions are not
exclusive - INSTRUMENT_TIMER means 'collect timing and row counts' while
INSTRUMENT_ROWS means 'collect row counts'. Wouldn't it be better to
redefine the INSTRUMENT_TIMER so that it collects just timing info. I.e.
to get the current behaviour, you'd have to do this

instrument_options |= (INSTRUMENT_TIMER | INSTRUMENT_ROWS)

It's quite trivial change in explain.c, the problem I have with that is
that it might break extensions.

No, maybe add INSTRUMENT_TIMER_ONLY and then define INSTRUMENT_TIMER as
setting both. That way you don't break compatibility.

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#7Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Tom Lane (#5)
Re: WIP: explain analyze with 'rows' but not timing

On 23.12.2011 16:14, Tom Lane wrote:

Tomas Vondra <tv@fuzzy.cz> writes:

One thing I'm wondering about is that the InstrumentOptions are not
exclusive - INSTRUMENT_TIMER means 'collect timing and row counts' while
INSTRUMENT_ROWS means 'collect row counts'. Wouldn't it be better to
redefine the INSTRUMENT_TIMER so that it collects just timing info. I.e.
to get the current behaviour, you'd have to do this

instrument_options |= (INSTRUMENT_TIMER | INSTRUMENT_ROWS)

It's quite trivial change in explain.c, the problem I have with that is
that it might break extensions.

I'm not especially concerned by that angle --- we make bigger API
changes all the time. But you could change the name, eg

instrument_options |= (INSTRUMENT_TIMING | INSTRUMENT_ROWS)

and then #define INSTRUMENT_TIMER as the OR of the two real bits
for backward compatibility.

OK, that seems like a good solution. But is it worth the additional
complexity in explain.c?

The motivation for this patch was that collection timing data often
causes performance issues and in some cases it's not needed. But is this
true for row counts?

Are there machines where collecting row counts is above noise level?
I've never seen that, but that's not a proof of nonexistence.

If the overhead of this is negligible, then I could just hide the row
counts from the output.

Tomas

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tomas Vondra (#7)
Re: WIP: explain analyze with 'rows' but not timing

Tomas Vondra <tv@fuzzy.cz> writes:

The motivation for this patch was that collection timing data often
causes performance issues and in some cases it's not needed. But is this
true for row counts?

Perhaps more to the point, is there a use case for collecting timing
data without row counts? I find it hard to visualize a valid reason.

regards, tom lane

#9Pavel Stehule
pavel.stehule@gmail.com
In reply to: Tom Lane (#8)
Re: WIP: explain analyze with 'rows' but not timing

2011/12/23 Tom Lane <tgl@sss.pgh.pa.us>:

Tomas Vondra <tv@fuzzy.cz> writes:

The motivation for this patch was that collection timing data often
causes performance issues and in some cases it's not needed. But is this
true for row counts?

Perhaps more to the point, is there a use case for collecting timing
data without row counts?  I find it hard to visualize a valid reason.

yes - a searching of bad prediction

Regards

Pavel

Show quoted text

                       regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Pavel Stehule (#9)
Re: WIP: explain analyze with 'rows' but not timing

Pavel Stehule <pavel.stehule@gmail.com> writes:

2011/12/23 Tom Lane <tgl@sss.pgh.pa.us>:

Tomas Vondra <tv@fuzzy.cz> writes:

The motivation for this patch was that collection timing data often
causes performance issues and in some cases it's not needed. But is this
true for row counts?

Perhaps more to the point, is there a use case for collecting timing
data without row counts? I find it hard to visualize a valid reason.

yes - a searching of bad prediction

No, because timing alone proves nothing at all. The machine could just
have been overloaded.

regards, tom lane

#11Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Pavel Stehule (#9)
Re: WIP: explain analyze with 'rows' but not timing

Dne 23.12.2011 22:37, Pavel Stehule napsal(a):

2011/12/23 Tom Lane<tgl@sss.pgh.pa.us>:

Tomas Vondra<tv@fuzzy.cz> writes:

The motivation for this patch was that collection timing data often
causes performance issues and in some cases it's not needed. But is this
true for row counts?

Perhaps more to the point, is there a use case for collecting timing
data without row counts? I find it hard to visualize a valid reason.

yes - a searching of bad prediction

But that's the purpose of collecting row counts without timing data.

TL is asking about the opposite case - collecting timing data without
row counts. I can't imagine such use case ...

Tomas

#12Pavel Stehule
pavel.stehule@gmail.com
In reply to: Tom Lane (#10)
Re: WIP: explain analyze with 'rows' but not timing

2011/12/23 Tom Lane <tgl@sss.pgh.pa.us>:

Pavel Stehule <pavel.stehule@gmail.com> writes:

2011/12/23 Tom Lane <tgl@sss.pgh.pa.us>:

Tomas Vondra <tv@fuzzy.cz> writes:

The motivation for this patch was that collection timing data often
causes performance issues and in some cases it's not needed. But is this
true for row counts?

Perhaps more to the point, is there a use case for collecting timing
data without row counts?  I find it hard to visualize a valid reason.

yes - a searching of bad prediction

No, because timing alone proves nothing at all.  The machine could just
have been overloaded.

sorry, I didn't understand to question.

Using only time is not practical

Regards

Pavel

Show quoted text

                       regards, tom lane