BUG #6513: explain (analyze, buffers) and toasted fields

Started by Maxim Bogukabout 14 years ago4 messagesbugs
Jump to latest
#1Maxim Boguk
maxim.boguk@gmail.com

The following bug has been logged on the website:

Bug reference: 6513
Logged by: Maxim Boguk
Email address: maxim.boguk@gmail.com
PostgreSQL version: 9.0.7
Operating system: Linux
Description:

I got hit by that bug when explored reasons of one very slow production
query again.
And I lost a lot time trying reproduce the problem query on production
server with explain analyze.
Finally I found I need some workaround to get explain perform actual work
with toasted data.

So there is going the bug report:
EXPLAIN do not take into account that some result fields will require
detoasting and produce wrong buffers result.

Test case:

pgsql=# drop table if exists test;
DROP TABLE
Time: 277.926 ms
pgsql=# CREATE TABLE test (id integer primary key, value text);
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "test_pkey"
for table "test"
CREATE TABLE
Time: 3.812 ms
--populate table with large toasted field
pgsql=# INSERT INTO test select i,(select array_agg(random()) from
generate_series (1,1000))::text from generate_series (1,1000) as g(i);
INSERT 0 1000
Time: 3176.286 ms

--wrong explain value at least for the buffers value... and whats even worse
wrong (100x less that would be actual) runtime
--(Buffers: shared hit=7) - wrong value
pgsql=# EXPLAIN (analyze, costs, buffers) select * from test;
QUERY PLAN
------------------------------------------------------------------------------------------------------
Seq Scan on test (cost=0.00..10.70 rows=1000 width=36) (actual
time=0.020..1.656 rows=1000 loops=1)
Buffers: shared hit=7
Total runtime: 3.252 ms
(3 rows)
Time: 3.704 ms

--again wrong
pgsql=# EXPLAIN (analyze, costs, buffers) select value from test;
QUERY PLAN
------------------------------------------------------------------------------------------------------
Seq Scan on test (cost=0.00..10.70 rows=1000 width=32) (actual
time=0.013..1.625 rows=1000 loops=1)
Buffers: shared hit=7
Total runtime: 3.141 ms
(3 rows)
Time: 3.428 ms

--force take detoasting into account via || with toasted field
--result close to reality (Buffers: shared hit=4700 - real value)
pgsql=# EXPLAIN (analyze, costs, buffers) select value||'a' from test;
QUERY PLAN
--------------------------------------------------------------------------------------------------------
Seq Scan on test (cost=0.00..13.20 rows=1000 width=32) (actual
time=0.264..187.855 rows=1000 loops=1)
Buffers: shared hit=4700
Total runtime: 189.696 ms
(3 rows)
Time: 190.001 ms

--actual timings of the both queries the same.
pgsql=# \o /dev/null
pgsql=# select * from test;
Time: 219.845 ms
pgsql=# \o /dev/null
pgsql=# select value||'a' from test;
Time: 221.599 ms

Having correct buffer hit/read values could be critical when toasted fields
are likely to be read from the HDD.

#2Robert Haas
robertmhaas@gmail.com
In reply to: Maxim Boguk (#1)
Re: BUG #6513: explain (analyze, buffers) and toasted fields

On Sun, Mar 4, 2012 at 11:22 PM, <maxim.boguk@gmail.com> wrote:

I got hit by that bug when explored reasons of one very slow production
query again.
And I lost a lot time trying reproduce the problem query on production
server with explain analyze.
Finally I found I need some workaround to get explain perform actual work
with toasted data.

So there is going the bug report:
EXPLAIN do not take into account that some result fields will require
detoasting and produce wrong buffers result.

Well, the buffers result is, I think, not wrong. That's just a count
of how many buffers were actually read or written or whatever, and it
is what it is. I think that if there's an actual bug here, it's that
EXPLAIN ANALYZE is skipping the formation of the actual output tuples,
and therefore it's doing less work than an actual execution of the
real query would. I am not sure whether it would be a good idea to
change that or not.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#2)
Re: BUG #6513: explain (analyze, buffers) and toasted fields

Robert Haas <robertmhaas@gmail.com> writes:

... I think that if there's an actual bug here, it's that
EXPLAIN ANALYZE is skipping the formation of the actual output tuples,
and therefore it's doing less work than an actual execution of the
real query would. I am not sure whether it would be a good idea to
change that or not.

EXPLAIN ANALYZE *necessarily* does less work than the real query,
because it doesn't transmit the results to the client (which is going
to be a dominant cost in a lot of situations). I'm not sure whether
running the I/O functions would provide a useful improvement in
verisimilitude or not; but one should never imagine that EXPLAIN is an
exact simulation of normal query execution. It's meant for debugging
planner choices, anyway, and the detoast&I/O costs are going to be the
same whichever plan you choose.

regards, tom lane

#4Marti Raudsepp
marti@juffo.org
In reply to: Tom Lane (#3)
Re: BUG #6513: explain (analyze, buffers) and toasted fields

On Mon, Mar 12, 2012 at 17:23, Tom Lane <tgl@sss.pgh.pa.us> wrote:

EXPLAIN ANALYZE *necessarily* does less work than the real query,
because it doesn't transmit the results to the client (which is going
to be a dominant cost in a lot of situations).  I'm not sure whether
running the I/O functions would provide a useful improvement in
verisimilitude or not

Granted, but I can see how one would expect the detoasting to be
accounted for in EXPLAIN, since it's something that's usually done
*inside* the database (as opposed to network overhead). That would
give a more accurate picture of where the time is being spent.

Now, another question is whether to use text or binary output
functions. And where to account the time, since this happens outside
of the plan tree.

At the very least, I think this gotcha should be documented on the EXPLAIN page.

Regards,
Marti