pgsql: Augment EXPLAIN output with more details on Hash nodes.
Log Message:
-----------
Augment EXPLAIN output with more details on Hash nodes.
We show the number of buckets, the number of batches (and also the original
number if it has changed), and the peak space used by the hash table. Minor
executor changes to track peak space used.
Modified Files:
--------------
pgsql/src/backend/commands:
explain.c (r1.199 -> r1.200)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/commands/explain.c?r1=1.199&r2=1.200)
pgsql/src/backend/executor:
nodeHash.c (r1.126 -> r1.127)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/executor/nodeHash.c?r1=1.126&r2=1.127)
pgsql/src/include/executor:
hashjoin.h (r1.52 -> r1.53)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/include/executor/hashjoin.h?r1=1.52&r2=1.53)
rhaas@postgresql.org (Robert Haas) writes:
Log Message:
-----------
Augment EXPLAIN output with more details on Hash nodes.
This needs to be damped down a bit. It should not print useless
non-information in cases where the plan wasn't actually run.
Please compare show_sort_info.
regards, tom lane
On Mon, Feb 1, 2010 at 11:53 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
rhaas@postgresql.org (Robert Haas) writes:
Log Message:
-----------
Augment EXPLAIN output with more details on Hash nodes.This needs to be damped down a bit. It should not print useless
non-information in cases where the plan wasn't actually run.
Please compare show_sort_info.
Eh? When does it do that?
rhaas=# prepare foo as select * from pg_attribute a, pg_class c where
a.attrelid = c.oid and $1;
PREPARE
rhaas=# explain analyze execute foo (true);
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
Result (cost=12.76..95.19 rows=1955 width=357) (actual
time=1.972..21.777 rows=1973 loops=1)
One-Time Filter: $1
-> Hash Join (cost=12.76..95.19 rows=1955 width=357) (actual
time=1.960..15.679 rows=1973 loops=1)
Hash Cond: (a.attrelid = c.oid)
-> Seq Scan on pg_attribute a (cost=0.00..55.55 rows=1955
width=167) (actual time=0.013..2.902 rows=1973 loops=1)
-> Hash (cost=9.56..9.56 rows=256 width=194) (actual
time=1.828..1.828 rows=259 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 49kB
-> Seq Scan on pg_class c (cost=0.00..9.56 rows=256
width=194) (actual time=0.020..0.835 rows=259 loops=1)
Total runtime: 24.746 ms
(9 rows)
rhaas=# explain analyze execute foo (false);
QUERY PLAN
-------------------------------------------------------------------------------------------------
Result (cost=12.76..95.19 rows=1955 width=357) (actual
time=0.003..0.003 rows=0 loops=1)
One-Time Filter: $1
-> Hash Join (cost=12.76..95.19 rows=1955 width=357) (never executed)
Hash Cond: (a.attrelid = c.oid)
-> Seq Scan on pg_attribute a (cost=0.00..55.55 rows=1955
width=167) (never executed)
-> Hash (cost=9.56..9.56 rows=256 width=194) (never executed)
-> Seq Scan on pg_class c (cost=0.00..9.56 rows=256
width=194) (never executed)
Total runtime: 0.292 ms
(8 rows)
...Robert
Robert Haas <robertmhaas@gmail.com> writes:
On Mon, Feb 1, 2010 at 11:53 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
This needs to be damped down a bit. �It should not print useless
non-information in cases where the plan wasn't actually run.
Please compare show_sort_info.
Eh? When does it do that?
Oh, I'm sorry, it's using hashtable existence to condition the whole
output. So my complaint is backwards. I thought the intention was
to print the estimated number of batches in all cases, and then the
actual as well in EXPLAIN ANALYZE.
BTW, I think "estimated" and "actual" would be less confusing
terminology than "original".
regards, tom lane
On Mon, Feb 1, 2010 at 12:12 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
On Mon, Feb 1, 2010 at 11:53 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
This needs to be damped down a bit. It should not print useless
non-information in cases where the plan wasn't actually run.
Please compare show_sort_info.Eh? When does it do that?
Oh, I'm sorry, it's using hashtable existence to condition the whole
output. So my complaint is backwards. I thought the intention was
to print the estimated number of batches in all cases, and then the
actual as well in EXPLAIN ANALYZE.BTW, I think "estimated" and "actual" would be less confusing
terminology than "original".
I think (but am not 100% sure) that the number that is computed during
the plan phase is acually thrown away and recomputed during the
execution phase (grep for ExecChooseHashTableSize). So potentially
there is:
A. the number of buckets and batches estimated during planning
B. the number of buckets and batches decided on at the beginning of execution
C. the number of batches we end up using as a result of work_mem overflow
Right now I'm just printing out B and C; we could add A as well, but I
think there are some changes needed to hold on to that information for
longer than we presently do. At any rate, the terminology we settle
on should be able to accommodate potentially dumping out all of these
values.
IMO, it's not worth spending an enormous amount of time on this. The
most important questions that I think people will want to answer are
(1) was this done as a multi-batch hash join?, (b) if so, did it start
out that way or did nbatch increase on the fly?, and (3) how close was
I to overflowing work_mem? I'm happy to make improvements, but I
don't think we should get too crazy.
...Robert