further explain changes
Per recent discussion on pgsql-performance, and per discussion on
-hackers that it might not be too late for small patches after all,
here is a patch (as yet without documentation) which adds some
additional instrumentation to EXPLAIN for hashes: number of buckets,
number of batches, original number of batches, and peak memory
utilization. Thoughts?
I was also thinking about the possibility of adding a new option
called "output" and making that control whether the "Output" line gets
printed. It's kind of annoying to use EXPLAIN (ANALYZE, VERBOSE)
right now (and moreso with this patch) specifically because of that
line, which is quite... verbose. If we're going to change it ever we
should do it for 9.0, since we've made a lot of other changes that
people will be adjusting for anyhow.
Also, do we want to change the schema URL? The existing URL was
suggested by Peter but IIRC there was some thought that it might not
be the best choice.
http://www.postgresql.org/2009/explain
...Robert
Attachments:
explain-hash.patchtext/x-patch; charset=US-ASCII; name=explain-hash.patchDownload
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 18ddeec..fca9ba4 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -20,6 +20,7 @@
#include "commands/explain.h"
#include "commands/prepare.h"
#include "commands/trigger.h"
+#include "executor/hashjoin.h"
#include "executor/instrument.h"
#include "optimizer/clauses.h"
#include "optimizer/planner.h"
@@ -67,6 +68,7 @@ static void show_upper_qual(List *qual, const char *qlabel, Plan *plan,
ExplainState *es);
static void show_sort_keys(Plan *sortplan, ExplainState *es);
static void show_sort_info(SortState *sortstate, ExplainState *es);
+static void show_hash_info(HashState *hashstate, ExplainState *es);
static const char *explain_get_index_name(Oid indexId);
static void ExplainScanTarget(Scan *plan, ExplainState *es);
static void ExplainMemberNodes(List *plans, PlanState **planstate,
@@ -1052,6 +1054,9 @@ ExplainNode(Plan *plan, PlanState *planstate,
"One-Time Filter", plan, es);
show_upper_qual(plan->qual, "Filter", plan, es);
break;
+ case T_Hash:
+ show_hash_info((HashState *) planstate, es);
+ break;
default:
break;
}
@@ -1405,6 +1410,37 @@ show_sort_info(SortState *sortstate, ExplainState *es)
}
/*
+ * Show information on hash buckets/batches.
+ */
+static void
+show_hash_info(HashState *hashstate, ExplainState *es)
+{
+ HashJoinTable hashtable;
+
+ Assert(IsA(hashstate, HashState));
+ hashtable = hashstate->hashtable;
+
+ if (es->verbose && hashtable)
+ {
+ long spacePeakKb = (hashtable->spacePeak + 1023) / 1024;
+ ExplainPropertyLong("Hash Buckets", hashtable->nbuckets, es);
+ ExplainPropertyLong("Hash Batches", hashtable->nbatch, es);
+ ExplainPropertyLong("Original Hash Batches",
+ hashtable->nbatch_original, es);
+ if (es->format == EXPLAIN_FORMAT_TEXT)
+ {
+ appendStringInfoSpaces(es->str, es->indent * 2);
+ appendStringInfo(es->str, "Peak Memory Usage: %ldkB\n",
+ spacePeakKb);
+ }
+ else
+ {
+ ExplainPropertyLong("Peak Memory Usage", spacePeakKb, es);
+ }
+ }
+}
+
+/*
* Fetch the name of an index in an EXPLAIN
*
* We allow plugins to get control here so that plans involving hypothetical
diff --git a/src/backend/executor/nodeHash.c b/src/backend/executor/nodeHash.c
index ef55b20..7ca387b 100644
--- a/src/backend/executor/nodeHash.c
+++ b/src/backend/executor/nodeHash.c
@@ -287,6 +287,7 @@ ExecHashTableCreate(Hash *node, List *hashOperators)
hashtable->innerBatchFile = NULL;
hashtable->outerBatchFile = NULL;
hashtable->spaceUsed = 0;
+ hashtable->spacePeak = 0;
hashtable->spaceAllowed = work_mem * 1024L;
hashtable->spaceUsedSkew = 0;
hashtable->spaceAllowedSkew =
@@ -719,6 +720,8 @@ ExecHashTableInsert(HashJoinTable hashtable,
hashTuple->next = hashtable->buckets[bucketno];
hashtable->buckets[bucketno] = hashTuple;
hashtable->spaceUsed += hashTupleSize;
+ if (hashtable->spaceUsed > hashtable->spacePeak)
+ hashtable->spacePeak = hashtable->spaceUsed;
if (hashtable->spaceUsed > hashtable->spaceAllowed)
ExecHashIncreaseNumBatches(hashtable);
}
@@ -1071,6 +1074,8 @@ ExecHashBuildSkewHash(HashJoinTable hashtable, Hash *node, int mcvsToUse)
+ mcvsToUse * sizeof(int);
hashtable->spaceUsedSkew += nbuckets * sizeof(HashSkewBucket *)
+ mcvsToUse * sizeof(int);
+ if (hashtable->spaceUsed > hashtable->spacePeak)
+ hashtable->spacePeak = hashtable->spaceUsed;
/*
* Create a skew bucket for each MCV hash value.
@@ -1119,6 +1124,8 @@ ExecHashBuildSkewHash(HashJoinTable hashtable, Hash *node, int mcvsToUse)
hashtable->nSkewBuckets++;
hashtable->spaceUsed += SKEW_BUCKET_OVERHEAD;
hashtable->spaceUsedSkew += SKEW_BUCKET_OVERHEAD;
+ if (hashtable->spaceUsed > hashtable->spacePeak)
+ hashtable->spacePeak = hashtable->spaceUsed;
}
free_attstatsslot(node->skewColType,
@@ -1205,6 +1212,8 @@ ExecHashSkewTableInsert(HashJoinTable hashtable,
/* Account for space used, and back off if we've used too much */
hashtable->spaceUsed += hashTupleSize;
hashtable->spaceUsedSkew += hashTupleSize;
+ if (hashtable->spaceUsed > hashtable->spacePeak)
+ hashtable->spacePeak = hashtable->spaceUsed;
while (hashtable->spaceUsedSkew > hashtable->spaceAllowedSkew)
ExecHashRemoveNextSkewBucket(hashtable);
diff --git a/src/include/executor/hashjoin.h b/src/include/executor/hashjoin.h
index 2ecea40..a1938dc 100644
--- a/src/include/executor/hashjoin.h
+++ b/src/include/executor/hashjoin.h
@@ -149,6 +149,7 @@ typedef struct HashJoinTableData
Size spaceUsed; /* memory space currently used by tuples */
Size spaceAllowed; /* upper limit for space used */
+ Size spacePeak; /* peak space used */
Size spaceUsedSkew; /* skew hash table's current space usage */
Size spaceAllowedSkew; /* upper limit for skew hashtable */
On Sat, Jan 23, 2010 at 10:08 PM, Robert Haas <robertmhaas@gmail.com> wrote:
I was also thinking about the possibility of adding a new option
called "output" and making that control whether the "Output" line gets
printed. It's kind of annoying to use EXPLAIN (ANALYZE, VERBOSE)
why not let it go in ANALYZE, just as the sort info
--
Atentamente,
Jaime Casanova
Soporte y capacitación de PostgreSQL
Asesoría y desarrollo de sistemas
Guayaquil - Ecuador
Cel. +59387171157
Le 24/01/2010 06:06, Jaime Casanova a écrit :
On Sat, Jan 23, 2010 at 10:08 PM, Robert Haas <robertmhaas@gmail.com> wrote:
I was also thinking about the possibility of adding a new option
called "output" and making that control whether the "Output" line gets
printed. It's kind of annoying to use EXPLAIN (ANALYZE, VERBOSE)why not let it go in ANALYZE, just as the sort info
Yes, it would be more consistent. Other than that, this patch is quite
interesting.
--
Guillaume.
http://www.postgresqlfr.org
http://dalibo.com
isn't that line pretty much the main point of "verbose"? I would assume
anything new like this would get its own option which might default to on.
greg
On 24 Jan 2010 03:08, "Robert Haas" <robertmhaas@gmail.com> wrote:
Per recent discussion on pgsql-performance, and per discussion on
-hackers that it might not be too late for small patches after all,
here is a patch (as yet without documentation) which adds some
additional instrumentation to EXPLAIN for hashes: number of buckets,
number of batches, original number of batches, and peak memory
utilization. Thoughts?
I was also thinking about the possibility of adding a new option
called "output" and making that control whether the "Output" line gets
printed. It's kind of annoying to use EXPLAIN (ANALYZE, VERBOSE)
right now (and moreso with this patch) specifically because of that
line, which is quite... verbose. If we're going to change it ever we
should do it for 9.0, since we've made a lot of other changes that
people will be adjusting for anyhow.
Also, do we want to change the schema URL? The existing URL was
suggested by Peter but IIRC there was some thought that it might not
be the best choice.
http://www.postgresql.org/2009/explain
...Robert
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Sun, Jan 24, 2010 at 12:06 AM, Jaime Casanova
<jcasanov@systemguards.com.ec> wrote:
On Sat, Jan 23, 2010 at 10:08 PM, Robert Haas <robertmhaas@gmail.com> wrote:
I was also thinking about the possibility of adding a new option
called "output" and making that control whether the "Output" line gets
printed. It's kind of annoying to use EXPLAIN (ANALYZE, VERBOSE)why not let it go in ANALYZE, just as the sort info
It's kinda long-winded - it adds like 4 extra lines for each hash
join. I don't think I want to add that much clutter to regular E-A
output.
If people don't like making it controlled by VERBOSE, then I vote for
a new option, as Greg Stark suggested.
...Robert
Robert Haas <robertmhaas@gmail.com> writes:
On Sun, Jan 24, 2010 at 12:06 AM, Jaime Casanova
why not let it go in ANALYZE, just as the sort info
It's kinda long-winded - it adds like 4 extra lines for each hash
join. I don't think I want to add that much clutter to regular E-A
output.
Well, that would only happen if you're deliberately obtuse about the
formatting. The sort code manages to fit all the extra on one line,
and I don't see why hash couldn't.
I'd vote for just adding it in the exact same cases that sort adds extra
info. -1 for either adding a new option or changing the meaning of the
ones that are there.
regards, tom lane
On Sun, Jan 24, 2010 at 12:30 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
On Sun, Jan 24, 2010 at 12:06 AM, Jaime Casanova
why not let it go in ANALYZE, just as the sort info
It's kinda long-winded - it adds like 4 extra lines for each hash
join. I don't think I want to add that much clutter to regular E-A
output.Well, that would only happen if you're deliberately obtuse about the
formatting. The sort code manages to fit all the extra on one line,
and I don't see why hash couldn't.I'd vote for just adding it in the exact same cases that sort adds extra
info. -1 for either adding a new option or changing the meaning of the
ones that are there.
Care to suggest a format?
...Robert
Robert Haas <robertmhaas@gmail.com> writes:
Care to suggest a format?
As much like the sort case as possible ...
regards, tom lane
On Sun, Jan 24, 2010 at 12:30 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
On Sun, Jan 24, 2010 at 12:06 AM, Jaime Casanova
why not let it go in ANALYZE, just as the sort info
It's kinda long-winded - it adds like 4 extra lines for each hash
join. I don't think I want to add that much clutter to regular E-A
output.Well, that would only happen if you're deliberately obtuse about the
formatting. The sort code manages to fit all the extra on one line,
and I don't see why hash couldn't.
OK, here's a new version.
...Robert
Attachments:
explain-hash-v2.patchtext/x-patch; charset=US-ASCII; name=explain-hash-v2.patchDownload
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -20,6 +20,7 @@
#include "commands/explain.h"
#include "commands/prepare.h"
#include "commands/trigger.h"
+#include "executor/hashjoin.h"
#include "executor/instrument.h"
#include "optimizer/clauses.h"
#include "optimizer/planner.h"
@@ -67,6 +68,7 @@ static void show_upper_qual(List *qual, const char *qlabel, Plan *plan,
ExplainState *es);
static void show_sort_keys(Plan *sortplan, ExplainState *es);
static void show_sort_info(SortState *sortstate, ExplainState *es);
+static void show_hash_info(HashState *hashstate, ExplainState *es);
static const char *explain_get_index_name(Oid indexId);
static void ExplainScanTarget(Scan *plan, ExplainState *es);
static void ExplainMemberNodes(List *plans, PlanState **planstate,
@@ -1052,6 +1054,9 @@ ExplainNode(Plan *plan, PlanState *planstate,
"One-Time Filter", plan, es);
show_upper_qual(plan->qual, "Filter", plan, es);
break;
+ case T_Hash:
+ show_hash_info((HashState *) planstate, es);
+ break;
default:
break;
}
@@ -1405,6 +1410,47 @@ show_sort_info(SortState *sortstate, ExplainState *es)
}
/*
+ * Show information on hash buckets/batches.
+ */
+static void
+show_hash_info(HashState *hashstate, ExplainState *es)
+{
+ HashJoinTable hashtable;
+
+ Assert(IsA(hashstate, HashState));
+ hashtable = hashstate->hashtable;
+
+ if (hashtable)
+ {
+ long spacePeakKb = (hashtable->spacePeak + 1023) / 1024;
+ if (es->format != EXPLAIN_FORMAT_TEXT)
+ {
+ ExplainPropertyLong("Hash Buckets", hashtable->nbuckets, es);
+ ExplainPropertyLong("Hash Batches", hashtable->nbatch, es);
+ ExplainPropertyLong("Original Hash Batches",
+ hashtable->nbatch_original, es);
+ ExplainPropertyLong("Peak Memory Usage", spacePeakKb, es);
+ }
+ else if (hashtable->nbatch_original != hashtable->nbatch)
+ {
+ appendStringInfoSpaces(es->str, es->indent * 2);
+ appendStringInfo(es->str,
+ "Buckets: %d Batches: %d (originally %d) Memory Usage: %ldkB\n",
+ hashtable->nbuckets, hashtable->nbatch,
+ hashtable->nbatch_original, spacePeakKb);
+ }
+ else
+ {
+ appendStringInfoSpaces(es->str, es->indent * 2);
+ appendStringInfo(es->str,
+ "Buckets: %d Batches: %d Memory Usage: %ldkB\n",
+ hashtable->nbuckets, hashtable->nbatch,
+ spacePeakKb);
+ }
+ }
+}
+
+/*
* Fetch the name of an index in an EXPLAIN
*
* We allow plugins to get control here so that plans involving hypothetical
--- a/src/backend/executor/nodeHash.c
+++ b/src/backend/executor/nodeHash.c
@@ -287,6 +287,7 @@ ExecHashTableCreate(Hash *node, List *hashOperators)
hashtable->innerBatchFile = NULL;
hashtable->outerBatchFile = NULL;
hashtable->spaceUsed = 0;
+ hashtable->spacePeak = 0;
hashtable->spaceAllowed = work_mem * 1024L;
hashtable->spaceUsedSkew = 0;
hashtable->spaceAllowedSkew =
@@ -719,6 +720,8 @@ ExecHashTableInsert(HashJoinTable hashtable,
hashTuple->next = hashtable->buckets[bucketno];
hashtable->buckets[bucketno] = hashTuple;
hashtable->spaceUsed += hashTupleSize;
+ if (hashtable->spaceUsed > hashtable->spacePeak)
+ hashtable->spacePeak = hashtable->spaceUsed;
if (hashtable->spaceUsed > hashtable->spaceAllowed)
ExecHashIncreaseNumBatches(hashtable);
}
@@ -1071,6 +1074,8 @@ ExecHashBuildSkewHash(HashJoinTable hashtable, Hash *node, int mcvsToUse)
+ mcvsToUse * sizeof(int);
hashtable->spaceUsedSkew += nbuckets * sizeof(HashSkewBucket *)
+ mcvsToUse * sizeof(int);
+ if (hashtable->spaceUsed > hashtable->spacePeak)
+ hashtable->spacePeak = hashtable->spaceUsed;
/*
* Create a skew bucket for each MCV hash value.
@@ -1119,6 +1124,8 @@ ExecHashBuildSkewHash(HashJoinTable hashtable, Hash *node, int mcvsToUse)
hashtable->nSkewBuckets++;
hashtable->spaceUsed += SKEW_BUCKET_OVERHEAD;
hashtable->spaceUsedSkew += SKEW_BUCKET_OVERHEAD;
+ if (hashtable->spaceUsed > hashtable->spacePeak)
+ hashtable->spacePeak = hashtable->spaceUsed;
}
free_attstatsslot(node->skewColType,
@@ -1205,6 +1212,8 @@ ExecHashSkewTableInsert(HashJoinTable hashtable,
/* Account for space used, and back off if we've used too much */
hashtable->spaceUsed += hashTupleSize;
hashtable->spaceUsedSkew += hashTupleSize;
+ if (hashtable->spaceUsed > hashtable->spacePeak)
+ hashtable->spacePeak = hashtable->spaceUsed;
while (hashtable->spaceUsedSkew > hashtable->spaceAllowedSkew)
ExecHashRemoveNextSkewBucket(hashtable);
--- a/src/include/executor/hashjoin.h
+++ b/src/include/executor/hashjoin.h
@@ -149,6 +149,7 @@ typedef struct HashJoinTableData
Size spaceUsed; /* memory space currently used by tuples */
Size spaceAllowed; /* upper limit for space used */
+ Size spacePeak; /* peak space used */
Size spaceUsedSkew; /* skew hash table's current space usage */
Size spaceAllowedSkew; /* upper limit for skew hashtable */
On Thu, Jan 28, 2010 at 4:18 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Sun, Jan 24, 2010 at 12:30 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
On Sun, Jan 24, 2010 at 12:06 AM, Jaime Casanova
why not let it go in ANALYZE, just as the sort info
It's kinda long-winded - it adds like 4 extra lines for each hash
join. I don't think I want to add that much clutter to regular E-A
output.Well, that would only happen if you're deliberately obtuse about the
formatting. The sort code manages to fit all the extra on one line,
and I don't see why hash couldn't.OK, here's a new version.
OK, i have 3 hashes in a query and i got these 3 lines in an EXPLAIN
ANALYZE with your patch
"""
-> Hash (cost=3878.94..3878.94 rows=83594
width=34) (actual time=504.648..504.648 rows=83594 loops=1)
Buckets: 2048 Batches: 8 Memory Usage: 589kB
[...]
-> Hash (cost=14.49..14.49 rows=649 width=15)
(actual time=2.901..2.901 rows=649 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 32kB
[...]
-> Hash (cost=977.62..977.62 rows=6555 width=16)
(actual time=60.913..60.913 rows=6556 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 308kB
"""
I guess Memory Usage is per Batch, right? is this an average?
What is the unit measure for Bucket?
there are 14 temp files generated for this query and the only Sort
says it was on memory so these files should come from the hashes, can
we say that in the explain analyze? mmm... that memory usage, could be
Disk actually?
--
Atentamente,
Jaime Casanova
Soporte y capacitación de PostgreSQL
Asesoría y desarrollo de sistemas
Guayaquil - Ecuador
Cel. +59387171157
On Sat, Jan 30, 2010 at 12:26 PM, Jaime Casanova
<jcasanov@systemguards.com.ec> wrote:
On Thu, Jan 28, 2010 at 4:18 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Sun, Jan 24, 2010 at 12:30 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
On Sun, Jan 24, 2010 at 12:06 AM, Jaime Casanova
why not let it go in ANALYZE, just as the sort info
It's kinda long-winded - it adds like 4 extra lines for each hash
join. I don't think I want to add that much clutter to regular E-A
output.Well, that would only happen if you're deliberately obtuse about the
formatting. The sort code manages to fit all the extra on one line,
and I don't see why hash couldn't.OK, here's a new version.
OK, i have 3 hashes in a query and i got these 3 lines in an EXPLAIN
ANALYZE with your patch
"""
-> Hash (cost=3878.94..3878.94 rows=83594
width=34) (actual time=504.648..504.648 rows=83594 loops=1)
Buckets: 2048 Batches: 8 Memory Usage: 589kB
[...]
-> Hash (cost=14.49..14.49 rows=649 width=15)
(actual time=2.901..2.901 rows=649 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 32kB
[...]
-> Hash (cost=977.62..977.62 rows=6555 width=16)
(actual time=60.913..60.913 rows=6556 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 308kB
"""I guess Memory Usage is per Batch, right? is this an average?
It's the maximum memory every used by that hash.
What is the unit measure for Bucket?
There is no unit - it's just how many buckets.
there are 14 temp files generated for this query and the only Sort
says it was on memory so these files should come from the hashes, can
we say that in the explain analyze?
I think maybe it's 2 temp files per batch in excess of 1, thus (8 - 1)
x 2 = 14 for the 8-batch join and none for the other two. Perhaps
you'd care to test that hypothesis?
mmm... that memory usage, could be
Disk actually?
No.
...Robert