Inheritance planner CPU and memory usage change since 9.3.2
Hi
We saw a rather extreme performance problem in a cluster upgraded from
9.1 to 9.3. It uses a largish number of child tables (partitions) and
many columns. Planning a simple UPDATE via the base table started
using a very large amount of memory and CPU time.
My colleague Rushabh Lathia tracked the performance change down to
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=c03ad5602f529787968fa3201b35c119bbc6d782
.
The call to copyObject in the loop introduced here seems to be
problematic (copying append_rel_list for every element in
append_rel_list unconditionally), though we're still trying to figure
it out. Attached is a simple repro script, with variables to tweak.
Quite a few others have posted about this sort of thing and been
politely reminded of the 100 table caveat [1]/messages/by-id/8c9acaa.1f453.14c0da0402f.Coremail.chjischj@163.com[2]/messages/by-id/20141107185824.2513.53433@wrigleys.postgresql.org which is fair enough,
but the situations seems to have got dramatically worse for some users
after an upgrade.
[1]: /messages/by-id/8c9acaa.1f453.14c0da0402f.Coremail.chjischj@163.com
[2]: /messages/by-id/20141107185824.2513.53433@wrigleys.postgresql.org
--
Thomas Munro
http://www.enterprisedb.com
Attachments:
On Wed, Jun 17, 2015 at 9:32 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
We saw a rather extreme performance problem in a cluster upgraded from
9.1 to 9.3. It uses a largish number of child tables (partitions) and
many columns. Planning a simple UPDATE via the base table started
using a very large amount of memory and CPU time.My colleague Rushabh Lathia tracked the performance change down to
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=c03ad5602f529787968fa3201b35c119bbc6d782
.The call to copyObject in the loop introduced here seems to be
problematic (copying append_rel_list for every element in
append_rel_list unconditionally), though we're still trying to figure
it out. Attached is a simple repro script, with variables to tweak.Quite a few others have posted about this sort of thing and been
politely reminded of the 100 table caveat [1][2] which is fair enough,
but the situations seems to have got dramatically worse for some users
after an upgrade.
Yes. The copyObject() call introduced by this commit seems to have
complexity O(T^2*C) where T is the number of child tables and C is the
number of columns per child. That's because the List that is being
copied is a list of AppendRelInfo nodes, each of which has a
translated_vars member that is a list of every Var in one table, and
we copy that list once per child.
It appears that in a lot of cases this work is unnecessary. The
second (long) for loop in inheritance_planner copies root->rowMarks
and root->append_rel_list so as to be able to apply ChangeVarNodes()
to the result, but we only actually do that if the rte is of type
RTE_SUBQUERY or if it has security quals. In the common case where we
reach inheritance_planner not because of UNION ALL but just because
the table has a bunch of inheritance children (none of which have RLS
policies applied), we copy everything and then modify none of it,
using up startling large amounts of memory in ways that pre-9.2
versions did not.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, Jun 17, 2015 at 9:56 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Jun 17, 2015 at 9:32 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:We saw a rather extreme performance problem in a cluster upgraded from
9.1 to 9.3. It uses a largish number of child tables (partitions) and
many columns. Planning a simple UPDATE via the base table started
using a very large amount of memory and CPU time.My colleague Rushabh Lathia tracked the performance change down to
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=c03ad5602f529787968fa3201b35c119bbc6d782
.The call to copyObject in the loop introduced here seems to be
problematic (copying append_rel_list for every element in
append_rel_list unconditionally), though we're still trying to figure
it out. Attached is a simple repro script, with variables to tweak.Quite a few others have posted about this sort of thing and been
politely reminded of the 100 table caveat [1][2] which is fair enough,
but the situations seems to have got dramatically worse for some users
after an upgrade.Yes. The copyObject() call introduced by this commit seems to have
complexity O(T^2*C) where T is the number of child tables and C is the
number of columns per child. That's because the List that is being
copied is a list of AppendRelInfo nodes, each of which has a
translated_vars member that is a list of every Var in one table, and
we copy that list once per child.It appears that in a lot of cases this work is unnecessary. The
second (long) for loop in inheritance_planner copies root->rowMarks
and root->append_rel_list so as to be able to apply ChangeVarNodes()
to the result, but we only actually do that if the rte is of type
RTE_SUBQUERY or if it has security quals. In the common case where we
reach inheritance_planner not because of UNION ALL but just because
the table has a bunch of inheritance children (none of which have RLS
policies applied), we copy everything and then modify none of it,
using up startling large amounts of memory in ways that pre-9.2
versions did not.
The attached patch helps. It does two things:
1. It arranges for inheritance_planner to throw away the memory
consumed by the subroot's rowMarks and append_rel_list after the call
to grouping_planner for that subroot returns. This prevents the
explosive growth of memory usage in all cases I've tested so far, but
planning is still really slow.
2. It arranges not to deep-copy append_rel_list when the root's
append_rel_list doesn't need to be modified for the subroot. This
makes planning much much faster in simple cases, like a simple update
on a table with many partitions. But if you do attach a FROM clause
containing a subquery to such an update, then this optimization
doesn't kick in any more and things are still very slow (though still
memory-bounded, due to part 1).
I feel I might be missing a trick here. It seems unlikely to me that
we actually need the entire append_rel_list for every subquery; and we
almost certainly don't need to modify every element of the
append_rel_list for every subquery. Even the ones that no
ChangeVarNodes() call mutates still get deep-copied.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachments:
mitigate-inheritance-planner-craziness-v1.patchtext/x-patch; charset=US-ASCII; name=mitigate-inheritance-planner-craziness-v1.patchDownload
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 8afde2b..84c75f1 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -43,6 +43,7 @@
#include "parser/parsetree.h"
#include "parser/parse_agg.h"
#include "rewrite/rewriteManip.h"
+#include "utils/memutils.h"
#include "utils/rel.h"
#include "utils/selfuncs.h"
@@ -845,9 +846,17 @@ inheritance_planner(PlannerInfo *root)
List *returningLists = NIL;
List *rowMarks;
ListCell *lc;
+ MemoryContext myctx;
+ MemoryContext oldctx;
Assert(parse->commandType != CMD_INSERT);
+ myctx = AllocSetContextCreate(CurrentMemoryContext,
+ "inheritance_planner",
+ ALLOCSET_DEFAULT_MINSIZE,
+ ALLOCSET_DEFAULT_INITSIZE,
+ ALLOCSET_DEFAULT_MAXSIZE);
+
/*
* We generate a modified instance of the original Query for each target
* relation, plan that, and put all the plans into a list that will be
@@ -908,18 +917,16 @@ inheritance_planner(PlannerInfo *root)
/*
* The rowMarks list might contain references to subquery RTEs, so
- * make a copy that we can apply ChangeVarNodes to. (Fortunately, the
- * executor doesn't need to see the modified copies --- we can just
- * pass it the original rowMarks list.)
+ * make a copy that we can apply ChangeVarNodes to. If any security
+ * barrier quals are present, the rowMarks list may be further modified
+ * by grouping_planner. (Fortunately, the executor doesn't need to see
+ * the modified copies --- we can just pass it the original rowMarks
+ * list. For the same reason, we can arrange to throw away the copy
+ * we make here relatively quickly.)
*/
+ oldctx = MemoryContextSwitchTo(myctx);
subroot.rowMarks = (List *) copyObject(root->rowMarks);
-
- /*
- * The append_rel_list likewise might contain references to subquery
- * RTEs (if any subqueries were flattenable UNION ALLs). So prepare
- * to apply ChangeVarNodes to that, too.
- */
- subroot.append_rel_list = (List *) copyObject(root->append_rel_list);
+ MemoryContextSwitchTo(oldctx);
/*
* Add placeholders to the child Query's rangetable list to fill the
@@ -942,6 +949,7 @@ inheritance_planner(PlannerInfo *root)
if (final_rtable != NIL)
{
ListCell *lr;
+ bool did_copy = false;
rti = 1;
foreach(lr, parse->rtable)
@@ -960,6 +968,21 @@ inheritance_planner(PlannerInfo *root)
Index newrti;
/*
+ * The append_rel_list likewise might contain references
+ * to subquery RTEs (if any subqueries were flattenable
+ * UNION ALLs). So we need to mutate that too, and must
+ * therefore copy it.
+ */
+ if (!did_copy)
+ {
+ oldctx = MemoryContextSwitchTo(myctx);
+ subroot.append_rel_list =
+ (List *) copyObject(root->append_rel_list);
+ MemoryContextSwitchTo(oldctx);
+ did_copy = true;
+ }
+
+ /*
* The RTE can't contain any references to its own RT
* index, except in the security barrier quals, so we can
* save a few cycles by applying ChangeVarNodes before we
@@ -989,6 +1012,9 @@ inheritance_planner(PlannerInfo *root)
/* Generate plan */
subplan = grouping_planner(&subroot, 0.0 /* retrieve all tuples */ );
+ /* Reclaim some memory */
+ MemoryContextReset(myctx);
+
/*
* Planning may have modified the query result relation (if there were
* security barrier quals on the result RTE).
@@ -1097,6 +1123,8 @@ inheritance_planner(PlannerInfo *root)
Assert(!parse->onConflict);
}
+ MemoryContextDelete(myctx);
+
/* Mark result as unordered (probably unnecessary) */
root->query_pathkeys = NIL;
On 18 June 2015 at 14:48, Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Jun 17, 2015 at 9:56 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Jun 17, 2015 at 9:32 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:We saw a rather extreme performance problem in a cluster upgraded from
9.1 to 9.3. It uses a largish number of child tables (partitions) and
many columns. Planning a simple UPDATE via the base table started
using a very large amount of memory and CPU time.My colleague Rushabh Lathia tracked the performance change down to
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=c03ad5602f529787968fa3201b35c119bbc6d782
.The call to copyObject in the loop introduced here seems to be
problematic (copying append_rel_list for every element in
append_rel_list unconditionally), though we're still trying to figure
it out. Attached is a simple repro script, with variables to tweak.Quite a few others have posted about this sort of thing and been
politely reminded of the 100 table caveat [1][2] which is fair enough,
but the situations seems to have got dramatically worse for some users
after an upgrade.Yes. The copyObject() call introduced by this commit seems to have
complexity O(T^2*C) where T is the number of child tables and C is the
number of columns per child. That's because the List that is being
copied is a list of AppendRelInfo nodes, each of which has a
translated_vars member that is a list of every Var in one table, and
we copy that list once per child.It appears that in a lot of cases this work is unnecessary. The
second (long) for loop in inheritance_planner copies root->rowMarks
and root->append_rel_list so as to be able to apply ChangeVarNodes()
to the result, but we only actually do that if the rte is of type
RTE_SUBQUERY or if it has security quals. In the common case where we
reach inheritance_planner not because of UNION ALL but just because
the table has a bunch of inheritance children (none of which have RLS
policies applied), we copy everything and then modify none of it,
using up startling large amounts of memory in ways that pre-9.2
versions did not.The attached patch helps. It does two things:
1. It arranges for inheritance_planner to throw away the memory
consumed by the subroot's rowMarks and append_rel_list after the call
to grouping_planner for that subroot returns. This prevents the
explosive growth of memory usage in all cases I've tested so far, but
planning is still really slow.2. It arranges not to deep-copy append_rel_list when the root's
append_rel_list doesn't need to be modified for the subroot. This
makes planning much much faster in simple cases, like a simple update
on a table with many partitions. But if you do attach a FROM clause
containing a subquery to such an update, then this optimization
doesn't kick in any more and things are still very slow (though still
memory-bounded, due to part 1).I feel I might be missing a trick here. It seems unlikely to me that
we actually need the entire append_rel_list for every subquery; and we
almost certainly don't need to modify every element of the
append_rel_list for every subquery. Even the ones that no
ChangeVarNodes() call mutates still get deep-copied.
Yeah, you could probably pre-compute the indexes of the RTEs that need
to copied, outside of the big loop, and store them in a bitmapset.
Then, instead of copying the entire list of rowmarks/append_rel_infos
each time, you could just copy the ones that referred to those RTE
indexes (and only if the bitmapset was non-empty, which is the
equivalent of your second optimisation). However, for AppendRelInfos,
ChangeVarNodes() descends into the Vars in the translated_vars list,
so short-cutting the copying of the AppendRelInfo isn't obviously
safe. But, looking more closely, does ChangeVarNodes actually need to
examine translated_vars (the fall-through case) when child_relid isn't
the old rt_index? If not, that could be a big saving in cases like
this.
Regards,
Dean
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Dean Rasheed <dean.a.rasheed@gmail.com> writes:
On 18 June 2015 at 14:48, Robert Haas <robertmhaas@gmail.com> wrote:
I feel I might be missing a trick here. It seems unlikely to me that
we actually need the entire append_rel_list for every subquery; and we
almost certainly don't need to modify every element of the
append_rel_list for every subquery. Even the ones that no
ChangeVarNodes() call mutates still get deep-copied.
Yeah, you could probably pre-compute the indexes of the RTEs that need
to copied, outside of the big loop, and store them in a bitmapset.
Then, instead of copying the entire list of rowmarks/append_rel_infos
each time, you could just copy the ones that referred to those RTE
indexes (and only if the bitmapset was non-empty, which is the
equivalent of your second optimisation). However, for AppendRelInfos,
ChangeVarNodes() descends into the Vars in the translated_vars list,
so short-cutting the copying of the AppendRelInfo isn't obviously
safe. But, looking more closely, does ChangeVarNodes actually need to
examine translated_vars (the fall-through case) when child_relid isn't
the old rt_index? If not, that could be a big saving in cases like
this.
I'm a bit surprised that duplicating the append_rel_list is a noticeable
performance problem. It ought to be far smaller than the Query tree that
we've always duplicated in this loop --- in particular, it's really a
subset of what we have in the RTE list, no?
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Thu, Jun 18, 2015 at 3:14 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Dean Rasheed <dean.a.rasheed@gmail.com> writes:
On 18 June 2015 at 14:48, Robert Haas <robertmhaas@gmail.com> wrote:
I feel I might be missing a trick here. It seems unlikely to me that
we actually need the entire append_rel_list for every subquery; and we
almost certainly don't need to modify every element of the
append_rel_list for every subquery. Even the ones that no
ChangeVarNodes() call mutates still get deep-copied.Yeah, you could probably pre-compute the indexes of the RTEs that need
to copied, outside of the big loop, and store them in a bitmapset.
Then, instead of copying the entire list of rowmarks/append_rel_infos
each time, you could just copy the ones that referred to those RTE
indexes (and only if the bitmapset was non-empty, which is the
equivalent of your second optimisation). However, for AppendRelInfos,
ChangeVarNodes() descends into the Vars in the translated_vars list,
so short-cutting the copying of the AppendRelInfo isn't obviously
safe. But, looking more closely, does ChangeVarNodes actually need to
examine translated_vars (the fall-through case) when child_relid isn't
the old rt_index? If not, that could be a big saving in cases like
this.I'm a bit surprised that duplicating the append_rel_list is a noticeable
performance problem. It ought to be far smaller than the Query tree that
we've always duplicated in this loop --- in particular, it's really a
subset of what we have in the RTE list, no?
Well, append_rel_list has an AppendRelInfo for every RTE and that
contains a List (translated_vars) which in turn contains a Var node
for every column. I'm not sure how that compares to the RTE itself.
I think it's the cost of copying the translated_vars list that is
really the problem here - you can have 200 or 500 columns in a table,
so that's a Var and a ListCell for each one. In the problem cases,
the number of AppendRelInfo elements is a small percentage of the
number of Var nodes under them.
That having been said, I don't know how the size of all that compares
to the size of the Query. But I think the Query must be smaller,
because arranging to discard the AppendRelInfo and its translated_vars
list, and the per-subroot rowMarks list, after every call to
grouping_planner stops the memory blowup.
The whole translated_vars representation seems needless inefficient.
For a subquery, you probably need something like that. But for an
inheritance child, you just need to swap the varno and maybe remap
some varattnos. Indexing into a C array to grab the varattno seems
like it would be a heck of a lot more efficient than calling
list_nth(). It might be worth making AppendRelInfo support a choice
of representations so that we can use something more optimized for the
simple case while not losing the full generality when we need it.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Robert Haas <robertmhaas@gmail.com> writes:
On Thu, Jun 18, 2015 at 3:14 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I'm a bit surprised that duplicating the append_rel_list is a noticeable
performance problem. It ought to be far smaller than the Query tree that
we've always duplicated in this loop --- in particular, it's really a
subset of what we have in the RTE list, no?
Well, append_rel_list has an AppendRelInfo for every RTE and that
contains a List (translated_vars) which in turn contains a Var node
for every column. I'm not sure how that compares to the RTE itself.
RTEs also have a per-column component, namely the lists of column alias
names. So there's something odd going on here. I'll dig into it when
I get a chance (possibly not during PGCon).
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
I wrote:
Robert Haas <robertmhaas@gmail.com> writes:
Well, append_rel_list has an AppendRelInfo for every RTE and that
contains a List (translated_vars) which in turn contains a Var node
for every column. I'm not sure how that compares to the RTE itself.
RTEs also have a per-column component, namely the lists of column alias
names.
... although I see that range_table_mutator doesn't bother to copy/change
the column alias substructure. (Wonder if that gives rise to any
observable EXPLAIN bugs...) But it still seems like the append_rel_list
shouldn't be all that much bulkier than all the other crap that gets
generated inside this loop. We're not doing anything at all to reclaim
space consumed inside subquery_planner, and you'd think that would be
a lot.
By the by, the tablesample additions to range_table_mutator are obviously
broken.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Thu, Jun 18, 2015 at 4:04 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
... although I see that range_table_mutator doesn't bother to copy/change
the column alias substructure. (Wonder if that gives rise to any
observable EXPLAIN bugs...) But it still seems like the append_rel_list
shouldn't be all that much bulkier than all the other crap that gets
generated inside this loop. We're not doing anything at all to reclaim
space consumed inside subquery_planner, and you'd think that would be
a lot.By the by, the tablesample additions to range_table_mutator are obviously
broken.
Whee.
Meanwhile, here is an updated patch. The attached script (a modified
version of something Thomas Munro sent me privately) contains a bunch
of test queries. With the original patch I sent earlier, here are the
timings I got:
Q1 Time: 16215.887 ms
Q2 Time: 18674.139 ms
Q3 Time: 1029.093 ms
Q4 Time: 86497.781 ms
Q5 Time: 1143.851 ms
This version is about the same for the last three, but the first two
get much faster:
Q1 Time: 2951.231 ms
Q2 Time: 1251.809 ms
Q3 Time: 1049.235 ms
Q4 Time: 88477.803 ms
Q5 Time: 1172.965 ms
The speedup comes from the following trick: the first time we hit a
query that might requite a ChangeVarNodes() on the append_rel_list, we
compute a bitmapset of varnos that appear in that list. Then, every
time we're thinking about doing a ChangeVarNodes from rti to new_rti,
we check whether rti appears in the Bitmapset. If not, we can skip
ChangeVarNodes(). That seems to reduce the amount of object-copying
and object-walking attributable to this loop to something negligible
in all of these test cases.
The extraordinarily planning time for query 4 is caused by a
completely different problem: SearchCatCache eats up huge amounts of
CPU; its callers are get_attavgwidth and get_typlen. It's not clear
to me why doubling the number of relations causes such an enormous
explosion in calls to those functions; I would expect the number of
calls to double, but presumably the actual increase is much more.
That's a separate problem, though, unconnected to
c03ad5602f529787968fa3201b35c119bbc6d782 and not necessarily a
regression.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachments:
mitigate-inheritance-planner-craziness-v2.patchtext/x-patch; charset=US-ASCII; name=mitigate-inheritance-planner-craziness-v2.patchDownload
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 8afde2b..6737818 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -43,6 +43,7 @@
#include "parser/parsetree.h"
#include "parser/parse_agg.h"
#include "rewrite/rewriteManip.h"
+#include "utils/memutils.h"
#include "utils/rel.h"
#include "utils/selfuncs.h"
@@ -845,9 +846,19 @@ inheritance_planner(PlannerInfo *root)
List *returningLists = NIL;
List *rowMarks;
ListCell *lc;
+ MemoryContext myctx;
+ MemoryContext oldctx;
+ bool did_extract = false;
+ Relids append_rel_list_relids = NULL;
Assert(parse->commandType != CMD_INSERT);
+ myctx = AllocSetContextCreate(CurrentMemoryContext,
+ "inheritance_planner",
+ ALLOCSET_DEFAULT_MINSIZE,
+ ALLOCSET_DEFAULT_INITSIZE,
+ ALLOCSET_DEFAULT_MAXSIZE);
+
/*
* We generate a modified instance of the original Query for each target
* relation, plan that, and put all the plans into a list that will be
@@ -908,18 +919,16 @@ inheritance_planner(PlannerInfo *root)
/*
* The rowMarks list might contain references to subquery RTEs, so
- * make a copy that we can apply ChangeVarNodes to. (Fortunately, the
- * executor doesn't need to see the modified copies --- we can just
- * pass it the original rowMarks list.)
+ * make a copy that we can apply ChangeVarNodes to. If any security
+ * barrier quals are present, the rowMarks list may be further modified
+ * by grouping_planner. (Fortunately, the executor doesn't need to see
+ * the modified copies --- we can just pass it the original rowMarks
+ * list. For the same reason, we can arrange to throw away the copy
+ * we make here relatively quickly.)
*/
+ oldctx = MemoryContextSwitchTo(myctx);
subroot.rowMarks = (List *) copyObject(root->rowMarks);
-
- /*
- * The append_rel_list likewise might contain references to subquery
- * RTEs (if any subqueries were flattenable UNION ALLs). So prepare
- * to apply ChangeVarNodes to that, too.
- */
- subroot.append_rel_list = (List *) copyObject(root->append_rel_list);
+ MemoryContextSwitchTo(oldctx);
/*
* Add placeholders to the child Query's rangetable list to fill the
@@ -942,6 +951,7 @@ inheritance_planner(PlannerInfo *root)
if (final_rtable != NIL)
{
ListCell *lr;
+ bool did_copy = false;
rti = 1;
foreach(lr, parse->rtable)
@@ -968,7 +978,39 @@ inheritance_planner(PlannerInfo *root)
newrti = list_length(subroot.parse->rtable) + 1;
ChangeVarNodes((Node *) subroot.parse, rti, newrti, 0);
ChangeVarNodes((Node *) subroot.rowMarks, rti, newrti, 0);
- ChangeVarNodes((Node *) subroot.append_rel_list, rti, newrti, 0);
+
+ /*
+ * The append_rel_list likewise might contain references
+ * to subquery RTEs (if any subqueries were flattenable
+ * UNION ALLs). We can't modify the original copy owned
+ * by the parent root, so we have to copy the list if
+ * anything needs to be changed. But that may be expensive
+ * if there are many RTEs, so if we have not yet made a
+ * copy, do so only if ChangeVarNodes will not be a no-op.
+ * Even walking the whole node tree is expensive, so skip
+ * that unless a reference to this RTI is indeed present.
+ */
+ if (!did_extract)
+ {
+ append_rel_list_relids =
+ ExtractRTIndexes((Node *) subroot.append_rel_list,
+ 0);
+ did_extract = true;
+ }
+ if (bms_is_member(rti, append_rel_list_relids))
+ {
+ if (!did_copy)
+ {
+ oldctx = MemoryContextSwitchTo(myctx);
+ subroot.append_rel_list =
+ (List *) copyObject(root->append_rel_list);
+ MemoryContextSwitchTo(oldctx);
+ did_copy = true;
+ }
+ ChangeVarNodes((Node *) subroot.append_rel_list, rti,
+ newrti, 0);
+ }
+
rte = copyObject(rte);
ChangeVarNodes((Node *) rte->securityQuals, rti, newrti, 0);
subroot.parse->rtable = lappend(subroot.parse->rtable,
@@ -976,6 +1018,8 @@ inheritance_planner(PlannerInfo *root)
}
rti++;
}
+
+ bms_free(append_rel_list_relids);
}
/* There shouldn't be any OJ or LATERAL info to translate, as yet */
@@ -989,6 +1033,9 @@ inheritance_planner(PlannerInfo *root)
/* Generate plan */
subplan = grouping_planner(&subroot, 0.0 /* retrieve all tuples */ );
+ /* Reclaim some memory */
+ MemoryContextReset(myctx);
+
/*
* Planning may have modified the query result relation (if there were
* security barrier quals on the result RTE).
@@ -1097,6 +1144,8 @@ inheritance_planner(PlannerInfo *root)
Assert(!parse->onConflict);
}
+ MemoryContextDelete(myctx);
+
/* Mark result as unordered (probably unnecessary) */
root->query_pathkeys = NIL;
diff --git a/src/backend/rewrite/rewriteManip.c b/src/backend/rewrite/rewriteManip.c
index 1da90ff..3075b9a 100644
--- a/src/backend/rewrite/rewriteManip.c
+++ b/src/backend/rewrite/rewriteManip.c
@@ -674,6 +674,152 @@ adjust_relid_set(Relids relids, int oldrelid, int newrelid)
}
/*
+ * ExtractRTIndexes - find all RT indexes referenced in a node tree
+ *
+ * This is similar to pull_varnos, but it needs to use exactly the same
+ * criteria as ChangeVarNodes, and pull_varnos does not.
+ */
+
+typedef struct
+{
+ int sublevels_up;
+ Relids varnos;
+} ExtractRTIndexes_context;
+
+static bool
+ExtractRTIndexes_walker(Node *node, ExtractRTIndexes_context *context)
+{
+ if (node == NULL)
+ return false;
+ if (IsA(node, Var))
+ {
+ Var *var = (Var *) node;
+
+ if (var->varlevelsup == context->sublevels_up)
+ context->varnos = bms_add_member(context->varnos, var->varno);
+ return false;
+ }
+ if (IsA(node, CurrentOfExpr))
+ {
+ CurrentOfExpr *cexpr = (CurrentOfExpr *) node;
+
+ if (context->sublevels_up == 0)
+ context->varnos = bms_add_member(context->varnos, cexpr->cvarno);
+ return false;
+ }
+ if (IsA(node, RangeTblRef))
+ {
+ RangeTblRef *rtr = (RangeTblRef *) node;
+
+ if (context->sublevels_up == 0)
+ context->varnos = bms_add_member(context->varnos, rtr->rtindex);
+ /* the subquery itself is visited separately */
+ return false;
+ }
+ if (IsA(node, JoinExpr))
+ {
+ JoinExpr *j = (JoinExpr *) node;
+
+ if (context->sublevels_up == 0)
+ context->varnos = bms_add_member(context->varnos, j->rtindex);
+ /* fall through to examine children */
+ }
+ if (IsA(node, PlaceHolderVar))
+ {
+ PlaceHolderVar *phv = (PlaceHolderVar *) node;
+
+ if (phv->phlevelsup == context->sublevels_up)
+ context->varnos = bms_add_members(context->varnos, phv->phrels);
+ /* fall through to examine children */
+ }
+ if (IsA(node, PlanRowMark))
+ {
+ PlanRowMark *rowmark = (PlanRowMark *) node;
+
+ if (context->sublevels_up == 0)
+ {
+ context->varnos = bms_add_member(context->varnos, rowmark->rti);
+ context->varnos = bms_add_member(context->varnos, rowmark->prti);
+ }
+ return false;
+ }
+ if (IsA(node, AppendRelInfo))
+ {
+ AppendRelInfo *appinfo = (AppendRelInfo *) node;
+
+ if (context->sublevels_up == 0)
+ {
+ context->varnos = bms_add_member(context->varnos,
+ appinfo->parent_relid);
+ context->varnos = bms_add_member(context->varnos,
+ appinfo->child_relid);
+ }
+ /* fall through to examine children */
+ }
+ /* Shouldn't need to handle other planner auxiliary nodes here */
+ Assert(!IsA(node, SpecialJoinInfo));
+ Assert(!IsA(node, LateralJoinInfo));
+ Assert(!IsA(node, PlaceHolderInfo));
+ Assert(!IsA(node, MinMaxAggInfo));
+
+ if (IsA(node, Query))
+ {
+ /* Recurse into subselects */
+ bool result;
+
+ context->sublevels_up++;
+ result = query_tree_walker((Query *) node, ExtractRTIndexes_walker,
+ (void *) context, 0);
+ context->sublevels_up--;
+ return result;
+ }
+ return expression_tree_walker(node, ExtractRTIndexes_walker,
+ (void *) context);
+}
+
+Relids
+ExtractRTIndexes(Node *node, int sublevels_up)
+{
+ ExtractRTIndexes_context context;
+
+ context.sublevels_up = sublevels_up;
+ context.varnos = NULL;
+
+ /*
+ * Must be prepared to start with a Query or a bare expression tree; if
+ * it's a Query, go straight to query_tree_walker to make sure that
+ * sublevels_up doesn't get incremented prematurely.
+ */
+ if (node && IsA(node, Query))
+ {
+ Query *qry = (Query *) node;
+
+ if (sublevels_up == 0)
+ {
+ ListCell *l;
+
+ context.varnos = bms_add_member(context.varnos,
+ qry->resultRelation);
+ if (qry->onConflict)
+ context.varnos = bms_add_member(context.varnos,
+ qry->onConflict->exclRelIndex);
+
+ foreach(l, qry->rowMarks)
+ {
+ RowMarkClause *rc = (RowMarkClause *) lfirst(l);
+
+ context.varnos = bms_add_member(context.varnos, rc->rti);
+ }
+ }
+ query_tree_walker(qry, ExtractRTIndexes_walker, (void *) &context, 0);
+ }
+ else
+ ExtractRTIndexes_walker(node, &context);
+
+ return context.varnos;
+}
+
+/*
* IncrementVarSublevelsUp - adjust Var nodes when pushing them down in tree
*
* Find all Var nodes in the given tree having varlevelsup >= min_sublevels_up,
diff --git a/src/include/rewrite/rewriteManip.h b/src/include/rewrite/rewriteManip.h
index bb56fa0..1fc1587 100644
--- a/src/include/rewrite/rewriteManip.h
+++ b/src/include/rewrite/rewriteManip.h
@@ -15,6 +15,7 @@
#define REWRITEMANIP_H
#include "nodes/parsenodes.h"
+#include "nodes/relation.h"
typedef struct replace_rte_variables_context replace_rte_variables_context;
@@ -42,6 +43,7 @@ typedef enum ReplaceVarsNoMatchOption
extern void OffsetVarNodes(Node *node, int offset, int sublevels_up);
extern void ChangeVarNodes(Node *node, int old_varno, int new_varno,
int sublevels_up);
+extern Relids ExtractRTIndexes(Node *node, int sublevels_up);
extern void IncrementVarSublevelsUp(Node *node, int delta_sublevels_up,
int min_sublevels_up);
extern void IncrementVarSublevelsUp_rtable(List *rtable,
On 2015-06-18 22:04, Tom Lane wrote:
By the by, the tablesample additions to range_table_mutator are obviously
broken.
Bah, typos. Attached patch corrects them.
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
fix-tablesample-mutator-typo.difftext/x-patch; name=fix-tablesample-mutator-typo.diffDownload
diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index a2bcca5..931d464 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -2870,9 +2870,9 @@ range_table_mutator(List *rtable,
case RTE_RELATION:
if (rte->tablesample)
{
- MUTATE(rte->tablesample->args, rte->tablesample->args,
+ MUTATE(newrte->tablesample->args, rte->tablesample->args,
List *);
- MUTATE(rte->tablesample->repeatable,
+ MUTATE(newrte->tablesample->repeatable,
rte->tablesample->repeatable, Node *);
}
break;
On 2015-06-19 00:38, Petr Jelinek wrote:
On 2015-06-18 22:04, Tom Lane wrote:
By the by, the tablesample additions to range_table_mutator are obviously
broken.Bah, typos. Attached patch corrects them.
Actually it should probably look more like this, sorry.
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
fix-tablesample-mutator-typo2.difftext/x-patch; name=fix-tablesample-mutator-typo2.diffDownload
diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index a2bcca5..1c89abb 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -2870,9 +2870,10 @@ range_table_mutator(List *rtable,
case RTE_RELATION:
if (rte->tablesample)
{
- MUTATE(rte->tablesample->args, rte->tablesample->args,
+ FLATCOPY(newrte->tablesample, rte->tablesample, TableSampleClause);
+ MUTATE(newrte->tablesample->args, rte->tablesample->args,
List *);
- MUTATE(rte->tablesample->repeatable,
+ MUTATE(newrte->tablesample->repeatable,
rte->tablesample->repeatable, Node *);
}
break;
diff --git a/src/test/regress/sql/tablesample.sql b/src/test/regress/sql/tablesample.sql
index 7b3eb9b..2be5d85 100644
--- a/src/test/regress/sql/tablesample.sql
+++ b/src/test/regress/sql/tablesample.sql
@@ -1,4 +1,5 @@
CREATE TABLE test_tablesample (id int, name text) WITH (fillfactor=10); -- force smaller pages so we don't have to load too much data to get multiple pages
+CREATE TABLE test_tablesample_inh() INHERITS (test_tablesample);
INSERT INTO test_tablesample SELECT i, repeat(i::text, 200) FROM generate_series(0, 9) s(i) ORDER BY i;
On 2015-06-19 01:04, Petr Jelinek wrote:
On 2015-06-19 00:38, Petr Jelinek wrote:
On 2015-06-18 22:04, Tom Lane wrote:
By the by, the tablesample additions to range_table_mutator are
obviously
broken.Bah, typos. Attached patch corrects them.
Actually it should probably look more like this, sorry.
Apparently it's not a good idea to do this at 1AM after long day :/
The previous diff included some garbage in tests from my local
experimentations.
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
fix-tablesample-mutator-typo3.difftext/x-patch; name=fix-tablesample-mutator-typo3.diffDownload
diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index a2bcca5..1c89abb 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -2870,9 +2870,10 @@ range_table_mutator(List *rtable,
case RTE_RELATION:
if (rte->tablesample)
{
- MUTATE(rte->tablesample->args, rte->tablesample->args,
+ FLATCOPY(newrte->tablesample, rte->tablesample, TableSampleClause);
+ MUTATE(newrte->tablesample->args, rte->tablesample->args,
List *);
- MUTATE(rte->tablesample->repeatable,
+ MUTATE(newrte->tablesample->repeatable,
rte->tablesample->repeatable, Node *);
}
break;
Petr Jelinek <petr@2ndquadrant.com> writes:
On 2015-06-19 01:04, Petr Jelinek wrote:
On 2015-06-19 00:38, Petr Jelinek wrote:
On 2015-06-18 22:04, Tom Lane wrote:
By the by, the tablesample additions to range_table_mutator are
obviously broken.
Apparently it's not a good idea to do this at 1AM after long day :/
The previous diff included some garbage in tests from my local
experimentations.
Pushed with minor adjustments.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Fri, Jun 19, 2015 at 9:20 AM, Robert Haas <robertmhaas@gmail.com> wrote:
The extraordinarily planning time for query 4 is caused by a
completely different problem: SearchCatCache eats up huge amounts of
CPU; its callers are get_attavgwidth and get_typlen. It's not clear
to me why doubling the number of relations causes such an enormous
explosion in calls to those functions; I would expect the number of
calls to double, but presumably the actual increase is much more.
That's a separate problem, though, unconnected to
c03ad5602f529787968fa3201b35c119bbc6d782 and not necessarily a
regression.
I don't have a great high level understanding of the planner, and Q4
may be somehow asking for trouble or unrepresentative of anything
useful, but I did some profiling and instrumenting, and I noticed that
we spend tables^2 * columns time in get_attavgwidth. I wonder if
estimate_rel_size (or some other function in that stack, or some new
function wrapper) should remember the result for each relation for the
scope of this planner invocation. That should bring the calls to
get_attavgwidth down to the same order as Q3 (tables * columns).
Here is some profiler output from a 500 table, 500 column Q4 run:
160295.0ms 60.2% 95 inheritance_planner
120064.0ms 45.1% 0 grouping_planner
119826.0ms 45.0% 2 query_planner
114204.0ms 42.9% 0 add_base_rels_to_query
114204.0ms 42.9% 0 add_base_rels_to_query
114204.0ms 42.9% 151 build_simple_rel
113817.0ms 42.8% 57 build_simple_rel
113600.0ms 42.7% 19 get_relation_info
112123.0ms 42.1% 27 estimate_rel_size
111557.0ms 41.9% 14139 get_rel_data_width
80152.0ms 30.1% 362 get_attavgwidth
79788.0ms 30.0% 282 SearchSysCache
79368.0ms 29.8% 52373 SearchCatCache
13182.0ms 4.9% 2125
CatalogCacheComputeHashValue
Here are some tables showing function call counts. The columns are
ordered like this:
1: Query number
2: Number of child tables
3: Number of columns
4: Number of calls to add_base_rels_to_query
5: Number of calls to build_simple_rel
6: Number of calls to get_relation_info
7: Number of calls to estimate_rel_size
8: Number of calls to get_attavgwidth
Q3 10 10 22 11 11 11 131
Q3 10 20 22 11 11 11 241
Q3 10 30 22 11 11 11 351
Q3 20 10 42 21 21 21 251
Q3 20 20 42 21 21 21 461
Q3 20 30 42 21 21 21 671
Q3 30 10 62 31 31 31 371
Q3 30 20 62 31 31 31 681
Q3 30 30 62 31 31 31 991
Q3 500 500 1002 501 501 501 251501
Q4 10 10 33 143 143 132 1451
Q4 10 20 33 143 143 132 2661
Q4 10 30 33 143 143 132 3871
Q4 20 10 63 483 483 462 5291
Q4 20 20 63 483 483 462 9701
Q4 20 30 63 483 483 462 14111
Q4 30 10 93 1023 1023 992 11531
Q4 30 20 93 1023 1023 992 21141
Q4 30 30 93 1023 1023 992 30751
Q4 500 500 1503 252003 252003 251502 126002501
--
Thomas Munro
http://www.enterprisedb.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Robert Haas <robertmhaas@gmail.com> writes:
Meanwhile, here is an updated patch.
I don't care for that patch too much: it seems a bit brute-force, and I'm
quite worried by the assumption that it's okay to destroy each child's
append_rel_list after processing the child. That would fail if any of
the Vars/subexpressions in the lists get incorporated into the resulting
child Plan, which does not seem impossible. (I think in many cases we'd
do a copyObject() when extracting an append_rel_list expression, but this
hardly seems guaranteed.)
I propose instead the attached patch, which operates by identifying which
of the append_rel_list entries actually contain subquery references, and
copying only those; the other ones are just linked into the child's
append_rel_list by reference, which is okay because they won't get
modified.
On my laptop, I get the following timings for your test queries from
unmodified HEAD (--cassert build):
# Q1: 41260.239 ms
# Q2: 45225.768 ms
# Q3: 43066.958 ms
# Q4: 193360.726 ms
# Q5: 40746.503 ms
and these with my patch:
# Q1: 1767.753 ms
# Q2: 3662.131 ms
# Q3: 814.293 ms
# Q4: 64468.914 ms
# Q5: 881.295 ms
which seems to be generally a better result.
The extraordinarily planning time for query 4 is caused by a
completely different problem: SearchCatCache eats up huge amounts of
CPU; its callers are get_attavgwidth and get_typlen. It's not clear
to me why doubling the number of relations causes such an enormous
explosion in calls to those functions; I would expect the number of
calls to double, but presumably the actual increase is much more.
Actually, Q4 necessarily involves O(N^2) planning time, because for
each of N target relations you're considering a join to an N-member
inheritance tree. A lot of those ultimately get thrown away by
constraint exclusion, but not before we've expended significant
cycles on them. I do not think we are going to get much traction
on that --- even if we do something to knock off whatever the leading
term is, there'll still be more O(N^2) behavior right behind it.
regards, tom lane
Attachments:
reduce-inheritance-planner-copying-v3.patchtext/x-patch; charset=us-ascii; name=reduce-inheritance-planner-copying-v3.patchDownload
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 8afde2b7d5069707e346901f819bed888a2333ee..d7fee96ba01272efdf7231d5985ab688912bcf58 100644
*** a/src/backend/optimizer/plan/planner.c
--- b/src/backend/optimizer/plan/planner.c
*************** inheritance_planner(PlannerInfo *root)
*** 834,840 ****
{
Query *parse = root->parse;
int parentRTindex = parse->resultRelation;
! Bitmapset *resultRTindexes = NULL;
int nominalRelation = -1;
List *final_rtable = NIL;
int save_rel_array_size = 0;
--- 834,842 ----
{
Query *parse = root->parse;
int parentRTindex = parse->resultRelation;
! Bitmapset *resultRTindexes;
! Bitmapset *subqueryRTindexes;
! Bitmapset *modifiableARIindexes;
int nominalRelation = -1;
List *final_rtable = NIL;
int save_rel_array_size = 0;
*************** inheritance_planner(PlannerInfo *root)
*** 845,850 ****
--- 847,853 ----
List *returningLists = NIL;
List *rowMarks;
ListCell *lc;
+ Index rti;
Assert(parse->commandType != CMD_INSERT);
*************** inheritance_planner(PlannerInfo *root)
*** 867,874 ****
* subqueries during planning, and so we must create copies of them too,
* except where they are target relations, which will each only be used in
* a single plan.
*/
! resultRTindexes = bms_add_member(resultRTindexes, parentRTindex);
foreach(lc, root->append_rel_list)
{
AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(lc);
--- 870,879 ----
* subqueries during planning, and so we must create copies of them too,
* except where they are target relations, which will each only be used in
* a single plan.
+ *
+ * To begin with, we'll need a bitmapset of the target relation relids.
*/
! resultRTindexes = bms_make_singleton(parentRTindex);
foreach(lc, root->append_rel_list)
{
AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(lc);
*************** inheritance_planner(PlannerInfo *root)
*** 878,889 ****
appinfo->child_relid);
}
foreach(lc, root->append_rel_list)
{
AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(lc);
PlannerInfo subroot;
Plan *subplan;
! Index rti;
/* append_rel_list contains all append rels; ignore others */
if (appinfo->parent_relid != parentRTindex)
--- 883,937 ----
appinfo->child_relid);
}
+ /*
+ * Now, generate a bitmapset of the relids of the subquery RTEs, including
+ * security-barrier RTEs that will become subqueries, as just explained.
+ */
+ subqueryRTindexes = NULL;
+ rti = 1;
+ foreach(lc, parse->rtable)
+ {
+ RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc);
+
+ if (rte->rtekind == RTE_SUBQUERY ||
+ (rte->securityQuals != NIL &&
+ !bms_is_member(rti, resultRTindexes)))
+ subqueryRTindexes = bms_add_member(subqueryRTindexes, rti);
+ rti++;
+ }
+
+ /*
+ * Next, we want to identify which AppendRelInfo items contain references
+ * to any of the aforesaid subquery RTEs. These items will need to be
+ * copied and modified to adjust their subquery references; whereas the
+ * other ones need not be touched. It's worth being tense over this
+ * because we can usually avoid processing most of the AppendRelInfo
+ * items, thereby saving O(N^2) space and time when the target is a large
+ * inheritance tree. We can identify AppendRelInfo items by their
+ * child_relid, since that should be unique within the list.
+ */
+ modifiableARIindexes = NULL;
+ foreach(lc, root->append_rel_list)
+ {
+ AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(lc);
+
+ if (bms_is_member(appinfo->parent_relid, subqueryRTindexes) ||
+ bms_is_member(appinfo->child_relid, subqueryRTindexes) ||
+ bms_overlap(pull_varnos((Node *) appinfo->translated_vars),
+ subqueryRTindexes))
+ modifiableARIindexes = bms_add_member(modifiableARIindexes,
+ appinfo->child_relid);
+ }
+
+ /*
+ * And now we can get on with generating a plan for each child table.
+ */
foreach(lc, root->append_rel_list)
{
AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(lc);
PlannerInfo subroot;
Plan *subplan;
! ListCell *lc2;
/* append_rel_list contains all append rels; ignore others */
if (appinfo->parent_relid != parentRTindex)
*************** inheritance_planner(PlannerInfo *root)
*** 917,925 ****
/*
* The append_rel_list likewise might contain references to subquery
* RTEs (if any subqueries were flattenable UNION ALLs). So prepare
! * to apply ChangeVarNodes to that, too.
*/
! subroot.append_rel_list = (List *) copyObject(root->append_rel_list);
/*
* Add placeholders to the child Query's rangetable list to fill the
--- 965,985 ----
/*
* The append_rel_list likewise might contain references to subquery
* RTEs (if any subqueries were flattenable UNION ALLs). So prepare
! * to apply ChangeVarNodes to that, too. As explained above, we only
! * want to copy items that actually contain such references; the
! * rest can just get linked into the subroot's append_rel_list.
*/
! subroot.append_rel_list = NIL;
! foreach(lc2, root->append_rel_list)
! {
! AppendRelInfo *appinfo2 = (AppendRelInfo *) lfirst(lc2);
!
! if (bms_is_member(appinfo2->child_relid, modifiableARIindexes))
! appinfo2 = (AppendRelInfo *) copyObject(appinfo2);
!
! subroot.append_rel_list = lappend(subroot.append_rel_list,
! appinfo2);
! }
/*
* Add placeholders to the child Query's rangetable list to fill the
*************** inheritance_planner(PlannerInfo *root)
*** 933,945 ****
/*
* If this isn't the first child Query, generate duplicates of all
! * subquery RTEs, and adjust Var numbering to reference the
! * duplicates. To simplify the loop logic, we scan the original rtable
! * not the copy just made by adjust_appendrel_attrs; that should be OK
! * since subquery RTEs couldn't contain any references to the target
! * rel.
*/
! if (final_rtable != NIL)
{
ListCell *lr;
--- 993,1005 ----
/*
* If this isn't the first child Query, generate duplicates of all
! * subquery (or subquery-to-be) RTEs, and adjust Var numbering to
! * reference the duplicates. To simplify the loop logic, we scan the
! * original rtable not the copy just made by adjust_appendrel_attrs;
! * that should be OK since subquery RTEs couldn't contain any
! * references to the target rel.
*/
! if (final_rtable != NIL && !bms_is_empty(subqueryRTindexes))
{
ListCell *lr;
*************** inheritance_planner(PlannerInfo *root)
*** 948,961 ****
{
RangeTblEntry *rte = (RangeTblEntry *) lfirst(lr);
! /*
! * Copy subquery RTEs and RTEs with security barrier quals
! * that will be turned into subqueries, except those that are
! * target relations.
! */
! if (rte->rtekind == RTE_SUBQUERY ||
! (rte->securityQuals != NIL &&
! !bms_is_member(rti, resultRTindexes)))
{
Index newrti;
--- 1008,1014 ----
{
RangeTblEntry *rte = (RangeTblEntry *) lfirst(lr);
! if (bms_is_member(rti, subqueryRTindexes))
{
Index newrti;
On Sat, Jun 20, 2015 at 6:48 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
Meanwhile, here is an updated patch.
I don't care for that patch too much: it seems a bit brute-force, and I'm
quite worried by the assumption that it's okay to destroy each child's
append_rel_list after processing the child. That would fail if any of
the Vars/subexpressions in the lists get incorporated into the resulting
child Plan, which does not seem impossible. (I think in many cases we'd
do a copyObject() when extracting an append_rel_list expression, but this
hardly seems guaranteed.)
Well, if such a thing is possible, the regression tests don't catch it
- can we add one that would?
I propose instead the attached patch, which operates by identifying which
of the append_rel_list entries actually contain subquery references, and
copying only those; the other ones are just linked into the child's
append_rel_list by reference, which is okay because they won't get
modified.
I thought about that approach, but wasn't sure if I could make it
simple enough to pass muster. Note that I generally erred on the side
of deferring all work as long as possible and to the greatest extent
possible in a way that your patch does not. We don't need to compute
modifiableARIindexes if subqueryRTindexes ends up empty, and we
definitely don't need to generate O(n^2) list cells in that case. I
think that latter point, at least, is quite likely to be worth
optimizing. Granted, spewing out extra ListCells is far less harmful
than doing the same thing with AppendRelInfos and their entire list of
translated_vars, but it's still not good. Can't we move the loop that
copies root.append_rel_list inside if (final_rtable != NIL &&
!bms_is_empty(subqueryRTindexes))? If we don't take that branch,
root.append_rel_list isn't getting modified at all, so a shallow copy
is good enough.
On my laptop, I get the following timings for your test queries from
unmodified HEAD (--cassert build):# Q1: 41260.239 ms
# Q2: 45225.768 ms
# Q3: 43066.958 ms
# Q4: 193360.726 ms
# Q5: 40746.503 msand these with my patch:
# Q1: 1767.753 ms
# Q2: 3662.131 ms
# Q3: 814.293 ms
# Q4: 64468.914 ms
# Q5: 881.295 mswhich seems to be generally a better result.
Better than unpatched, definitely! Not sure how it compares to my patch.
The extraordinarily planning time for query 4 is caused by a
completely different problem: SearchCatCache eats up huge amounts of
CPU; its callers are get_attavgwidth and get_typlen. It's not clear
to me why doubling the number of relations causes such an enormous
explosion in calls to those functions; I would expect the number of
calls to double, but presumably the actual increase is much more.Actually, Q4 necessarily involves O(N^2) planning time, because for
each of N target relations you're considering a join to an N-member
inheritance tree. A lot of those ultimately get thrown away by
constraint exclusion, but not before we've expended significant
cycles on them. I do not think we are going to get much traction
on that --- even if we do something to knock off whatever the leading
term is, there'll still be more O(N^2) behavior right behind it.
Hmm, maybe so. On the other hand, if there's a way to significantly
shrink the constant factor on that O(N^2) stuff, it could bring a lot
of people some much-needed relief.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 21 June 2015 at 05:27, Robert Haas <robertmhaas@gmail.com> wrote:
On Sat, Jun 20, 2015 at 6:48 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I propose instead the attached patch, which operates by identifying which
of the append_rel_list entries actually contain subquery references, and
copying only those; the other ones are just linked into the child's
append_rel_list by reference, which is okay because they won't get
modified.Better than unpatched, definitely! Not sure how it compares to my patch.
I tested on my machine (optimised build, asserts off). With HEAD I got:
Q1: 8076ms
Q2: 7165ms
Q3: 4027ms
Q4: OOM (killed by kernel, used > 16GB RAM)
Q5: 4131ms
The machine only has 16GB of RAM and almost no swap, so it wasn't able to do Q4.
With Robert's patch:
Q1: 1121ms
Q2: 542ms
Q3: 498ms
Q4: 50763ms (used 3GB RAM)
Q5: 556ms
and with Tom's patch:
Q1: 2264ms
Q2: 3785ms
Q3: 507ms
Q4: 50851ms (used 3GB RAM)
Q5: 558ms
However, there's an obvious improvement that can be made to Tom's
patch -- having computed modifiableARIindexes, you may as well use it
in the innermost loop to only apply ChangeVarNodes() to those
AppendRelInfo's that can actually change, rather than having it trawl
through all the other ones that we know won't be touched.
With that improvement (attached), the timings become:
Q1: 1148ms
Q2: 547ms
Q3: 505ms
Q4: 51325ms
Q5: 544ms
i.e., basically the same as Robert's patch.
Regards,
Dean
Attachments:
reduce-inheritance-planner-copying-v4.patchtext/x-diff; charset=US-ASCII; name=reduce-inheritance-planner-copying-v4.patchDownload
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
new file mode 100644
index 8afde2b..c8a3c65
*** a/src/backend/optimizer/plan/planner.c
--- b/src/backend/optimizer/plan/planner.c
*************** inheritance_planner(PlannerInfo *root)
*** 834,840 ****
{
Query *parse = root->parse;
int parentRTindex = parse->resultRelation;
! Bitmapset *resultRTindexes = NULL;
int nominalRelation = -1;
List *final_rtable = NIL;
int save_rel_array_size = 0;
--- 834,842 ----
{
Query *parse = root->parse;
int parentRTindex = parse->resultRelation;
! Bitmapset *resultRTindexes;
! Bitmapset *subqueryRTindexes;
! Bitmapset *modifiableARIindexes;
int nominalRelation = -1;
List *final_rtable = NIL;
int save_rel_array_size = 0;
*************** inheritance_planner(PlannerInfo *root)
*** 845,850 ****
--- 847,853 ----
List *returningLists = NIL;
List *rowMarks;
ListCell *lc;
+ Index rti;
Assert(parse->commandType != CMD_INSERT);
*************** inheritance_planner(PlannerInfo *root)
*** 867,874 ****
* subqueries during planning, and so we must create copies of them too,
* except where they are target relations, which will each only be used in
* a single plan.
*/
! resultRTindexes = bms_add_member(resultRTindexes, parentRTindex);
foreach(lc, root->append_rel_list)
{
AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(lc);
--- 870,879 ----
* subqueries during planning, and so we must create copies of them too,
* except where they are target relations, which will each only be used in
* a single plan.
+ *
+ * To begin with, we'll need a bitmapset of the target relation relids.
*/
! resultRTindexes = bms_make_singleton(parentRTindex);
foreach(lc, root->append_rel_list)
{
AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(lc);
*************** inheritance_planner(PlannerInfo *root)
*** 878,889 ****
appinfo->child_relid);
}
foreach(lc, root->append_rel_list)
{
AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(lc);
PlannerInfo subroot;
Plan *subplan;
! Index rti;
/* append_rel_list contains all append rels; ignore others */
if (appinfo->parent_relid != parentRTindex)
--- 883,937 ----
appinfo->child_relid);
}
+ /*
+ * Now, generate a bitmapset of the relids of the subquery RTEs, including
+ * security-barrier RTEs that will become subqueries, as just explained.
+ */
+ subqueryRTindexes = NULL;
+ rti = 1;
+ foreach(lc, parse->rtable)
+ {
+ RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc);
+
+ if (rte->rtekind == RTE_SUBQUERY ||
+ (rte->securityQuals != NIL &&
+ !bms_is_member(rti, resultRTindexes)))
+ subqueryRTindexes = bms_add_member(subqueryRTindexes, rti);
+ rti++;
+ }
+
+ /*
+ * Next, we want to identify which AppendRelInfo items contain references
+ * to any of the aforesaid subquery RTEs. These items will need to be
+ * copied and modified to adjust their subquery references; whereas the
+ * other ones need not be touched. It's worth being tense over this
+ * because we can usually avoid processing most of the AppendRelInfo
+ * items, thereby saving O(N^2) space and time when the target is a large
+ * inheritance tree. We can identify AppendRelInfo items by their
+ * child_relid, since that should be unique within the list.
+ */
+ modifiableARIindexes = NULL;
+ foreach(lc, root->append_rel_list)
+ {
+ AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(lc);
+
+ if (bms_is_member(appinfo->parent_relid, subqueryRTindexes) ||
+ bms_is_member(appinfo->child_relid, subqueryRTindexes) ||
+ bms_overlap(pull_varnos((Node *) appinfo->translated_vars),
+ subqueryRTindexes))
+ modifiableARIindexes = bms_add_member(modifiableARIindexes,
+ appinfo->child_relid);
+ }
+
+ /*
+ * And now we can get on with generating a plan for each child table.
+ */
foreach(lc, root->append_rel_list)
{
AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(lc);
PlannerInfo subroot;
Plan *subplan;
! ListCell *lc2;
/* append_rel_list contains all append rels; ignore others */
if (appinfo->parent_relid != parentRTindex)
*************** inheritance_planner(PlannerInfo *root)
*** 917,925 ****
/*
* The append_rel_list likewise might contain references to subquery
* RTEs (if any subqueries were flattenable UNION ALLs). So prepare
! * to apply ChangeVarNodes to that, too.
*/
! subroot.append_rel_list = (List *) copyObject(root->append_rel_list);
/*
* Add placeholders to the child Query's rangetable list to fill the
--- 965,985 ----
/*
* The append_rel_list likewise might contain references to subquery
* RTEs (if any subqueries were flattenable UNION ALLs). So prepare
! * to apply ChangeVarNodes to that, too. As explained above, we only
! * want to copy items that actually contain such references; the
! * rest can just get linked into the subroot's append_rel_list.
*/
! subroot.append_rel_list = NIL;
! foreach(lc2, root->append_rel_list)
! {
! AppendRelInfo *appinfo2 = (AppendRelInfo *) lfirst(lc2);
!
! if (bms_is_member(appinfo2->child_relid, modifiableARIindexes))
! appinfo2 = (AppendRelInfo *) copyObject(appinfo2);
!
! subroot.append_rel_list = lappend(subroot.append_rel_list,
! appinfo2);
! }
/*
* Add placeholders to the child Query's rangetable list to fill the
*************** inheritance_planner(PlannerInfo *root)
*** 933,945 ****
/*
* If this isn't the first child Query, generate duplicates of all
! * subquery RTEs, and adjust Var numbering to reference the
! * duplicates. To simplify the loop logic, we scan the original rtable
! * not the copy just made by adjust_appendrel_attrs; that should be OK
! * since subquery RTEs couldn't contain any references to the target
! * rel.
*/
! if (final_rtable != NIL)
{
ListCell *lr;
--- 993,1005 ----
/*
* If this isn't the first child Query, generate duplicates of all
! * subquery (or subquery-to-be) RTEs, and adjust Var numbering to
! * reference the duplicates. To simplify the loop logic, we scan the
! * original rtable not the copy just made by adjust_appendrel_attrs;
! * that should be OK since subquery RTEs couldn't contain any
! * references to the target rel.
*/
! if (final_rtable != NIL && !bms_is_empty(subqueryRTindexes))
{
ListCell *lr;
*************** inheritance_planner(PlannerInfo *root)
*** 948,961 ****
{
RangeTblEntry *rte = (RangeTblEntry *) lfirst(lr);
! /*
! * Copy subquery RTEs and RTEs with security barrier quals
! * that will be turned into subqueries, except those that are
! * target relations.
! */
! if (rte->rtekind == RTE_SUBQUERY ||
! (rte->securityQuals != NIL &&
! !bms_is_member(rti, resultRTindexes)))
{
Index newrti;
--- 1008,1014 ----
{
RangeTblEntry *rte = (RangeTblEntry *) lfirst(lr);
! if (bms_is_member(rti, subqueryRTindexes))
{
Index newrti;
*************** inheritance_planner(PlannerInfo *root)
*** 968,974 ****
newrti = list_length(subroot.parse->rtable) + 1;
ChangeVarNodes((Node *) subroot.parse, rti, newrti, 0);
ChangeVarNodes((Node *) subroot.rowMarks, rti, newrti, 0);
! ChangeVarNodes((Node *) subroot.append_rel_list, rti, newrti, 0);
rte = copyObject(rte);
ChangeVarNodes((Node *) rte->securityQuals, rti, newrti, 0);
subroot.parse->rtable = lappend(subroot.parse->rtable,
--- 1021,1033 ----
newrti = list_length(subroot.parse->rtable) + 1;
ChangeVarNodes((Node *) subroot.parse, rti, newrti, 0);
ChangeVarNodes((Node *) subroot.rowMarks, rti, newrti, 0);
! foreach(lc2, subroot.append_rel_list)
! {
! AppendRelInfo *appinfo2 = (AppendRelInfo *) lfirst(lc2);
!
! if (bms_is_member(appinfo2->child_relid, modifiableARIindexes))
! ChangeVarNodes((Node *) appinfo2, rti, newrti, 0);
! }
rte = copyObject(rte);
ChangeVarNodes((Node *) rte->securityQuals, rti, newrti, 0);
subroot.parse->rtable = lappend(subroot.parse->rtable,
On Sun, Jun 21, 2015 at 5:45 AM, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
On 21 June 2015 at 05:27, Robert Haas <robertmhaas@gmail.com> wrote:
On Sat, Jun 20, 2015 at 6:48 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I propose instead the attached patch, which operates by identifying which
of the append_rel_list entries actually contain subquery references, and
copying only those; the other ones are just linked into the child's
append_rel_list by reference, which is okay because they won't get
modified.Better than unpatched, definitely! Not sure how it compares to my patch.
I tested on my machine (optimised build, asserts off). With HEAD I got:
Q1: 8076ms
Q2: 7165ms
Q3: 4027ms
Q4: OOM (killed by kernel, used > 16GB RAM)
Q5: 4131msThe machine only has 16GB of RAM and almost no swap, so it wasn't able to do Q4.
With Robert's patch:
Q1: 1121ms
Q2: 542ms
Q3: 498ms
Q4: 50763ms (used 3GB RAM)
Q5: 556msand with Tom's patch:
Q1: 2264ms
Q2: 3785ms
Q3: 507ms
Q4: 50851ms (used 3GB RAM)
Q5: 558msHowever, there's an obvious improvement that can be made to Tom's
patch -- having computed modifiableARIindexes, you may as well use it
in the innermost loop to only apply ChangeVarNodes() to those
AppendRelInfo's that can actually change, rather than having it trawl
through all the other ones that we know won't be touched.With that improvement (attached), the timings become:
Q1: 1148ms
Q2: 547ms
Q3: 505ms
Q4: 51325ms
Q5: 544msi.e., basically the same as Robert's patch.
Cool. That sounds good.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Dean Rasheed <dean.a.rasheed@gmail.com> writes:
However, there's an obvious improvement that can be made to Tom's
patch -- having computed modifiableARIindexes, you may as well use it
in the innermost loop to only apply ChangeVarNodes() to those
AppendRelInfo's that can actually change, rather than having it trawl
through all the other ones that we know won't be touched.
Right. Also, as Robert noted, we can short-circuit a few more things when
there are no subquery RTEs. I'll combine these ideas and push something,
but probably not till tomorrow.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers