Planning time grows exponentially with levels of nested views
Hi,
I assumed the cost for each nested VIEW layer would grow linear,
but my testing shows it appears to grow exponentially:
CREATE TABLE foo (bar int);
INSERT INTO foo (bar) VALUES (123);
DO $_$
DECLARE
BEGIN
CREATE OR REPLACE VIEW v1 AS SELECT * FROM foo;
FOR i IN 1..256 LOOP
EXECUTE format
(
$$
CREATE OR REPLACE VIEW v%s AS
SELECT * FROM v%s
$$,
i+1,
i
);
END LOOP;
END
$_$;
EXPLAIN ANALYZE SELECT * FROM foo;
QUERY PLAN
-------------------------------------------------------------------------------------------------
Seq Scan on foo (cost=0.00..35.50 rows=2550 width=4) (actual time=0.004..0.004 rows=1 loops=1)
Planning Time: 0.117 ms
Execution Time: 0.011 ms
(3 rows)
EXPLAIN ANALYZE SELECT * FROM v1;
QUERY PLAN
-------------------------------------------------------------------------------------------------
Seq Scan on foo (cost=0.00..35.50 rows=2550 width=4) (actual time=0.002..0.003 rows=1 loops=1)
Planning Time: 0.019 ms
Execution Time: 0.015 ms
(3 rows)
EXPLAIN ANALYZE SELECT * FROM v2;
QUERY PLAN
-------------------------------------------------------------------------------------------------
Seq Scan on foo (cost=0.00..35.50 rows=2550 width=4) (actual time=0.002..0.002 rows=1 loops=1)
Planning Time: 0.018 ms
Execution Time: 0.011 ms
(3 rows)
EXPLAIN ANALYZE SELECT * FROM v4;
QUERY PLAN
-------------------------------------------------------------------------------------------------
Seq Scan on foo (cost=0.00..35.50 rows=2550 width=4) (actual time=0.002..0.002 rows=1 loops=1)
Planning Time: 0.030 ms
Execution Time: 0.013 ms
(3 rows)
EXPLAIN ANALYZE SELECT * FROM v8;
QUERY PLAN
-------------------------------------------------------------------------------------------------
Seq Scan on foo (cost=0.00..35.50 rows=2550 width=4) (actual time=0.002..0.002 rows=1 loops=1)
Planning Time: 0.061 ms
Execution Time: 0.016 ms
(3 rows)
EXPLAIN ANALYZE SELECT * FROM v16;
QUERY PLAN
-------------------------------------------------------------------------------------------------
Seq Scan on foo (cost=0.00..35.50 rows=2550 width=4) (actual time=0.002..0.003 rows=1 loops=1)
Planning Time: 0.347 ms
Execution Time: 0.027 ms
(3 rows)
EXPLAIN ANALYZE SELECT * FROM v32;
QUERY PLAN
-------------------------------------------------------------------------------------------------
Seq Scan on foo (cost=0.00..35.50 rows=2550 width=4) (actual time=0.002..0.003 rows=1 loops=1)
Planning Time: 2.096 ms
Execution Time: 0.044 ms
(3 rows)
EXPLAIN ANALYZE SELECT * FROM v64;
QUERY PLAN
-------------------------------------------------------------------------------------------------
Seq Scan on foo (cost=0.00..35.50 rows=2550 width=4) (actual time=0.004..0.005 rows=1 loops=1)
Planning Time: 14.981 ms
Execution Time: 0.119 ms
(3 rows)
EXPLAIN ANALYZE SELECT * FROM v128;
QUERY PLAN
-------------------------------------------------------------------------------------------------
Seq Scan on foo (cost=0.00..35.50 rows=2550 width=4) (actual time=0.004..0.004 rows=1 loops=1)
Planning Time: 109.407 ms
Execution Time: 0.187 ms
(3 rows)
EXPLAIN ANALYZE SELECT * FROM v256;
QUERY PLAN
-------------------------------------------------------------------------------------------------
Seq Scan on foo (cost=0.00..35.50 rows=2550 width=4) (actual time=0.006..0.007 rows=1 loops=1)
Planning Time: 1594.809 ms
Execution Time: 0.531 ms
(3 rows)
[ redirecting to -hackers so the cfbot can see it ]
"Joel Jacobson" <joel@compiler.org> writes:
I assumed the cost for each nested VIEW layer would grow linear,
but my testing shows it appears to grow exponentially:
I think it's impossible to avoid less-than-O(N^2) growth on this sort
of case. For example, the v2 subquery initially has RTEs for v2 itself
plus v1. When we flatten v1 into v2, v2 acquires the RTEs from v1,
namely v1 itself plus foo. Similarly, once vK-1 is pulled up into vK,
there are going to be order-of-K entries in vK's rtable, and that stacking
makes for O(N^2) work overall just in manipulating the rtable.
We can't get rid of these rtable entries altogether, since all of them
represent table privilege checks that the executor will need to do.
It occurs to me though that we don't need the rte->subquery trees anymore
once those are flattened, so maybe we could do something like the
attached. For me, this reduces the slowdown in your example from
O(N^3) to O(N^2).
I'm slightly worried though by this comment earlier in
pull_up_simple_subquery:
/*
* Need a modifiable copy of the subquery to hack on. Even if we didn't
* sometimes choose not to pull up below, we must do this to avoid
* problems if the same subquery is referenced from multiple jointree
* items (which can't happen normally, but might after rule rewriting).
*/
If multiple references are actually possible then this'd break it. There
seem to be no such cases in the regression tests though, and I'm having a
hard time wrapping my brain around what would cause it. "git blame"
traces this text to my own commit f44639e1b, which has the log entry
Don't crash if subquery appears multiple times in jointree. This should
not happen anyway, but let's try not to get completely confused if it does
(due to rewriter bugs or whatever).
so I'm thinking that this was only hypothetical.
It's possible that we could do something similar in the sibling functions
pull_up_simple_union_all etc, but I'm not sure it's worth troubling over.
TBH, for the size of effect you're showing here, I wouldn't be worried
at all; it's only because it seems to be a one-liner to improve it that
I'm interested in doing something.
regards, tom lane
Attachments:
save-cycles-in-repeated-subquery-pullup-1.patchtext/x-diff; charset=us-ascii; name=save-cycles-in-repeated-subquery-pullup-1.patchDownload
diff --git a/src/backend/optimizer/prep/prepjointree.c b/src/backend/optimizer/prep/prepjointree.c
index 62a1668796..f93c037778 100644
--- a/src/backend/optimizer/prep/prepjointree.c
+++ b/src/backend/optimizer/prep/prepjointree.c
@@ -1174,6 +1174,14 @@ pull_up_simple_subquery(PlannerInfo *root, Node *jtnode, RangeTblEntry *rte,
Assert(root->placeholder_list == NIL);
Assert(subroot->placeholder_list == NIL);
+ /*
+ * We no longer need the RTE's copy of the subquery's query tree. Getting
+ * rid of it saves nothing in particular so far as this level of query is
+ * concerned; but if this query level is in turn pulled up into a parent,
+ * we'd waste cycles copying the now-unused query tree.
+ */
+ rte->subquery = NULL;
+
/*
* Miscellaneous housekeeping.
*
I wrote:
If multiple references are actually possible then this'd break it. There
seem to be no such cases in the regression tests though, and I'm having a
hard time wrapping my brain around what would cause it. "git blame"
traces this text to my own commit f44639e1b, which has the log entry
Don't crash if subquery appears multiple times in jointree. This should
not happen anyway, but let's try not to get completely confused if it does
(due to rewriter bugs or whatever).
so I'm thinking that this was only hypothetical.
Ah, found it. That was evidently a reaction to the immediately preceding
commit (352871ac9), which fixed a rewriter bug that could lead to exactly
the case of multiple jointree references to one RTE.
I think this patch doesn't make things any worse for such a case though.
If we re-introduced such a bug, the result would be an immediate null
pointer crash while trying to process the second reference to a
flattenable subquery. That's probably better for debuggability than
what happens now, where we just silently process the duplicate reference.
Anyway, I've stuck this into the next CF for future consideration.
regards, tom lane
PS: to save time for anyone else who wants to investigate this,
it looks like the report mentioned in 352871ac9 was
On Sun, Apr 18, 2021, at 22:14, Tom Lane wrote:
"Joel Jacobson" <joel@compiler.org <mailto:joel%40compiler.org>> writes:
I assumed the cost for each nested VIEW layer would grow linear,
but my testing shows it appears to grow exponentially:I think it's impossible to avoid less-than-O(N^2) growth on this sort
of case. For example, the v2 subquery initially has RTEs for v2 itself
plus v1. When we flatten v1 into v2, v2 acquires the RTEs from v1,
namely v1 itself plus foo. Similarly, once vK-1 is pulled up into vK,
there are going to be order-of-K entries in vK's rtable, and that stacking
makes for O(N^2) work overall just in manipulating the rtable.We can't get rid of these rtable entries altogether, since all of them
represent table privilege checks that the executor will need to do.
It occurs to me though that we don't need the rte->subquery trees anymore
once those are flattened, so maybe we could do something like the
attached. For me, this reduces the slowdown in your example from
O(N^3) to O(N^2).
Many thanks for explaining and the patch.
I've tested the patch successfully.
Planning time grows much slower now:
EXPLAIN ANALYZE SELECT * FROM v64;
- Planning Time: 14.981 ms
+ Planning Time: 2.802 ms
EXPLAIN ANALYZE SELECT * FROM v128;
- Planning Time: 109.407 ms
+ Planning Time: 11.595 ms
EXPLAIN ANALYZE SELECT * FROM v256;
- Planning Time: 1594.809 ms
+ Planning Time: 46.709 ms
Very nice.
/Joel
On Sun, 18 Apr 2021 at 21:42, Tom Lane <tgl@sss.pgh.pa.us> wrote:
If multiple references are actually possible then this'd break it.
I think this patch doesn't make things any worse for such a case though.
If we re-introduced such a bug, the result would be an immediate null
pointer crash while trying to process the second reference to a
flattenable subquery. That's probably better for debuggability than
what happens now, where we just silently process the duplicate reference.
I took a look at this and wasn't able to find any way to break it, and
your argument that it can't really make such rewriter bugs any worse
makes sense.
Would it make sense to update the comment prior to copying the subquery?
Out of curiosity, I also tested DML against these deeply nested views
to see how the pull-up code in the rewriter compares in terms of
performance, since it does a very similar job. As expected, it's
O(N^2) as well, but it's about an order of magnitude faster:
(times to run a plain EXPLAIN in ms, with patch)
| SELECT | INSERT | UPDATE | DELETE
-----+--------+--------+--------+--------
v64 | 1.259 | 0.189 | 0.250 | 0.187
v128 | 5.035 | 0.506 | 0.547 | 0.509
v256 | 20.393 | 1.633 | 1.607 | 1.638
v512 | 81.101 | 6.649 | 6.517 | 6.699
Maybe that's not surprising, since there's less to do at that stage.
Anyway, it's reassuring to know that it copes OK with this (I've seen
some quite deeply nested views in practice, but never that deep).
For comparison, this is what SELECT performance looked like for me
without the patch:
| SELECT
-----+----------
v64 | 9.589
v128 | 73.292
v256 | 826.964
v512 | 7844.419
so, for a one-line change, that's pretty impressive.
Regards,
Dean
Dean Rasheed <dean.a.rasheed@gmail.com> writes:
I took a look at this and wasn't able to find any way to break it, and
your argument that it can't really make such rewriter bugs any worse
makes sense.
Thanks for looking!
Would it make sense to update the comment prior to copying the subquery?
Yeah, I hadn't touched that yet because the question was exactly about
whether it's correct or not. I think we can shorten it to
* Need a modifiable copy of the subquery to hack on, so that the
* RTE can be left unchanged in case we decide below that we can't
* pull it up after all.
Out of curiosity, I also tested DML against these deeply nested views
to see how the pull-up code in the rewriter compares in terms of
performance, since it does a very similar job. As expected, it's
O(N^2) as well, but it's about an order of magnitude faster:
Oh good. I hadn't thought to look at that angle of things.
... for a one-line change, that's pretty impressive.
Yeah. The problem might get less bad if we get anywhere with the
idea I suggested at [1]/messages/by-id/697679.1625154303@sss.pgh.pa.us. If we can reduce the number of RTEs
in a view's query, then copying it would get cheaper. Still,
not copying it at all is always going to be better. I'll go
ahead and push the patch.
regards, tom lane