Clamping result row number of joins.
Hello, I received a report that a query gets weird estimated row
numbers, which in turn makes subsequent plans wrong.
Although a detailed explanation follows later in this mail, in
short, a query like the following produces a path with an
apparently wrong row estimate.
EXPLAIN SELECT a FROM (SELECT a FROM t1 UNION ALL SELECT 0) t
JOIN (VALUES (1)) tt(x) ON tt.x = t.a;
0> Nested Loop (cost=0.43..8.51 rows=68732 width=4)
1> -> Values Scan on "*VALUES*" (cost=0.00..0.01 rows=1 width=4)
2> -> Append (cost=0.43..8.48 rows=2 width=4)
3> -> Index Only Scan using i_t1_a on t1
4> (cost=0.43..8.46 rows=1 width=4)
5> Index Cond: (a = "*VALUES*".column1)
6> -> Subquery Scan on "*SELECT* 2" (cost=0.00..0.02 rows=1 width=4)
7> Filter: ("*VALUES*".column1 = "*SELECT* 2"."?column?")
8> -> Result (cost=0.00..0.01 rows=1 width=0)
The Nested Loop at line 0 claims to emit 68732 rows even though
its underlying nodes, the Values Scan at line 1 and the Append at
line 2, estimate 1 and 2 rows respectively. The query actually
returns only 1 row. The estimate is plainly wrong and causes the
surrounding plan to go wrong.
This is caused by the Append node, which has no statistics, so
eqjoinsel falls back to 0.5 as the default selectivity for the
nested loop. Even though it is difficult to gather statistics for
an Append node, the nested loop could arrive at a saner row count
if it did not ignore the parameterized path selected for the scan
on t1.
I suppose that a join node can safely clamp its result row count
to the product of the row counts of its two underlying paths. If
that is correct, the attached patch corrects the estimation like
this.
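The clamping idea can be sketched like this (illustrative names
only, not PostgreSQL's actual code): a join can never return more
rows than the cross product of its two input paths.

```python
def clamp_join_rows(estimated_rows, outer_rows, inner_rows):
    # Never let the selectivity-based estimate exceed the product
    # of the two input paths' row estimates.
    return min(estimated_rows, outer_rows * inner_rows)

# The problem case: the selectivity-based estimate says 68732 rows,
# while the Values Scan promises 1 row and the Append promises 2.
print(clamp_join_rows(68732, 1, 2))  # 2, matching the corrected plan
```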
Nested Loop (cost=0.43..16.51 rows=2 width=4)
-> Values Scan on "*VALUES*" (cost=0.00..0.01 rows=1 width=4)
-> Append (cost=0.43..16.48 rows=2 width=4)
-> Index Only Scan using i_t1_a on t1
(cost=0.43..16.45 rows=1 width=4)
Index Cond: (a = "*VALUES*".column1)
-> Subquery Scan on "*SELECT* 2" (cost=0.00..0.02 rows=1 width=4)
Filter: ("*VALUES*".column1 = "*SELECT* 2"."?column?")
-> Result (cost=0.00..0.01 rows=1 width=0)
Is this correct? Does it have any bad side effects?
And is it applicable to 9.5?
The detailed test environment is described below.
Before the fix, the top-level join is a hash join, because its
outer input (the nested loop) claims it will produce 68732 rows
(but actually produces only 1).
After the fix, the nested loop declares that it produces 2 rows,
so the top-level join becomes a nested loop and the scan on t2
becomes an index scan.
=======
CREATE TABLE t1 (a int);
INSERT INTO t1 (SELECT (a%2)*500000 + a / 2 FROM generate_series(0, 1000000) a);
CREATE INDEX i_t1_a ON t1 (a);
CREATE TABLE t2 AS SELECT * from t1;
ANALYZE t1;
ANALYZE t2;
-- emulating the problematic situation
SET join_collapse_limit TO 1;
SET random_page_cost TO 8.0;
SET seq_page_cost TO 0.5;
EXPLAIN
SELECT tt2.a
FROM (SELECT a
FROM (SELECT a FROM t1
UNION ALL SELECT 0) t
JOIN (VALUES (1)) tt(x) ON tt.x = t.a) tt2
JOIN t2 ON (tt2.a = t2.a);
======== The plan before fix
Hash Join (cost=30832.46..36230.60 rows=68732 width=4)
(actual time=1258.880..1260.519 rows=1 loops=1)
Hash Cond: ("*VALUES*".column1 = t2.a)
-> Nested Loop (cost=0.43..8.51 rows=68732 width=8)
(actual time=0.071..0.104 rows=1 loops=1)
-> Values Scan on "*VALUES*" (cost=0.00..0.01 rows=1 width=4)
(actual time=0.002..0.003 rows=1 loops=1)
-> Append (cost=0.43..8.48 rows=2 width=4)
(actual time=0.063..0.093 rows=1 loops=1)
-> Index Only Scan using i_t1_a on t1
(cost=0.43..8.46 rows=1 width=4)
(actual time=0.059..0.075 rows=1 loops=1)
Index Cond: (a = "*VALUES*".column1)
Heap Fetches: 2
-> Subquery Scan on "*SELECT* 2"
(cost=0.00..0.02 rows=1 width=4)
(actual time=0.010..0.010 rows=0 loops=1)
Filter: ("*VALUES*".column1 = "*SELECT* 2"."?column?")
Rows Removed by Filter: 1
-> Result (cost=0.00..0.01 rows=1 width=0)
(actual time=0.001..0.002 rows=1 loops=1)
-> Hash (cost=14425.01..14425.01 rows=1000001 width=4)
(actual time=1250.274..1250.274 rows=1000001 loops=1)
Buckets: 131072 Batches: 16 Memory Usage: 3227kB
-> Seq Scan on t2 (cost=0.00..14425.01 rows=1000001 width=4)
(actual time=0.021..608.245 rows=1000001 loops=1)
Planning time: 0.319 ms
Execution time: 1260.792 ms
======== The plan after fix
Nested Loop (cost=0.86..17.43 rows=2 width=4)
(actual time=0.103..0.118 rows=1 loops=1)
Join Filter: ("*VALUES*".column1 = t2.a)
-> Nested Loop (cost=0.43..16.51 rows=2 width=8)
(actual time=0.062..0.073 rows=1 loops=1)
-> Values Scan on "*VALUES*" (cost=0.00..0.01 rows=1 width=4)
(actual time=0.003..0.004 rows=1 loops=1)
-> Append (cost=0.43..16.48 rows=2 width=4)
(actual time=0.051..0.060 rows=1 loops=1)
-> Index Only Scan using i_t1_a on t1
(cost=0.43..16.45 rows=1 width=4)
(actual time=0.045..0.046 rows=1 loops=1)
Index Cond: (a = "*VALUES*".column1)
Heap Fetches: 1
-> Subquery Scan on "*SELECT* 2"
(cost=0.00..0.02 rows=1 width=4)
(actual time=0.005..0.005 rows=0 loops=1)
Filter: ("*VALUES*".column1 = "*SELECT* 2"."?column?")
Rows Removed by Filter: 1
-> Result (cost=0.00..0.01 rows=1 width=0)
(actual time=0.002..0.002 rows=1 loops=1)
-> Index Only Scan using i_t2_a on t2 (cost=0.42..0.45 rows=1 width=4)
(actual time=0.026..0.027 rows=1 loops=1)
Index Cond: (a = t1.a)
Heap Fetches: 1
Planning time: 0.968 ms
Execution time: 0.239 ms
=========
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
Attachments:
Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> writes:
Hello, I received a report that a query gets weird estimated row
numbers, which in turn makes subsequent plans wrong.
Although a detailed explanation follows later in this mail, in
short, a query like the following produces a path with an
apparently wrong row estimate.
EXPLAIN SELECT a FROM (SELECT a FROM t1 UNION ALL SELECT 0) t
JOIN (VALUES (1)) tt(x) ON tt.x = t.a;
Hm, the problem evidently is that we get a default selectivity estimate
for the ON condition. I think a proper fix for this would involve
teaching eqjoinsel (and ideally other join selectivity functions) how
to drill down into appendrels and combine estimates for the child
relations.
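For illustration only (this is not PostgreSQL code; the function
and the weighting scheme are assumptions), such a drill-down might
combine per-child selectivities weighted by each child's share of
the appendrel's total rows:

```python
def append_join_selectivity(children, default_sel=0.005):
    """children: list of (child_rows, child_selectivity) pairs."""
    total = sum(rows for rows, _ in children)
    if total == 0:
        return default_sel  # nothing to combine; fall back to a default
    # Weight each child's estimate by its share of the appendrel's rows.
    return sum(rows * sel for rows, sel in children) / total

# A large analyzed child dominates a one-row constant child.
print(append_join_selectivity([(1_000_001, 1e-6), (1, 1.0)]))
```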
I suppose that join nodes can safely clamp the number of its
result rows by the product of the row numbers of underlying two
paths. And if it is correct, the attached patch can correct the
estimation like this.
Unfortunately, this patch is completely horrid, because it gives an unfair
advantage to the parameterized-nestloop plan. (The hacks in the other
two functions are no-ops, because only a nestloop plan will have a
parameterized path for the appendrel.) What's more, it's only a cosmetic
fix: it won't correct the planner's actual idea of the joinrel size, which
means it won't have an impact on planning any additional levels of
joining.
We really need to fix this on the selectivity-estimate side.
BTW, is that JOIN (VALUES(...)) thing common in applications, or did you
just use it to make a compact example? If it were something worth
optimizing, it seems like we could teach the planner to "pull up VALUES"
in the same way that it flattens sub-selects. I'm not sure if this is
worth the trouble or not, though.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
BTW, is that JOIN (VALUES(...)) thing common in applications, or did you
just use it to make a compact example? If it were something worth
optimizing, it seems like we could teach the planner to "pull up VALUES"
in the same way that it flattens sub-selects. I'm not sure if this is
worth the trouble or not, though.
I've certainly seen and used values() constructs in joins for a variety
of reasons and I do think it'd be worthwhile for the planner to know how
to pull up a VALUES construct.
Would that be a lot of effort, either code-wise or runtime-wise? My gut
feeling is that it wouldn't be, but you're clearly in a much better
position to determine that.
Thanks!
Stephen
Stephen Frost <sfrost@snowman.net> writes:
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
BTW, is that JOIN (VALUES(...)) thing common in applications, or did you
just use it to make a compact example? If it were something worth
optimizing, it seems like we could teach the planner to "pull up VALUES"
in the same way that it flattens sub-selects. I'm not sure if this is
worth the trouble or not, though.
I've certainly seen and used values() constructs in joins for a variety
of reasons and I do think it'd be worthwhile for the planner to know how
to pull up a VALUES construct.
Would that be a lot of effort, either code-wise or runtime-wise? My gut
feeling is that it wouldn't be, but you're clearly in a much better
position to determine that.
My guess is that it'd be pretty simple to do if we want to do it.
I've not looked at the code yet though.
regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> writes:
Stephen Frost <sfrost@snowman.net> writes:
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
BTW, is that JOIN (VALUES(...)) thing common in applications, or did you
just use it to make a compact example? If it were something worth
optimizing, it seems like we could teach the planner to "pull up VALUES"
in the same way that it flattens sub-selects. I'm not sure if this is
worth the trouble or not, though.
I've certainly seen and used values() constructs in joins for a variety
of reasons and I do think it'd be worthwhile for the planner to know how
to pull up a VALUES construct.
Would that be a lot of effort, either code-wise or runtime-wise? My gut
feeling is that it wouldn't be, but you're clearly in a much better
position to determine that.
My guess is that it'd be pretty simple to do if we want to do it.
I've not looked at the code yet though.
I spent a bit of time looking at this, and realized that the blocker
is the same as the reason why we don't pull up sub-selects with empty
rangetables ("SELECT expression(s)"). Namely, that the upper query's
jointree can't handle a null subtree. (This is not only a matter of
the jointree data structure, but the fact that the whole planner
identifies relations by bitmapsets of RTE indexes, and subtrees with
empty RTE sets couldn't be told apart.)
We could probably fix both cases for order-of-a-hundred lines of new code
in prepjointree. The plan I'm thinking about is to allow such vacuous
subquery jointrees to be pulled up, but only if they are in a place in
the upper query's jointree where it's okay to delete the subtree. This
would basically be two cases: (1) the immediate parent is a FromExpr that
would have at least one remaining child, or (2) the immediate parent is
an INNER JOIN whose other child isn't also being deleted (so that we can
convert the JoinExpr to a nonempty FromExpr, or just use the other child
as-is if the JoinExpr has no quals).
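The semantics of such a pull-up (not the planner code itself) can
be demonstrated with SQLite, which also accepts a VALUES list in
FROM and names its columns column1, column2, and so on: an inner
join against a one-row VALUES list is equivalent to folding the
value into a plain WHERE qual.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (a int)")
conn.executemany("INSERT INTO t1 VALUES (?)", [(0,), (1,), (2,)])

# The join form: a one-row VALUES list used as a relation.
joined = conn.execute(
    "SELECT t1.a FROM t1 JOIN (VALUES (1)) tt ON tt.column1 = t1.a"
).fetchall()
# The pulled-up form: the value folded into a simple qual.
flattened = conn.execute("SELECT a FROM t1 WHERE a = 1").fetchall()
assert joined == flattened == [(1,)]
```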
More generally, it occurs to me that maybe Oracle wasn't being totally
silly when they invented DUAL. If we had a jointree representation for
"dummy relation with exactly one row" then we could substitute that in
all vacuous-jointree cases. However, it's not clear that there's any
functional advantage to replacing a VALUES Scan with a "DUAL Scan",
which is basically what would happen if we flattened a VALUES and then
had to put one of these things in instead. And having such things
propagate all through the planner, executor, EXPLAIN, etc is way more
code churn than I want to contemplate right now. The plan proposed in
the preceding para is a bit more ugly logically, but it would limit the
code effects to basically pull_up_subqueries() and its child routines.
regards, tom lane
I wrote:
Hm, the problem evidently is that we get a default selectivity estimate
for the ON condition. I think a proper fix for this would involve
teaching eqjoinsel (and ideally other join selectivity functions) how
to drill down into appendrels and combine estimates for the child
relations.
I wrote a prototype patch for this. The additions to examine_variable()
seem pretty reasonable. However, the only selectivity function I've fixed
is eqjoinsel_inner(). If we do it like this, we're going to need
similar recursive-boilerplate additions in basically every selectivity
function, which seems like a PITA as well as a lot of code bloat.
Can anyone think of a cute way to minimize that overhead?
regards, tom lane
Attachments:
I wrote:
Stephen Frost <sfrost@snowman.net> writes:
I've certainly seen and used values() constructs in joins for a variety
of reasons and I do think it'd be worthwhile for the planner to know how
to pull up a VALUES construct.
I spent a bit of time looking at this, and realized that the blocker
is the same as the reason why we don't pull up sub-selects with empty
rangetables ("SELECT expression(s)"). Namely, that the upper query's
jointree can't handle a null subtree. (This is not only a matter of
the jointree data structure, but the fact that the whole planner
identifies relations by bitmapsets of RTE indexes, and subtrees with
empty RTE sets couldn't be told apart.)
We could probably fix both cases for order-of-a-hundred lines of new code
in prepjointree. The plan I'm thinking about is to allow such vacuous
subquery jointrees to be pulled up, but only if they are in a place in
the upper query's jointree where it's okay to delete the subtree. This
would basically be two cases: (1) the immediate parent is a FromExpr that
would have at least one remaining child, or (2) the immediate parent is
an INNER JOIN whose other child isn't also being deleted (so that we can
convert the JoinExpr to a nonempty FromExpr, or just use the other child
as-is if the JoinExpr has no quals).
Here's a draft patch along those lines. Unsurprisingly, it changes the
plans generated for a number of regression-test queries. In most cases
I felt it desirable to force the old plan shape to be retained (by
inserting "offset 0" or equivalent) because the test was trying to test
proper generation of a query plan of that shape. I did add a couple
cases where the optimization was allowed to go through.
The patch is a bit bigger than I'd hoped (a net of about 330 lines added
to prepjointree.c), but it's not hugely ugly, and it doesn't add any
meaningful overhead in cases where no optimization happens. Barring
objections I will commit this.
regards, tom lane
Attachments:
Hello, thank you all for your consideration, but I described the
situation incorrectly. I'm terribly sorry for the imprecise
description and for the additional work you did based on it.
The point of this issue is not VALUES but the nature of Append
and Nested Loop. A nested loop can end up with a wrong estimate
when it contains an Append node over an insufficiently analyzed
inheritance parent or a UNION ALL. The latter case is saved
neither by Tom's patch nor by proper analyzing.
Let me try to describe the issue again.
At Sun, 08 Mar 2015 19:30:36 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote in <21519.1425857436@sss.pgh.pa.us>
I wrote:
Stephen Frost <sfrost@snowman.net> writes:
I've certainly seen and used values() constructs in joins for a variety
of reasons and I do think it'd be worthwhile for the planner to know how
to pull up a VALUES construct.
The query reported to me didn't include VALUES. I used it
implicitly as a shortcut to create an Append node that cannot
retrieve statistics on itself, which was the main cause of this
confusion.
In addition, there were somehow no statistics on the inheritance
parent (as opposed to its children), which I didn't mention. I
don't know how that situation arose, but I guess manual ANALYZEs
were the cause.
The exact situation can be reproduced with the following steps.
CREATE TABLE p (a int);
CREATE TABLE c1 (like p) INHERITS (p);
INSERT INTO c1 (SELECT (a%2)*500000 + a / 2 FROM generate_series(0, 1000000) a);
CREATE INDEX i_c1_a on c1 (a);
CREATE TABLE t1 (a int);
INSERT INTO t1 VALUES (1000000);
DELETE FROM pg_statistic;
ANALYZE t1;
ANALYZE c1;
SET join_collapse_limit TO 1;
SET random_page_cost TO 8.0;
SET seq_page_cost TO 0.5;
EXPLAIN ANALYZE SELECT a FROM p t WHERE a IN (SELECT a FROM t1);
-----------------------------------------------------------------
Nested Loop (cost=1.01..9.49 rows=500001 width=4)
(actual time=0.041..0.041 rows=0 loops=1)
-> HashAggregate (cost=1.01..1.02 rows=1 width=4)
(actual time=0.017..0.017 rows=1 loops=1)
Group Key: t1.a
-> Seq Scan on t1 (cost=0.00..1.01 rows=1 width=4)
(actual time=0.007..0.009 rows=1 loops=1)
-> Append (cost=0.00..8.45 rows=2 width=4)
(actual time=0.018..0.018 rows=0 loops=1)
-> Seq Scan on p t (cost=0.00..0.00 rows=1 width=4)
(actual time=0.000..0.000 rows=0 loops=1)
Filter: (t1.a = a)
-> Index Only Scan using i_c1_a on c1 t_1
(cost=0.42..8.45 rows=1 width=4)
(actual time=0.012..0.012 rows=0 loops=1)
Index Cond: (a = t1.a)
Heap Fetches: 0
Planning time: 0.408 ms
Execution time: 0.135 ms
(12 rows)
The nested loop above decided on 500001 rows because
eqjoinsel_semi could not get an MCV list for p (as the parent)
and returned 0.5 as the default join selectivity.
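Back-of-envelope arithmetic (illustrative only) reproduces the
bad number: the 0.5 default selectivity gets applied to the
appendrel's total row estimate.

```python
DEFAULT_SEMI_SEL = 0.5     # default when no statistics are available
append_rows = 1 + 1000001  # parent p (default 1 row) plus child c1
print(int(append_rows * DEFAULT_SEMI_SEL))  # 500001, as in the plan
```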
Therefore, running ANALYZE on p makes this case sane.
ANALYZE p;
EXPLAIN ANALYZE SELECT a FROM p t WHERE a IN (SELECT a FROM t1);
-----------------------------------------------------------------
Nested Loop (cost=1.01..9.49 rows=1 width=4)
I thought that such cases of apparently wrong row counts would be
saved by clamping the number of result rows as I mentioned, but
since they are saved by proper analyzing, the necessity is rather
low as long as ANALYZE always works. Still, the following example
cannot be saved even by ANALYZE.
CREATE TABLE c2 (LIKE p);
ANALYZE p;
EXPLAIN ANALYZE SELECT a FROM (SELECT a FROM c1 UNION ALL SELECT a FROM c2) t
WHERE a IN (SELECT a FROM t1);
QUERY PLAN
---------------------------------------------------------------------------
Nested Loop (cost=1.44..51.48 rows=501276 width=4)
(actual time=0.046..0.046 rows=0 loops=1)
-> HashAggregate (cost=1.01..1.02 rows=1 width=4)
(actual time=0.018..0.019 rows=1 loops=1)
Group Key: t1.a
-> Seq Scan on t1 (cost=0.00..1.01 rows=1 width=4)
(actual time=0.008..0.010 rows=1 loops=1)
-> Append (cost=0.42..50.32 rows=14 width=4)
(actual time=0.023..0.023 rows=0 loops=1)
-> Index Only Scan using i_c1_a on c1
(cost=0.42..8.45 rows=1 width=4)
(actual time=0.016..0.016 rows=0 loops=1)
Index Cond: (a = t1.a)
Heap Fetches: 0
-> Seq Scan on c2 (cost=0.00..41.88 rows=13 width=4)
(actual time=0.001..0.001 rows=0 loops=1)
Filter: (t1.a = a)
Planning time: 0.384 ms
Execution time: 0.185 ms
(12 rows)
14 * 1 = 501276... Of course this cannot be saved by the
VALUES-flattening patch, unfortunately.
I think this is not such a rare case.
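Applying the clamp proposed upthread to this example (the same
hypothetical helper as before, illustrative arithmetic only): the
nested loop's inputs promise at most 1 * 14 rows, far below the
501276 estimate.

```python
def clamp_join_rows(estimated_rows, outer_rows, inner_rows):
    # Cap the estimate at the product of the input paths' rows.
    return min(estimated_rows, outer_rows * inner_rows)

# HashAggregate estimates 1 row and the Append estimates 14, so
# the selectivity-based estimate of 501276 would be clamped to 14.
print(clamp_join_rows(501276, 1, 14))  # 14
```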
==============================================================
I spent a bit of time looking at this, and realized that the blocker
is the same as the reason why we don't pull up sub-selects with empty
rangetables ("SELECT expression(s)"). Namely, that the upper query's
jointree can't handle a null subtree. (This is not only a matter of
the jointree data structure, but the fact that the whole planner
identifies relations by bitmapsets of RTE indexes, and subtrees with
empty RTE sets couldn't be told apart.)

We could probably fix both cases for order-of-a-hundred lines of new code
in prepjointree. The plan I'm thinking about is to allow such vacuous
subquery jointrees to be pulled up, but only if they are in a place in
the upper query's jointree where it's okay to delete the subtree. This
would basically be two cases: (1) the immediate parent is a FromExpr that
would have at least one remaining child, or (2) the immediate parent is
an INNER JOIN whose other child isn't also being deleted (so that we can
convert the JoinExpr to a nonempty FromExpr, or just use the other child
as-is if the JoinExpr has no quals).

Here's a draft patch along those lines. Unsurprisingly, it changes the
plans generated for a number of regression-test queries. In most cases
I felt it desirable to force the old plan shape to be retained (by
inserting "offset 0" or equivalent) because the test was trying to test
proper generation of a query plan of that shape. I did add a couple
cases where the optimization was allowed to go through.

The patch is a bit bigger than I'd hoped (a net of about 330 lines added
to prepjointree.c), but it's not hugely ugly, and it doesn't add any
meaningful overhead in cases where no optimization happens. Barring
objections I will commit this.
However, I feel I'm in charge of reviewing this :) as far as I
can.
The patch removes inner joins that contain a child which can be
treated as a value, converting the join condition into a
simple(?) qual. The intention looks OK (as far as I can see) and
the patch seems to work as intended.
It works for very limited, simple cases: an inner join with a
VALUES having a single values list, or an inner join with a
sub-select that has no FROM list and a stable targetlist. As
such, my first example,
EXPLAIN SELECT a FROM (SELECT a FROM t1 UNION ALL SELECT 0) t
JOIN (VALUES (1)) tt(x) ON tt.x = t.a;
...
Append (cost=0.00..1.01 rows=1 width=4)
-> Seq Scan on t1 (cost=0.00..1.01 rows=1 width=4)
Filter: (1 = a)
(3 rows)
is one that is affected. The parallel sub-select example is,
EXPLAIN SELECT a FROM (SELECT a FROM t1 UNION ALL SELECT 0) t
JOIN (SELECT int8in('0')) tt(x) ON tt.x = t.a;
I found nothing to comment on in the code.
I don't have a solid opinion on the balance between usability and
code size, but I have one concern about planning time. With this
patch, the planner always scans the whole jointree once, and once
more if there are deletable subqueries. I can't tell whether that
cost is worthwhile for the gain or not.
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center