Condition pushdown: why (=) is pushed down into join, but BETWEEN or >= is not?
Hi!
I am trying to understand the behaviour of the query planner regarding the
push-down of the conditions "through" the join.
Let's say that I have tables a(adate date, aval text) and b(bdate date, bval text), and I create a view:
create view v as
select a.adate, a.aval, b.bval from a join b on (a.adate = b.bdate);
Now, when I do (explain select * from v where adate='2021-05-12') I can see that the condition (= '2021-05-12') is used by the planner for the access to both a and b.
However, if I use a range-like condition (this is probably not the correct terminology, but I am not familiar with the correct one) like BETWEEN or (>='2021-05-21'), I see that the planner uses this condition to access a, but not b. It seems that the type of join (inner or left) does not really matter.
DB fiddle that illustrates this:
https://www.db-fiddle.com/f/pT2PwUkhJWuX9skWiBWXoL/0
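For reference, a minimal self-contained version of the fiddle (exact plans will of course depend on data, statistics and version):

create table a (adate date, aval text);
create table b (bdate date, bval text);

create view v as
select a.adate, a.aval, b.bval from a join b on (a.adate = b.bdate);

-- equality: the constant ends up restricting the scans of both a and b
explain select * from v where adate = '2021-05-12';

-- range: only the scan of a is restricted
explain select * from v where adate >= '2021-05-21';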
In my experiments, I was never able to get an execution plan that "pushes down" any condition apart from (=) through to the right side of the join, which is rather surprising and leads to suboptimal planner estimates and execution plans whenever a view like the one above is part of a bigger query with more joins on top.
Equally surprising is that I was unable to find documentation or past
mailing list discussions of this or similar topic, which leads me to
believe that I am just not familiar with the proper terminology and can't
come up with the right search terms.
Can you please tell me the proper way to describe this behaviour/phenomenon (so that I can use it as search terms) and/or provide me with references to the parts of the source code that determine which conditions are "pushed down" and which are not?
PS As far as I can see, this behaviour is consistent between versions 9.5,
10, 11, 12 and 13.
--
D. Astapov
Dmitry Astapov <dastapov@gmail.com> writes:
I am trying to understand the behaviour of the query planner regarding the
push-down of the conditions "through" the join.
I think your mental model is wrong. What's actually happening here is
that the planner uses equivalence classes to deduce implied conditions.
That is, we have the join condition a.adate = b.bdate and then you've
added the where condition a.adate = '2021-05-12'. Transitivity implies
that b.bdate = '2021-05-12', so we deduce that condition and are able
to apply it at the relation scan of b. Furthermore, having restricted
both a.adate and b.bdate to the same constant value at the scan level,
we no longer need to apply the join condition a.adate = b.bdate at all.
This is important not only to avoid the (probably minor) inefficiency
of rechecking the join condition, but because if we believed that all
three conditions were independently applicable, we'd come out with a
serious underestimate of the size of the join result.
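To illustrate with the view from the original message (the plan shape is only a sketch; actual output varies):

explain select * from v where adate = '2021-05-12';
-- Nested Loop                  <- no join filter remains
--   ->  Seq Scan on a
--         Filter: (adate = '2021-05-12'::date)
--   ->  Seq Scan on b
--         Filter: (bdate = '2021-05-12'::date)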
In my experiments, I was never able to get an execution plan that "pushes
down" any condition apart from (=) through to the right side of the join,
None of the argument sketched above works for non-equality conditions.
There are some situations where you could probably figure out how to
use transitivity to deduce some implied condition, but cleaning things
up so that you don't have redundant conditions fouling up the join
size estimates seems like a hard problem.
Another issue is that we could easily expend a lot of cycles on deductions
that lead nowhere, because once you try to open up the mechanism to
consider operators other than equality, there will be a lot of things that
it looks at and then fails to do anything with. The equivalence class
mechanism is tied into the same logic that considers merge and hash joins,
so we are expending lots of cycles anytime we see an equality operator,
and not so much for other operators.
Equally surprising is that I was unable to find documentation or past
mailing list discussions of this or similar topic, which leads me to
believe that I am just not familiar with the proper terminology and can't
come up with the right search terms.
src/backend/optimizer/README has a discussion of equivalence classes.
regards, tom lane
On Wed, May 12, 2021 at 4:54 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Dmitry Astapov <dastapov@gmail.com> writes:
I am trying to understand the behaviour of the query planner regarding the push-down of the conditions "through" the join.
I think your mental model is wrong. What's actually happening here is
that the planner uses equivalence classes to deduce implied conditions.
That is, we have the join condition a.adate = b.bdate and then you've
added the where condition a.adate = '2021-05-12'. Transitivity implies
that b.bdate = '2021-05-12', so we deduce that condition and are able
to apply it at the relation scan of b. Furthermore, having restricted
both a.adate and b.bdate to the same constant value at the scan level,
we no longer need to apply the join condition a.adate = b.bdate at all.
This is important not only to avoid the (probably minor) inefficiency
of rechecking the join condition, but because if we believed that all
three conditions were independently applicable, we'd come out with a
serious underestimate of the size of the join result.
Thank you very much, my mental model was indeed incorrect, and the above is
very helpful.
Am I right in thinking that elimination of the join condition is actually quite an important part of the process?
Could it possibly be the main reason for =ANY/(x IN (..)) not being optimized the same way?
In my experiments, I was never able to get an execution plan that "pushes
down" any condition apart from (=) through to the right side of the join,None of the argument sketched above works for non-equality conditions.
There are some situations where you could probably figure out how to
use transitivity to deduce some implied condition, but cleaning things
up so that you don't have redundant conditions fouling up the join
size estimates seems like a hard problem.
I agree about inequality conditions, this problem seems to be rather hard
to tackle in the general case.
Is it still hard when one thinks about =ANY or (column in (val1, val2, val3, ...)) as well?
I am thinking that =ANY would be a decent workaround for (x BETWEEN a AND b) in quite a lot of cases, if it were propagated to all the columns in the equivalence class.
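For instance (hypothetical; as of the versions above, neither form is propagated to b):

-- range condition, applied to a only:
select * from v where adate between '2021-05-10' and '2021-05-12';

-- the same set as an explicit list; if =ANY were treated like (=) and
-- propagated through the equivalence class, b could be restricted too:
select * from v
where adate = any (array['2021-05-10', '2021-05-11', '2021-05-12']::date[]);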
Equally surprising is that I was unable to find documentation or past
mailing list discussions of this or similar topic, which leads me to
believe that I am just not familiar with the proper terminology and can't
come up with the right search terms.
src/backend/optimizer/README has a discussion of equivalence classes.
Thank you, this gives me a plethora of keywords for further searches.
I realize that it is possibly off-topic here, but what about workarounds for inequality constraints, joins and views? Maybe you could give me some pointers here as well?
My tables are large to huge (think OLAP, not OLTP). I found that when I have a view that joins several (2 to 10) tables on a column that is semantically the same in all of them (let's say it is ID and we join on ID), I do not have many avenues to efficiently select from such a view for a list of IDs at once.
I could:
1) Do lots of fast queries and union them:
select * from vw where id=ID1 union all select * from vw where id=ID2 ...
which is only really feasible if the query is generated by a program.
2) Expose all the ID columns from all the tables used in the view body (see the sketch after this list) and do:
select * from vw where id=ANY(...) and id1=ANY(...) and id2=ANY(...) and id3=ANY(...)
This only works well if the view hierarchy is flat (no views on views). If there are other views that use this view, re-exporting the extra columns quickly snowballs, and you might need column renaming if the same view ends up being used more than once through two different dependency paths. Plus, people not familiar with the problem tend to omit "clearly superfluous" columns from the new views they build on top.
3) Forbid views that join tables larger than a certain size, and dismantle views that become inefficient (this only works if the problem is detected early enough and the view has not yet become popular).
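A sketch of workaround 2, with just two hypothetical tables t1 and t2 joined on id:

create view vw as
select t1.id, t2.id as id2, t1.val, t2.val as val2
from t1 join t2 on (t1.id = t2.id);

-- every underlying scan can now be restricted explicitly:
select * from vw
where id = any(array[1, 2, 3])
and id2 = any(array[1, 2, 3]);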
So all of the workarounds I see in front of me right now are somewhat sad,
but they are necessary, as not doing them means that queries would take
hours or days instead of minutes.
Is there anything better that I have not considered in terms of workarounds?
--
D. Astapov
Dmitry Astapov <dastapov@gmail.com> writes:
Am I right in thinking that elimination of the join condition is actually quite an important part of the process?
Could it possibly be the main reason for =ANY/(x IN (..)) not being optimized the same way?
Yup.
Is it still hard when one thinks about =ANY or (column in (val1, val2,
val3, ...)) as well?
Yeah. For instance, if you have
WHERE a = b AND a IN (1,2,3)
then yes, you could deduce "b IN (1,2,3)", but this would not give you
license to drop the "a = b" condition. So now you have to figure out
what the selectivity of that is after the application of the partially
redundant IN clauses.
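Spelled out as a query (tables hypothetical):

select * from t1 join t2 on (t1.a = t2.b)
where t1.a in (1, 2, 3);
-- "t2.b in (1, 2, 3)" is deducible, but "t1.a = t2.b" must still be
-- checked at the join, so the three clauses overlap and multiplying
-- their selectivities as if independent underestimates the join size.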
I recall somebody (David Rowley, maybe? Too lazy to check archives.)
working on this idea awhile ago, but he didn't get to the point of
a committable patch.
regards, tom lane
On Fri, 14 May 2021 at 11:22, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I recall somebody (David Rowley, maybe? Too lazy to check archives.)
working on this idea awhile ago, but he didn't get to the point of
a committable patch.
Yeah. Me. The discussion is in [1].
David
[1]: /messages/by-id/CAKJS1f9FK_X_5HKcPcSeimy16Owe3EmPmmGsGWLcKkj_rW9s6A@mail.gmail.com
So now you have to figure out
what the selectivity of that is after the application of the partially
redundant IN clauses.
Would marking the newly added RestrictInfo with norm_selec > 1 be OK?
clause_selectivity_ext already does:

    /*
     * If the clause is marked redundant, always return 1.0.
     */
    if (rinfo->norm_selec > 1)
        return (Selectivity) 1.0;
--
Best Regards
Andy Fan (https://www.aliyun.com/)
On Mon, 17 May 2021 at 14:52, Andy Fan <zhihui.fan1213@gmail.com> wrote:
Would marking the newly added RestrictInfo with norm_selec > 1 be OK?
There would be cases you'd want to not count the additional clauses in
the selectivity estimation and there would be cases you would want to.
For example:
SELECT ... FROM t1 INNER JOIN t2 ON t1.dt = t2.dt WHERE t1.dt BETWEEN
'date1' AND 'date2';
If you derived that t2.dt is also BETWEEN 'date1' AND 'date2' then
you'd most likely want to include those quals for scans feeding merge,
hash and non-parameterized nested loop joins, so you'd also want to
count them in your selectivity estimations, else you'd feed junk
values into the join selectivity estimations.
Parameterized nested loop joins might be different: if you were looking up t1.dt values in some index on t2.dt, then you'd likely not want to bother also filtering on the BETWEEN clause values. They're redundant in that case.
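A sketch of that case (table and index names are illustrative):

explain select * from t1 inner join t2 on t1.dt = t2.dt
where t1.dt between 'date1' and 'date2';
-- Nested Loop
--   ->  Seq Scan on t1
--         Filter: ((dt >= 'date1') AND (dt <= 'date2'))
--   ->  Index Scan using t2_dt_idx on t2
--         Index Cond: (dt = t1.dt)
--         (re-checking the derived BETWEEN here filters nothing, since
--          every t1.dt value coming in has already satisfied it)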
I imagined we'd have some functions in equivclass.c that allows you to
choose if you wanted the additional filters or not.
Tom's example, WHERE a = b AND a IN (1,2,3), if a and b were in the
same relation then you'd likely never want to include the additional
quals. The only reason I could think that it would be a good idea is
if "b" had an index but "a" didn't. I've not checked the code, but
the index matching code might already allow that to work anyway.
David
On Wed, May 19, 2021 at 8:15 PM David Rowley <dgrowleyml@gmail.com> wrote:
On Mon, 17 May 2021 at 14:52, Andy Fan <zhihui.fan1213@gmail.com> wrote:
Would marking the newly added RestrictInfo with norm_selec > 1 be OK?
There would be cases you'd want to not count the additional clauses in
the selectivity estimation and there would be cases you would want to.
For example:
SELECT ... FROM t1 INNER JOIN t2 ON t1.dt = t2.dt WHERE t1.dt BETWEEN
'date1' AND 'date2';
If you derived that t2.dt is also BETWEEN 'date1' AND 'date2' then
you'd most likely want to include those quals for scans feeding merge,
hash and non-parameterized nested loop joins, so you'd also want to
count them in your selectivity estimations, else you'd feed junk
values into the join selectivity estimations.
Yes, you are correct.
Parameterized nested loop joins might be different: if you were looking up t1.dt values in some index on t2.dt, then you'd likely not want to bother also filtering on the BETWEEN clause values. They're redundant in that case.
I do not fully understand this.
I imagined we'd have some functions in equivclass.c that allows you to
choose if you wanted the additional filters or not.
Sounds like a good idea.
Tom's example, WHERE a = b AND a IN (1,2,3), if a and b were in the
same relation then you'd likely never want to include the additional
quals. The only reason I could think that it would be a good idea is
if "b" had an index but "a" didn't. I've not checked the code, but
the index matching code might already allow that to work anyway.
+1 for this feature overall.
--
Best Regards
Andy Fan (https://www.aliyun.com/)
On Fri, May 14, 2021 at 12:22 PM David Rowley <dgrowleyml@gmail.com> wrote:
On Fri, 14 May 2021 at 11:22, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I recall somebody (David Rowley, maybe? Too lazy to check archives.)
working on this idea awhile ago, but he didn't get to the point of
a committable patch.
Yeah. Me. The discussion is in [1].
David
[1] /messages/by-id/CAKJS1f9FK_X_5HKcPcSeimy16Owe3EmPmmGsGWLcKkj_rW9s6A@mail.gmail.com
Hi:
I read through that thread and, IIUC, the pending issues are as follows: a) the most challenging one is that this push-down misleads the planner's row estimation, which can be worse than the lack of such push-down; b) the newly generated quals may increase qual execution cost; c) planning time also increases, with no gain in many cases. I tried to address these issues as below, based on the patch David finished a long time ago.

To address the row estimation issue, the most straightforward fix is to ignore the derived clauses when figuring out RelOptInfo->rows for a base relation. To mark which clauses this patch derived, I added a new field "EquivalenceClass *derived" in RestrictInfo, and an included_derived option in clauselist_selectivity_ext; the set_xxx_rel_size functions pass included_derived=false, and the same strategy is used in get_parameterized_baserel_size; in all other cases included_derived=true is used. This is finished in commit 2 (commit 1 is David's patch, I just rebased it).
set enable_hashjoin to off;
set enable_mergejoin to off;
set enable_seqscan to on;
regression=# explain analyze select * from tenk1 a, tenk1 b where a.thousand = b.thousand and a.thousand < 100;
                                                                  QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=27.14..1090.67 rows=10740 width=488) (actual time=0.404..15.006 rows=10000 loops=1)
   ->  Bitmap Heap Scan on tenk1 b  (cost=26.84..385.26 rows=10000 width=244) (actual time=0.350..1.419 rows=1000 loops=1)
         Recheck Cond: (thousand < 100)
         Heap Blocks: exact=324
         ->  Bitmap Index Scan on tenk1_thous_tenthous  (cost=0.00..24.34 rows=1074 width=0) (actual time=0.238..0.240 rows=1000 loops=1)
               Index Cond: (thousand < 100)
   ->  Memoize  (cost=0.30..0.47 rows=1 width=244) (actual time=0.002..0.006 rows=10 loops=1000)
         Cache Key: b.thousand
         Cache Mode: logical
         Hits: 900  Misses: 100  Evictions: 0  Overflows: 0  Memory Usage: 277kB
         ->  Index Scan using tenk1_thous_tenthous on tenk1 a  (cost=0.29..0.46 rows=1 width=244) (actual time=0.010..0.032 rows=10 loops=100)
               Index Cond: ((thousand = b.thousand) AND (thousand < 100))
 Planning Time: 2.459 ms
 Execution Time: 15.964 ms
(14 rows)
As shown above, with commit 2 the join rel's row estimate is now correct, but the plan is misleading for a DBA to read: see "Bitmap Heap Scan on tenk1 b (.. rows=10000 ..) (.. rows=1000 loops=1)". This is because RelOptInfo->rows is not only used to calculate joinrel.rows but is also used to set Path.rows in many places. I can't think of a better way than adding a new filtered_rows field in RelOptInfo whose semantics are for the Path.rows purpose only. That is what commit 3 does. After commit 3, we can see:
regression=# explain analyze select * from tenk1 a, tenk1 b where a.thousand = b.thousand and a.thousand < 100;
                                                                  QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=24.90..459.16 rows=10740 width=488) (actual time=0.440..16.966 rows=10000 loops=1)
   ->  Bitmap Heap Scan on tenk1 b  (cost=24.61..383.03 rows=1074 width=244) (actual time=0.383..1.546 rows=1000 loops=1)
         Recheck Cond: (thousand < 100)
         Heap Blocks: exact=324
         ->  Bitmap Index Scan on tenk1_thous_tenthous  (cost=0.00..24.34 rows=1074 width=0) (actual time=0.270..0.272 rows=1000 loops=1)
               Index Cond: (thousand < 100)
   ->  Memoize  (cost=0.30..0.47 rows=1 width=244) (actual time=0.002..0.008 rows=10 loops=1000)
         Cache Key: b.thousand
         Cache Mode: logical
         Hits: 900  Misses: 100  Evictions: 0  Overflows: 0  Memory Usage: 277kB
         ->  Index Scan using tenk1_thous_tenthous on tenk1 a  (cost=0.29..0.46 rows=1 width=244) (actual time=0.012..0.050 rows=10 loops=100)
               Index Cond: ((thousand = b.thousand) AND (thousand < 100))
 Planning Time: 2.578 ms
 Execution Time: 17.929 ms
(14 rows)
"Bitmap Heap Scan on tenk1 b (... rows=1074 ..) (.. rows=1000 loops=1)"
shows the issue fixed. but
There is something wrong below.
Index Scan using tenk1_thous_tenthous on tenk1 a (cost=0.29..0.46 rows=1
width=244) (actual time=0.012..0.050 rows=10 loops=100)
Index Cond: ((thousand = b.thousand) AND (thousand < 100))
Here the " (thousand < 100)" is from the user, not from this patch. and
(thousand = b.thousand) AND (thousand < 100)
has some correlation. I can't think of a solution for this. and fixing
this issue is beyond the scope of this patch.
So at this stage, I think the row estimation issue is gone.
As for the concern that the newly generated quals increase execution cost, I think it is hard for the planner to distinguish which quals deserve to be added. Instead, I removed the qual execution at the create_plan stage to eliminate the obviously duplicated executions. I only handled the case where a derived qual would be executed at the same time as the RestrictInfo whose parent_ec was used to generate it. If I understand RestrictInfo.parent_ec correctly, the cost of finding the correlated quals in this patch is pretty low; see is_correlated_derived_clause. This is what commit 4 does. After applying it, the last demo above becomes:
regression=# explain analyze select * from tenk1 a join d_tenk2 b on a.thousand = b.thousand and a.thousand < 100;
                                                                    QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=10000000000.30..10000002799.78 rows=20020 width=488) (actual time=0.051..26.080 rows=20000 loops=1)
   ->  Seq Scan on tenk1 a  (cost=10000000000.00..10000000470.00 rows=1001 width=244) (actual time=0.018..3.902 rows=1000 loops=1)
         Filter: (thousand < 100)
         Rows Removed by Filter: 9000
   ->  Memoize  (cost=0.30..3.18 rows=20 width=244) (actual time=0.002..0.008 rows=20 loops=1000)
         Cache Key: a.thousand
         Cache Mode: logical
         Hits: 900  Misses: 100  Evictions: 0  Overflows: 0  Memory Usage: 546kB
         ->  Index Scan using d_tenk2_thousand_idx on d_tenk2 b  (cost=0.29..3.17 rows=20 width=244) (actual time=0.008..0.037 rows=20 loops=100)
               Index Cond: (thousand = a.thousand)
 Planning Time: 0.596 ms
 Execution Time: 27.502 ms
(12 rows)
The "thousand < 100" for b is removed during execution.
Commit 5 relaxes the requirements for this to apply: it now supports ScalarArrayOpExpr and any pseudoconstant filter, to cover the user case I met. Commit 6 adds some test cases; they are just for review, since the output contains too many runtime statistics and I can't yet think of a way to stabilize it.
I also studied David's commit 1; the semantics of ec_filters are accurate and I'm very excited to see it.
This patch series is still at the PoC stage, so some things are not handled at all. For commit 2, I didn't handle the extended-statistics paths, and I only handled plain rels (subqueries, foreign tables and so on are missing). I think that is OK for a PoC.
Finally, some performance testing of this patch, based on the real user case I met:

create table p (a int, b int) partition by range(a);
select 'create table p_' || i || ' partition of p for values from (' || (i-1) * 100000 || ') to (' || i * 100000 || ');' from generate_series(1, 50)i; \gexec
insert into p select i, i from generate_series(1, 50 * 100000 - 1) i;
create index on p(a);

create table q (a int, b int) partition by range(a);
select 'create table q_' || i || ' partition of q for values from (' || (i-1) * 100000 || ') to (' || i * 100000 || ');' from generate_series(1, 50)i; \gexec
insert into q select * from p;
create index on q(a);

select * from p, q where p.a = q.a and p.a in (3, 200000);
Running the above query in both the prepared and non-prepared case, I get the following results:
| workload | with this feature | w/o this feature |
|--------------+-------------------+------------------|
| Prepared | 0.25 ms | 0.8 ms |
| Non Prepared | 0.890 ms | 4.207 ms |
Any thoughts?
--
Best Regards
Andy Fan
Attachments:
v1-0001-Rebaee-David-s-patch-against-the-latest-code.patch
v1-0002-support-set_xxx_size-with-derived_clauses-ignored.patch
v1-0003-introduce-RelOptInfo.filtered_rows.patch
v1-0004-remove-duplicated-qual-executing.patch
v1-0005-Support-ScalarArrayOpExpr-and-perudoconstant-on-e.patch
v1-0006-adding-some-test-cases-for-this-feature-and-fix-t.patch
Subject: [PATCH v1 1/6] Rebaee David's patch against the latest code.
If you use git-am, then the author/commit information is preserved.
It's probably good to include a link to the patch in any case.
Subject: [PATCH v1 4/6] remove duplicated qual executing.
---
src/backend/optimizer/path/equivclass.c | 22 +++++++++++++++++++
src/backend/optimizer/plan/createplan.c | 29 +++++++++++++++++++++++--
src/include/optimizer/paths.h | 2 ++
src/test/regress/parallel_schedule | 2 ++
4 files changed, 53 insertions(+), 2 deletions(-)
I think the ./ec_filter test is missing from this patch.
Subject: [PATCH v1 6/6] adding some test cases for this feature and fix the existing case
The tests should be updated with the corresponding patches. It's common for the patches to be committed separately, like if 0001 is ready but the others are still evolving.
I'm not sure whether you think this patch is ready to be added to a commitfest, but do you know about the CI infrastructure? It allows running all the cfbot tests for a github branch against 4 OSes, which helps catch portability issues, including memory errors and unstable explain output. See: src/tools/ci/README.
There's a failure in postgres_fdw; probably the output needs to be updated.
--
Justin
Hi Justin:
Thanks for your attention.
On Wed, Feb 2, 2022 at 1:13 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
Subject: [PATCH v1 1/6] Rebaee David's patch against the latest code.
If you use git-am, then the author/commit information is preserved.
It's probably good to include a link to the patch in any case.
Thanks for this reminder, I didn't pay enough attention to it. Fixed.
(The original patch was a plain diff rather than a commit, so I wrote a simple commit message for it and linked to the original discussion.)
Subject: [PATCH v1 4/6] remove duplicated qual executing.
---
src/backend/optimizer/path/equivclass.c | 22 +++++++++++++++++++
src/backend/optimizer/plan/createplan.c | 29 +++++++++++++++++++++++--
src/include/optimizer/paths.h | 2 ++
src/test/regress/parallel_schedule | 2 ++
4 files changed, 53 insertions(+), 2 deletions(-)
I think the ./ec_filter test is missing from this patch.
Indeed..
Subject: [PATCH v1 6/6] adding some test cases for this feature and fix the existing case
The tests should be updated with the corresponding patches. It's common for the patches to be committed separately, like if 0001 is ready but the others are still evolving.
Yes, I agree with this. It's just that in this case the commit split is only for easier review/discussion; they are unlikely to be committable separately. So I kept it as it was and improved each commit message.
I'm not sure whether you think this patch is ready to be added to a commitfest, but do you know about the CI infrastructure? It allows running all the cfbot tests for a github branch against 4 OSes, which helps catch portability issues, including memory errors and unstable explain output. See: src/tools/ci/README.
Added. https://commitfest.postgresql.org/37/3524/
There's a failure in postgres_fdw; probably the output needs to be updated.
For postgres_fdw, I just refreshed the expected output. With this patch, the plan changed from
 Foreign Scan
   Output: ft5.*, ft5.c1, ft5.c2, ft5.c3, ft4.c1, ft4.c2
   Relations: (public.ft5) INNER JOIN (public.ft4)
   Remote SQL: SELECT CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1.c1, r1.c2, r1.c3) END, r1.c1, r1.c2, r1.c3, r2.c1, r2.c2 FROM ("S 1"."T 4" r1 INNER JOIN "S 1"."T 3" r2 ON (((r1.c1 = r2.c1)) AND ((r2.c1 >= 10)) AND ((r2.c1 <= 30)))) ORDER BY r1.c1 ASC NULLS LAST
(4 rows)
to
 Sort  (cost=108.02..108.04 rows=7 width=62)
   Output: ft5.*, ft5.c1, ft5.c2, ft5.c3, ft4.c1, ft4.c2
   Sort Key: ft5.c1
   ->  Foreign Scan  (cost=100.00..107.92 rows=7 width=62)
         Output: ft5.*, ft5.c1, ft5.c2, ft5.c3, ft4.c1, ft4.c2
         Relations: (public.ft5) INNER JOIN (public.ft4)
         Remote SQL: SELECT CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1.c1, r1.c2, r1.c3) END, r1.c1, r1.c2, r1.c3, r2.c1, r2.c2 FROM ("S 1"."T 4" r1 INNER JOIN "S 1"."T 3" r2 ON (((r1.c1 = r2.c1)) AND ((r2.c1 >= 10)) AND ((r2.c1 <= 30)) AND ((r1.c1 >= 10)) AND ((r1.c1 <= 30))))
But if I set enable_sort = off, we can still get the previous plan, which proves that this patch doesn't make the above path unavailable; it is just not cheaper than the new one. Here are the new commit messages:
commit e0a7838a09e73f831eecb23b5e7884cc34d71301
Author: David Rowley <dgrowleyml@gmail.com>
Date: Tue Feb 1 20:56:40 2022 +0800
Introduce ec_filters in the EquivalenceClass struct; the semantics are that the quals can be applied to any EquivalenceMember in this EC. Later this information is used to generate new RestrictInfos, which are distributed to the related RelOptInfos.

Author: David Rowley at 2015-12 [1]
Andy Fan rebased this patch onto the latest code.

[1] /messages/by-id/CAKJS1f9FK_X_5HKcPcSeimy16Owe3EmPmmGsGWLcKkj_rW9s6A@mail.gmail.com
commit 73f52d0909374446cd689457f0a4ef52addb035e
Author: Andy Fan <yizhi.fzh@alibaba-inc.com>
Date: Tue Feb 1 14:54:07 2022 +0800
After distributing the new derived RestrictInfo into RelOptInfo, the row estimation is wrong at the joinrel part. The reason is well described at [1] and [2]. So I added a new field "EquivalenceClass *derived" in the RestrictInfo struct to indicate how such a qual was generated; we ignore such quals during row-size estimation. All the set_xxx_size functions should be taken care of, but for now just set_plain_rel_size is handled, for the PoC purpose.

[1] /messages/by-id/CAKJS1f9FK_X_5HKcPcSeimy16Owe3EmPmmGsGWLcKkj_rW9s6A@mail.gmail.com
[2] /messages/by-id/1727507.1620948117@sss.pgh.pa.us
commit 8439b4818410d860a4ca4be3458b54c04c6f8648
Author: Andy Fan <yizhi.fzh@alibaba-inc.com>
Date: Tue Feb 1 15:20:10 2022 +0800
Introduce RelOptInfo.filtered_rows.
Previously, Path.rows (shown in the explain output) and RelOptInfo.rows, which is used to calculate the joinrel's estimated rows, were the same for many scan paths, like SeqScan, IndexScan, BitmapHeapScan and so on. But they differ after distributing a new RestrictInfo from an ec_filter. So I developed RelOptInfo.filtered_rows to take some of the duty away from RelOptInfo.rows.
commit 11b3395bb5bcc4a2bcff6fed8078dbbf3cda81b1
Author: Andy Fan <yizhi.fzh@alibaba-inc.com>
Date: Tue Feb 1 17:37:27 2022 +0800
Remove duplicated qual executing for executor.
Take SELECT * FROM t1, t2 WHERE t1.a = t2.a and t2.a > 3 for example: we can derive t1.a > 3 with the EC filter infrastructure. However, if it generates a plan like the one below, the newly generated qual does not deserve to be executed:

Nest Loop
  Seq Scan (t1.a > 3)
  Index Scan t2_a
    (a = t1.a) (t2.a > 3)

This patch removes the "t2.a > 3" in the above case.
commit 2875a76136293589b6e409cb6be4defab87ade59
Author: Andy Fan <yizhi.fzh@alibaba-inc.com>
Date: Wed Feb 2 11:54:24 2022 +0800
Support ScalarArrayOpExpr and perudoconstant on ef_filter.
commit a4b21ab6fd0fd57902f5471ec962a77b59085158 (HEAD -> cf_v4)
Author: Andy Fan <yizhi.fzh@alibaba-inc.com>
Date: Wed Feb 2 11:59:53 2022 +0800
Added the test cases for this feature and fixed the previous test cases as well. The newly added test cases need to output some runtime statistics, which will probably differ from run to run. I can think of a way to make the test cases stable if the patchset is not wrong at the first step.
--
Best Regards
Andy Fan
Hi,
there's been an interesting case [1] of a slow query on pgsql-general, related to the topic discussed in this thread. It causes the query to run slower by multiple orders of magnitude, and I think it's interesting, so let me present a simple example demonstrating it.
------------------------------------------------------------------------
create table t1 (a int);
create table t2 (a int);
insert into t1
select i from generate_series(1,100000) s(i);
insert into t2
select mod(i,100000) from generate_series(1,10000000) s(i);
create index on t1(a);
create index on t2(a);
vacuum analyze t1, t2;
-- we need to force mergejoin
set enable_nestloop = off;
------------------------------------------------------------------------
Now, let's run a simple query:
SELECT t1.a, t2.a FROM t1 JOIN t2 USING (a)
WHERE (t1.a > 99000) ORDER BY t1.a LIMIT 100;
QUERY PLAN
------------------------------------------------------------------------
Limit (cost=4.63..224.57 rows=100 width=8)
(actual time=8999.487..8999.707 rows=100 loops=1)
-> Merge Join (cost=4.63..209447.97 rows=95226 width=8)
(actual time=8999.485..8999.620 rows=100 loops=1)
Merge Cond: (t1.a = t2.a)
-> Index Only Scan using t1_a_idx on t1
(cost=0.29..29.25 rows=969 width=4)
(actual time=0.010..0.011 rows=1 loops=1)
Index Cond: (a > 99000)
Heap Fetches: 0
-> Index Only Scan using t2_a_idx on t2
(cost=0.43..183464.09 rows=9999977 width=4)
(actual time=0.026..4594.757 rows=9900200 loops=1)
Heap Fetches: 0
Planning Time: 0.338 ms
Execution Time: 8999.768 ms
(10 rows)
Now, let's do a simple trick and add a condition on t2.a, "implied" by the join condition (t1.a = t2.a) and the inequality (t1.a > 99000).
SELECT t1.a, t2.a FROM t1 JOIN t2 USING (a)
WHERE (t1.a > 99000) AND (t2.a > 99000) ORDER BY t1.a LIMIT 100;
QUERY PLAN
------------------------------------------------------------------------
Limit (cost=0.77..250.39 rows=100 width=8)
(actual time=0.040..0.294 rows=100 loops=1)
-> Merge Join (cost=0.77..2297.23 rows=920 width=8)
(actual time=0.039..0.172 rows=100 loops=1)
Merge Cond: (t1.a = t2.a)
-> Index Only Scan using t1_a_idx on t1
(cost=0.29..29.25 rows=969 width=4)
(actual time=0.031..0.031 rows=1 loops=1)
Index Cond: (a > 99000)
Heap Fetches: 0
-> Index Only Scan using t2_a_idx on t2
(cost=0.43..2014.87 rows=96596 width=4)
(actual time=0.005..0.052 rows=100 loops=1)
Index Cond: (a > 99000)
Heap Fetches: 0
Planning Time: 0.222 ms
Execution Time: 0.414 ms
(11 rows)
Well, that's quite a difference! From 9000ms to 1ms, pretty good.
What is happening in the first plan is that the merge join needs t2 sorted by t2.a, and the index-only scan looks like a great way to do that, as it has low startup cost (because LIMIT likes that). But this completely misses that (t1.a > 99000) implies (t2.a > 99000) through the equality in the join condition. So we start scanning t2_a_idx, only to throw the first 99% of tuples away.
In the original report this is particularly egregious, because the index
only scan looks like this:
-> Index Only Scan using data_class_pkey on data_class ta
(cost=0.57..4935483.78 rows=216964862 width=8)
(actual time=0.018..35022.908 rows=151321889 loops=1)
Heap Fetches: 151321889
So yeah, 151M heap fetches, that's bound to be expensive :-/
Adding the condition on t2.a allows just skipping the first chunk of the
index, eliminating the expensive part.
Of course, this breaks the estimates in the faster query, because we now apply the condition twice - once for the index scan, once as the join clause. So instead of ~100k rows the join is estimated as ~1000 rows.
I'm also not claiming this is 100% worth it - queries with a suitable combination of clauses (conditions on the join keys) seem rather uncommon. But it seems like an interesting example, because it may be seen either as a missed execution optimization (failing to skip the initial chunk of rows), or as a costing issue due to not accounting for having to process the rows (which would likely result in picking a different plan).
regards
[1] /messages/by-id/CA+1Wm9U_sP9237f7OH7O=-UTab71DWOO4Qc-vnC78DfsJQBCwQ@mail.gmail.com
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi,
On Sat, Feb 5, 2022 at 9:32 PM Tomas Vondra <tomas.vondra@enterprisedb.com>
wrote:
I'm also not claiming this is 100% worth it - queries with a suitable
combination of clauses (conditions on the join keys) seems rather
uncommon.
Thanks for showing interest in this. I want to add some other use cases which seem not so uncommon: a) when the join key is on a foreign table, pushing a qual down to the foreign side would be pretty good for reducing the data transferred over the network; b) when people join many partitioned tables on the partition key but want to query more than one partition (which means the qual on the partition key is not a simple "partitionKey = Const"), we have to rely on run-time partition pruning and lose the chance for initial pruning. There is a big performance difference there as well.
I guess some of the people who think we may need this feature (including me, of course) are not very clear on what would be bad about adding it. I summarized the earlier discussion and hacked up a solution at [1]; the current state looks reasonable to me, but I'm not sure whether I missed any point.
Of course, this breaks the estimates in the faster query, because we now
apply the condition twice - once for the index scan, one as the join
clause. So instead of ~100k rows the join is estimated as ~1000 rows.
I think my patch has addressed this. Here is the example:

postgres=# set geqo to off;  -- disable this feature: note the estimation error.
                             -- (using the geqo GUC in the patch is just for easy testing)
SET
postgres=# explain analyze SELECT t1.a, t2.a FROM t1 JOIN t2 USING (a) WHERE (t1.a > 99000) and t2.a > 99000;
                                                             QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=0.73..2408.37 rows=990 width=8) (actual time=0.032..21.350 rows=99900 loops=1)
   Merge Cond: (t1.a = t2.a)
   ->  Index Only Scan using t1_a_idx on t1  (cost=0.29..29.64 rows=991 width=4) (actual time=0.014..0.121 rows=1000 loops=1)
         Index Cond: (a > 99000)
         Heap Fetches: 0
   ->  Index Only Scan using t2_a_idx on t2  (cost=0.43..2113.20 rows=101301 width=4) (actual time=0.013..9.854 rows=99900 loops=1)
         Index Cond: (a > 99000)
         Heap Fetches: 0
 Planning Time: 0.282 ms
 Execution Time: 24.823 ms
(10 rows)
postgres=# set geqo to on;  -- enable this feature and let the planner derive
                            -- the qual by itself: the estimate is good.
SET
postgres=# explain analyze SELECT t1.a, t2.a FROM t1 JOIN t2 USING (a) WHERE (t1.a > 99000);
                                                             QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=0.73..2408.37 rows=97680 width=8) (actual time=0.031..21.296 rows=99900 loops=1)
   Merge Cond: (t1.a = t2.a)
   ->  Index Only Scan using t1_a_idx on t1  (cost=0.29..29.64 rows=991 width=4) (actual time=0.014..0.116 rows=1000 loops=1)
         Index Cond: (a > 99000)
         Heap Fetches: 0
   ->  Index Only Scan using t2_a_idx on t2  (cost=0.43..2113.20 rows=101301 width=4) (actual time=0.012..9.751 rows=99900 loops=1)
         Index Cond: (a > 99000)
         Heap Fetches: 0
 Planning Time: 0.269 ms
 Execution Time: 24.749 ms
(10 rows)
So I think understanding what would be bad about having this feature is the key point for discussion now.
[1] /messages/by-id/CAKU4AWpo9z0hMHDWUKuce4Z-NpcybV0J2UVu5+DVwyP-CrHCQg@mail.gmail.com
--
Best Regards
Andy Fan
Of course, this breaks the estimates in the faster query, because we now apply the condition twice - once for the index scan, once as the join clause. So instead of ~100k rows the join is estimated as ~1000 rows.
I think my patch has addressed this. Here is the example: ...
So I think understanding what would be bad about having this feature is the key point for discussion now.
[1] /messages/by-id/CAKU4AWpo9z0hMHDWUKuce4Z-NpcybV0J2UVu5+DVwyP-CrHCQg@mail.gmail.com
I forgot to upload these patches; uploading them now.
--
Best Regards
Andy Fan
Attachments:
v2-0001-Introudce-ec_filters-in-EquivalenceClass-struct-t.patch
v2-0002-After-distributing-the-new-derived-RestrictInfo-i.patch
v2-0003-Introduce-RelOptInfo.filtered_rows.patch
v2-0004-Remove-duplicated-qual-executing-for-executor.patch
v2-0005-Support-ScalarArrayOpExpr-and-perudoconstant-on-e.patch
v2-0006-Added-the-testcase-for-this-feature-and-fix-the-p.patch
So I think understanding what would be bad about having this feature is the key point for discussion now.
I re-read the 2015 discussion [1]; the topics below bear on that question. Here is a summary, for easier discussion.
====
From the planner aspect:
While I've only read your description of the patch not the patch itself,
the search methodology you propose seems pretty brute-force and
unlikely to solve that issue. It's particularly important to avoid
O(N^2)
behaviors when there are N expressions ...
The patch has 3 steps in general: 1) gather the filter_qual_list during deconstruct_jointree; only unmergeable quals are gathered here. 2) After root->eq_classes is built, scan each of the above quals to see whether there is a matching EC; if yes, add the qual to that EC. There are some fast paths here, like ec->relids and em->em_relids. 3) Compose the quals in ec_filter with the members in ec_members, then distribute the results to the relations. This step takes the most cycles of this feature, and it is the most important part as well.
Fortunately, thousands of partitions of a table would not make it worse, since partitions are not generated at that stage. So I'd expect the number of ECs, and of EMs in an EC, to be pretty small in common cases.
time would be spent on searches for matching subexpressions whether
or not anything was learned (and often nothing would be learned).
This is about cases like "SELECT * FROM t1, t2 WHERE t1.a = t2.a and t1.b > 3". In such a case we still need to go through steps 1 & 2; none of the fast paths help and the equal() calls are unavoidable. However, step 3 can be skipped.
If we want to improve this, could we maintain an attr_eq_indexes in RelOptInfo which indicates whether a given attribute appears in any EC member?
=====
From the executor aspect:
The reason why the answer isn't automatically "all of them"
is because, first of all, it's possible that enforcing the condition
at a particular table costs more to filter out the rows that we save
in execution time at higher levels of the plan tree. For example,
consider A JOIN B ON A.X = B.X WHERE A.X > 1000000. It might be that
the range of A.X is [0,1000001] but the range of B.X is
[1000000,2000000]; so enforcing the inequality against A is very
selective but enforcing it against B filters out basically nothing.
I think we can classify this as: we push down and execute a qual that takes lots of cycles but doesn't filter many rows.
A first cut might be to enforce the inequality against the relation where it's believed to be most selective, ... equivalence-class column mentioned in the inequality ... provided that the selectivity is thought to be above some threshold ... but I'm not sure this is very principled,
I can only say +1 after some deep thought.
Furthermore, there are some cases involving parameterized paths where
enforcing the inequality multiple times is definitely bad: for
example, if we've got a nested loop where the outer side is a seq scan
that enforces the condition and the inner side is an index probe, it
is just a waste to retest it on the inner side. We already know that
the outer row passes the inequality, so the inner row will necessarily
pass also. This doesn't apply to merge or hash joins, and it also
doesn't apply to all nested loops: scans that aren't parameterized by the equivalence-class column can still benefit from separate enforcement of the inequality.
I guess that could be fixed by somehow marking these pushed quals as optional and having parameterised scans ignore optional quals.
This has been done by commit 4.
Now, all that having been said, I think this is a pretty important
optimization. Lots of people have asked for it, and I think it would
be worth expending some brainpower to try to figure out a way to be
smarter than we are now, which is, in a nutshell, as dumb as possible.
+1. Last year I asked a customer to add the derivable quals manually, for 10+ tables per query, and it produced great results.
Does anyone still have interest in this? Is a better solution really possible, or is the current method too bad to rescue?
--
Best Regards
Andy Fan
So I think understanding what would be bad about having this feature is the key point for discussion now.
While I've only read your description of the patch not the patch itself,
This comment applies to me also.
Is the join selectivity properly calculated in all cases, e.g. in the n:m join case in particular, or in general when you’re not joining to a unique key? (this would be the usual situation here, since it adds a range qual to a join qual)
Furthermore, there are some cases involving parameterized paths where
enforcing the inequality multiple times is definitely bad
* This has been done by commit 4.
What remaining cases are there where the qual is evaluated redundantly?
* Does anyone still have interest in this? Is a better solution really possible, or is the current method too bad to rescue?
As you’ve shown, this can potentially be very important, though I don’t think you’ll often see equijoins with an additional range restriction on the join keys. When it happens, though, it could be especially important for joins to partitioned tables with many remote fdw partitions when the join can’t be pushed down to the remote server.
On Tue, Feb 1, 2022 at 10:08 AM Andy Fan <zhihui.fan1213@gmail.com> wrote:
To address the row estimation issue, the most straightforward fix is to ignore the derived clauses when figuring out RelOptInfo->rows for a base relation. To mark which clauses this patch derived, I added a new field "EquivalenceClass *derived" in RestrictInfo, and an included_derived option in clauselist_selectivity_ext; the set_xxx_rel_size functions pass included_derived=false, and the same strategy is used in get_parameterized_baserel_size; in all other cases included_derived=true is used. This is finished in commit 2 (commit 1 is David's patch, I just rebased it).
That doesn't sound correct to me.
Suppose that we have A.x = B.x and also A.x < 42. We can choose to
enforce A.x < 42 or we can choose to enforce B.x < 42 or we can do
both. In general, any of those could be right: if either one of those
two is highly selective while the other is not very selective at all,
it's going to be fastest to enforce only the more selective qual. But
if both are selective then it may be best to enforce both, so let's
suppose we do that. If we don't adopt the proposal above and just do
nothing, then our row count estimates for both A and B will include
the effect of checking x < 42, and so they will be correct, but the
row count estimate for join(A, B) will include the effect of checking
x < 42 twice, and so it will be too low, which can mess up the plan at
higher levels.
But discounting the effect of B.x < 42 when estimating the size of B
is also incorrect. Now, the row count estimate for join(A, B) will
include the effect of x < 42 only once, which is good. However, the
row count estimate for B will be too high, because it will not include
the effect of B.x < 42. And that means that the cost estimate for
join(A, B) will be wrong. It will be too high, because it's going to
think that it has more rows coming from the B side of the join than
what is actually the case. And that can also mess up the plan at
higher levels.
I think we could get around this problem by having multiple
RelOptInfos (or something similar that is lighter-weight) for each
relation. Today, we'd create a RelOptInfo for A, one for B, and one
for join(A, B), and the paths for the join are created by joining a
path for A to a path for B. Now imagine that we have instead 5
RelOptInfos, for {A}, {A|x<42}, {B}, {B|x<42}, and join(A, B). The
legal paths for that last one can be created by joining {A} to
{B|x<42} or {A|x<42} to {B} or {A|x<42} to {B|x<42}. Each of those 5
RelOptInfos can have its own cardinality estimate, and it seems pretty
straightforward to see how to get both the scan cardinality and the
join cardinality correct. Now I think this is decidedly non-trivial to
implement, and I also hear the voice of Tom Lane saying that's going
to be expensive in both time and memory, and he's not wrong.
On the other hand, I completely agree with David's comments on the
other thread to the effect that holding our breath is not getting us
anywhere. People don't keep asking for this feature because it's a
stupid thing that nobody really wants, and when Tom alleges that it
will rarely pay off, I think he's pretty far off the mark. The only
time we need to consider doing any extra work is when we have
something like the example discussed here, namely A.x = B.x and A.x <
42. If there is a variable that is part of an equivalence class and
also is used in a scan qual, what are the chances that the implied
inequality is useful? There's no way to estimate that mathematically -
it's all about what you think human beings are typically going to do -
but I'd say it's probably better than 50%. I know that when I was
regularly doing application programming on top of PostgreSQL I was
VERY aware of this limitation of the optimizer and habitually thought
about which table to write the inequality against. That kept me out of
trouble most of the time, but it sure seems like we're punting the
optimizer's job to the end user.
And even then, I still sometimes couldn't stay out of trouble, because
sometimes I knew that the implied inequality really ought to be
enforced against both sides of the join to get a decent plan. In that
case, the only way to get the optimizer to do what I wanted was to
duplicate the qual. But that runs headlong into the exact problem that
we're talking about here: now the join selectivity is going to be
messed up, and then some other part of the plan would get messed up. I
still remember the frustration associated with that scenario more than
10 years later. You can't even fix it by uglifying your query with a
planner hint, because we don't support those either. Which brings me
to another point: it's incoherent to simultaneously argue that we
shouldn't have planner hints but rather focus on improving the
planner, and at the same time refuse to improve the planner because it
would make planning too expensive. I actually think we should do both,
because I neither believe that it's impossible to fix this particular
problem nor that it is possible to create a planner so good that it
always makes the right decisions without any explicit input from a
human being. But the only way you can think this problem is unfixable
and at the same time think we don't need hints is if you think this
problem is fake.
It's not.
--
Robert Haas
EDB: http://www.enterprisedb.com
On 2/17/22 21:15, Robert Haas wrote:
On Tue, Feb 1, 2022 at 10:08 AM Andy Fan <zhihui.fan1213@gmail.com> wrote:
To address the row estimation issue, the most straightforward fix is to ignore the derived clauses when figuring out RelOptInfo->rows for a base relation. To mark which clauses this patch derived, I added a new field "EquivalenceClass *derived" in RestrictInfo, and an included_derived option in clauselist_selectivity_ext; the set_xxx_rel_size functions pass included_derived=false, and the same strategy is used in get_parameterized_baserel_size; in all other cases included_derived=true is used. This is finished in commit 2 (commit 1 is David's patch, I just rebased it).
That doesn't sound correct to me.
Suppose that we have A.x = B.x and also A.x < 42. We can choose to
enforce A.x < 42 or we can choose to enforce B.x < 42 or we can do
both. In general, any of those could be right: if either one of those
two is highly selective while the other is not very selective at all,
it's going to be fastest to enforce only the more selective qual. But
if both are selective then it may be best to enforce both, so let's
suppose we do that. If we don't adopt the proposal above and just do
nothing, then our row count estimates for both A and B will include
the effect of checking x < 42, and so they will be correct, but the
row count estimate for join(A, B) will include the effect of checking
x < 42 twice, and so it will be too low, which can mess up the plan at
higher levels.
But discounting the effect of B.x < 42 when estimating the size of B
is also incorrect. Now, the row count estimate for join(A, B) will
include the effect of x < 42 only once, which is good. However, the
row count estimate for B will be too high, because it will not include
the effect of B.x < 42. And that means that the cost estimate for
join(A, B) will be wrong. It will be too high, because it's going to
think that it has more rows coming from the B side of the join than
what is actually the case. And that can also mess up the plan at
higher levels.
I think we could get around this problem by having multiple
RelOptInfos (or something similar that is lighter-weight) for each
relation. Today, we'd create a RelOptInfo for A, one for B, and one
for join(A, B), and the paths for the join are created by joining a
path for A to a path for B. Now imagine that we have instead 5
RelOptInfos, for {A}, {A|x<42}, {B}, {B|x<42}, and join(A, B). The
legal paths for that last one can be created by joining {A} to
{B|x<42} or {A|x<42} to {B} or {A|x<42} to {B|x<42}. Each of those 5
RelOptInfos can have its own cardinality estimate, and it seems pretty
straightforward to see how to get both the scan cardinality and the
join cardinality correct. Now I think this is decidedly non-trivial to
implement, and I also hear the voice of Tom Lane saying that's going
to be expensive in both time and memory, and he's not wrong.
IMHO the whole problem is we're unable to estimate the join clause as a
conditional probability, i.e.
P(A.x = B.x | (A.x < 42) & (B.x < 42))
so maybe instead of trying to generate additional RelOptInfo items we
should think about improving that. The extra RelOptInfos don't really
solve this, because even if you decide to join A|x<42 to B|x<42 it does
nothing to improve the join clause estimate.
With equality clauses we don't have this issue, because if you derive clauses at the baserel level, the join clause becomes a no-op with selectivity 1.0. But for inequalities that does not work ...
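A toy example of the difference (numbers purely illustrative): suppose x is uniform over 1..100 on both sides. Then P(A.x = B.x) = 1/100 and P(x < 42) = 0.41 on either side. Multiplying the three selectivities as if independent gives (1/100) * 0.41 * 0.41, but once A.x = B.x holds the two range clauses describe the same event, so the correct joint selectivity is (1/100) * 0.41. Equivalently, P(A.x = B.x | (A.x < 42) & (B.x < 42)) = 1/41, not 1/100.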
Interestingly enough, the patch [1] tries to do something like this by applying extended statistics to joins, and using baserestrictinfos as "conditions" for statistics on both sides.
It actually deals with a more general form of this case, because the
clauses don't need to reference the same attribute - so for example this
would work too, assuming there is extended stats object on the columns
on each side:
P(A.c = B.d | (A.e < 42) & (B.f < 42))
[1]: https://commitfest.postgresql.org/36/3055/
On the other hand, I completely agree with David's comments on the
other thread to the effect that holding our breath is not getting us
anywhere. People don't keep asking for this feature because it's a
stupid thing that nobody really wants, and when Tom alleges that it
will rarely pay off, I think he's pretty far off the mark. The only
time we need to consider doing any extra work is when we have
something like the example discussed here, namely A.x = B.x and A.x <
42. If there is a variable that is part of an equivalence class and
also is used in a scan qual, what are the chances that the implied
inequality is useful? There's no way to estimate that mathematically -
it's all about what you think human beings are typically going to do -
but I'd say it's probably better than 50%. I know that when I was
regularly doing application programming on top of PostgreSQL I was
VERY aware of this limitation of the optimizer and habitually thought
about which table to write the inequality against. That kept me out of
trouble most of the time, but it sure seems like we're punting the
optimizer's job to the end user.
Not sure. In my experience queries with both a join clause and other
clauses referencing the same attribute are pretty rare. But I agree if
we can do the expensive stuff only when actually needed, with no cost in
the 99.999% other cases, I don't see why not. Of course, code complexity
is a cost too.
And even then, I still sometimes couldn't stay out of trouble, because
sometimes I knew that the implied inequality really ought to be
enforced against both sides of the join to get a decent plan. In that
case, the only way to get the optimizer to do what I wanted was to
duplicate the qual. But that runs headlong into the exact problem that
we're talking about here: now the join selectivity is going to be
messed up, and then some other part of the plan would get messed up. I
still remember the frustration associated with that scenario more than
10 years later. You can't even fix it by uglifying your query with a
planner hint, because we don't support those either. Which brings me
to another point: it's incoherent to simultaneously argue that we
shouldn't have planner hints but rather focus on improving the
planner, and at the same time refuse to improve the planner because it
would make planning too expensive. I actually think we should do both,
because I neither believe that it's impossible to fix this particular
problem nor that it is possible to create a planner so good that it
always makes the right decisions without any explicit input from a
human being. But the only way you can think this problem is unfixable
and at the same time think we don't need hints is if you think this
problem is fake.
IMHO to deal with the estimates it'd be enough to allow calculating
conditional probabilities.
No comment regarding hints ...
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, Feb 17, 2022 at 4:17 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
IMHO the whole problem is we're unable to estimate the join clause as a
conditional probability, i.e. P(A.x = B.x | (A.x < 42) & (B.x < 42))
so maybe instead of trying to generate additional RelOptInfo items we
should think about improving that. The extra RelOptInfos don't really
solve this, because even if you decide to join A|x<42 to B|x<42 it does
nothing to improve the join clause estimate.
I guess I hadn't considered that angle. I think the extra RelOptInfos
(or whatever) actually do solve a problem, because enforcing a
high-selectivity join qual against both sides is potentially quite
wasteful, and you need some way to decide whether to do it on one
side, the other, or both. But it's also true that I was wrong to
assume independence ... and if we could avoid assuming that, then the
join selectivity would work itself out without any of the machinery
that I just proposed.
It actually deals with a more general form of this case, because the
clauses don't need to reference the same attribute - so for example this
would work too, assuming there is extended stats object on the columns
on each side: P(A.c = B.d | (A.e < 42) & (B.f < 42))
That'd be cool.
Not sure. In my experience queries with both a join clause and other
clauses referencing the same attribute are pretty rare. But I agree if
we can do the expensive stuff only when actually needed, with no cost in
the 99.999% other cases, I don't see why not. Of course, code complexity
is a cost too.
Right. I mean, we could have a planner GUC to control whether the
optimization is used even in cases where we see that it's possible.
But Tom keeps arguing that it is possible in many queries but would
benefit few, and I'm not seeing why that should be so. I think
it's likely to benefit many of the queries to which it applies.
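(Something like the following, where the GUC name is purely an
invention of mine to make the idea concrete:)

  set enable_derived_clauses = on;  -- hypothetical switch, does not exist today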
--
Robert Haas
EDB: http://www.enterprisedb.com
On Fri, Feb 18, 2022 at 4:15 AM Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Feb 1, 2022 at 10:08 AM Andy Fan <zhihui.fan1213@gmail.com> wrote:
To address the row estimation issue, the most straightforward fix is to
ignore the derived clauses when figuring out RelOptInfo->rows on the
base relation.
To track which clauses are derived, this patch adds a new field
"EquivalenceClass *derived" to RestrictInfo, and an included_derived
option to clauselist_selectivity_ext. In the set_xxx_rel_size functions
we can pass included_derived=false; the same strategy should be used in
get_parameterized_baserel_size. In all other cases included_derived=true
is used. This is done in commit 2. (Commit 1 is David's patch, I just
rebased it.)
That doesn't sound correct to me.
Suppose that we have A.x = B.x and also A.x < 42. We can choose to
enforce A.x < 42 or we can choose to enforce B.x < 42 or we can do
both. In general, any of those could be right:
This is impressive. To achieve it, we have to treat a.x < 42 and
b.x < 42 equally, rather than treating b.x < 42 as derived from
a.x < 42, and require the final plan to execute at least one qual from
such a group. This introduces some more complexity at first glance, but
I think it is a great angle to think about.
..., which is good. However, the
row count estimate for B will be too high, because it will not include
the effect of B.x < 42. And that means that the cost estimate for
join(A, B) will be wrong. It will be too high, because it's going to
think that it has more rows coming from the B side of the join than
what is actually the case. And that can also mess up the plan at
higher levels.
IIUC, this would not happen if we apply commit 3. In commit 3, the rows
for the scan path are adjusted via RelOptInfo->filtered_rows, which
takes the effect of B.x < 42 into account; it is later used in the
cost_xxx functions.
Here is an example:
regression=# explain analyze select * from tenk1 a, tenk1 b where
a.thousand = b.thousand and a.thousand < 100;
                                                                   QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=24.90..459.16 rows=10740 width=488) (actual time=0.416..17.459 rows=10000 loops=1)
   ->  Bitmap Heap Scan on tenk1 b  (cost=24.61..383.03 rows=1074 width=244) (actual time=0.369..1.801 rows=1000 loops=1)
         Recheck Cond: (thousand < 100)
         Heap Blocks: exact=324
         ->  Bitmap Index Scan on tenk1_thous_tenthous  (cost=0.00..24.34 rows=1074 width=0) (actual time=0.251..0.251 rows=1000 loops=1)
               Index Cond: (thousand < 100)
   ->  Memoize  (cost=0.30..0.47 rows=1 width=244) (actual time=0.002..0.006 rows=10 loops=1000)
         Cache Key: b.thousand
         Cache Mode: logical
         Hits: 900  Misses: 100  Evictions: 0  Overflows: 0  Memory Usage: 277kB
         ->  Index Scan using tenk1_thous_tenthous on tenk1 a  (cost=0.29..0.46 rows=1 width=244) (actual time=0.012..0.033 rows=10 loops=100)
               Index Cond: ((thousand = b.thousand) AND (thousand < 100))
 Planning Time: 0.934 ms
 Execution Time: 18.496 ms
(14 rows)
b.thousand < 100 is derived from a.thousand < 100, and the final path
cost is:
   ->  Bitmap Heap Scan on tenk1 b  (cost=24.61..383.03 rows=1074 width=244) (actual time=0.369..1.801 rows=1000 loops=1)
         Recheck Cond: (thousand < 100)
         Heap Blocks: exact=324
         ->  Bitmap Index Scan on tenk1_thous_tenthous  (cost=0.00..24.34 rows=1074 width=0) (actual time=0.251..0.251 rows=1000 loops=1)
               Index Cond: (thousand < 100)
This is exactly the same cost as for "select * from tenk1 where
thousand < 100".
== Commit 3 commit message, with some modifications [1] ==
Introduce RelOptInfo.filtered_rows.
Previously, Path.rows (shown in the explain output) and RelOptInfo.rows
(which is used to calculate the joinrel's estimated rows) are the same
for many scan paths, like SeqScan, IndexScan, BitmapHeapScan and so on.
But they can differ after distributing a new RestrictInfo from
ec_filter. So I introduced RelOptInfo.filtered_rows to take some duty
away from RelOptInfo.rows: filtered_rows counts the effect of the
derived quals and is used by the cost_xxx functions.
On the other hand, I completely agree with David's comments on the
other thread to the effect that holding our breath is not getting us
anywhere.
+1 to this as well. The PostgreSQL community has a great reputation in
the world, mostly because the authority figures in this community have
set a good example for the people who follow, and that is not easy. But
if the authority figures are too strict about code quality, that is not
good for the community either; we should, to some extent, encourage
more people to have a try.
Taking the current feature as an example: the estimation issue
absolutely needs a fix. While I do listen and think carefully about how
to reduce the extra planner cost, and about whether I have missed any
important items, I also think David's method is not unacceptable at all.
What do you think about moving this feature forward? The items known to
me are: a) make sure the estimation error can be fixed, or discuss
whether my current solution is workable; b) just distribute some
selectivity restrictinfos to the RelOptInfo; c) see how hard it is to
treat the original and derived quals equally; d) reduce the extra
planner cost as much as possible. Any other important items I missed?
[1]: /messages/by-id/CAKU4AWrdeQZ8xvf=DVhndUs=RGn8oVoSJvYK3Yj7uWq2=dt=Mg@mail.gmail.com
--
Best Regards
Andy Fan