Discussion on missing optimizations
Hopefully it's alright for me to post this here, please let me know if not.
I ran across an article on blog.jooq.org comparing the major RDBMSes
on their ability to optimize away unnecessary work in queries which are
less than optimal, and saw some discussion on Hacker News and Reddit, but I
hadn't seen any discussion here.
The article in question is here:
https://blog.jooq.org/2017/09/28/10-cool-sql-optimisations-that-do-not-depend-on-the-cost-model/
Commercial databases seem to have a serious leg up in this area, and there
are even some that MySQL has that Postgres doesn't.
I was wondering which of these are planned, which have had discussion
before and decided not to support, and which just haven't been thought of?
I thought I'd bring it up and hopefully others who are more knowledgeable
can chime in.
Hi,
On 2017-10-06 21:33:16 -0400, Adam Brusselback wrote:
Hopefully it's alright for me to post this here, please let me know if not.
I ran across an article on blog.jooq.org comparing the major RDBMSes
on their ability to optimize away unnecessary work in queries which are
less than optimal, and saw some discussion on Hacker News and Reddit, but I
hadn't seen any discussion here.
The article in question is here:
https://blog.jooq.org/2017/09/28/10-cool-sql-optimisations-that-do-not-depend-on-the-cost-model/
That's interesting.
Commercial databases seem to have a serious leg up in this area, and there
are even some that MySQL has that Postgres doesn't.
I was wondering which of these are planned, which have had discussion
before and decided not to support, and which just haven't been thought of?
Hm, going through the ones we don't [fully] support:
3. JOIN Elimination
There's been a lot of discussion and several patches. There are a bunch of
problems here, one being that there are cases (during trigger firing,
before the constraint checks) where foreign keys don't hold true, so we
can't quite generally make these optimizations. Another concern is
whether the added plan time is actually worthwhile.
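To make the tradeoff concrete, here is a toy Python sketch (not planner code; the tables and data are made up) of why a validated foreign key makes such an inner join redundant:

```python
# Hypothetical illustration of why an inner join can be eliminated when a
# validated foreign key guarantees every child row has exactly one parent.
customers = {1: "acme", 2: "globex"}      # parent: primary key -> name
orders = [(10, 1), (11, 1), (12, 2)]      # child: (order_id, customer_id)

# The FK guarantees each customer_id matches exactly one customer row, so
# the inner join emits exactly one output row per order ...
joined = [(oid, cid) for (oid, cid) in orders if cid in customers]

# ... and a plan that skips the join entirely produces the same result
# whenever no customer columns are referenced in the query.
assert joined == orders
```

During trigger firing, before the constraint checks run, the FK premise above can be temporarily false, which is exactly the hole mentioned.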
4. Removing “Silly” Predicates
(i.e. stuff like column = column)
This has been deemed not to be a valuable use of plan time. I'm doubtful
about that argument, but IIRC that was the consensus the last time this
came up.
6. Predicate Merging
(i.e. combining a IN (a,b,c) and a IN (c,d,f) into a IN (c), similar
with OR)
I can't remember proposals about adding this, but it seems worthwhile to
consider. I think we should be able to check for this without a lot of
planner overhead.
7. Provably Empty Sets
8. CHECK Constraints
I think some of this should actually work with constraint exclusion
turned on.
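The "provably empty" reasoning is cheap to model; here's a toy Python sketch (the helper is hypothetical, not the constraint-exclusion code) of detecting a contradictory pair of range predicates on one column:

```python
# Sketch of the "provably empty set" idea: a conjunction of range predicates
# on one column is unsatisfiable when the implied interval is empty.
def satisfiable(lower, upper):
    """Bounds are (value, inclusive) tuples, or None if unbounded."""
    if lower is None or upper is None:
        return True
    (lo, lo_inc), (hi, hi_inc) = lower, upper
    if lo < hi:
        return True
    # equal endpoints only satisfiable if both sides are inclusive
    return lo == hi and lo_inc and hi_inc

# WHERE x > 5 AND x < 3 is provably empty; no scan is needed at all.
assert satisfiable((5, False), (3, False)) is False
# WHERE x >= 3 AND x <= 3 is satisfiable (x = 3).
assert satisfiable((3, True), (3, True)) is True
```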
9. Unneeded Self JOIN
Can't remember discussions of this.
Greetings,
Andres Freund
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Andres Freund <andres@anarazel.de> writes:
On 2017-10-06 21:33:16 -0400, Adam Brusselback wrote:
The article in question is here:
https://blog.jooq.org/2017/09/28/10-cool-sql-optimisations-that-do-not-depend-on-the-cost-model/
That's interesting.
The impression I have in a quick scan is that probably hardly any of these
are cases that any of the DB designers think are important in themselves.
Rather, they fall out of more general optimization attempts, or not,
depending on the optimization mechanisms in use in a particular DB.
For example, reducing "WHERE 1=1" to "WHERE TRUE" and then to nothing
comes out of a constant-subexpression-precalculation mechanism for us,
whereas "WHERE column=column" doesn't fall to that approach. ISTM it
would be really dumb to expend planner cycles looking specifically for
that case, so I guess that DB2 et al are finding it as a side-effect of
some more general optimization ... I wonder what that is?
(edit: a few minutes later, I seem to remember that equivclass.c has
to do something special with the X=X case, so maybe it could do
something else special instead, with little new overhead.)
(i.e. combining a IN (a,b,c) and a IN (c,d,f) into a IN (c), similar
with OR)
I can't remember proposals about adding this, but it seems worthwhile to
consider. I think we should be able to check for this without a lot of
planner overhead.
It would be enormously expensive to check that in the general case with
a bunch of entries in each IN list. Perhaps it would be OK to add on
the presumption that few queries would contain multiple IN's on the same
column in the first place, though. Not sure.
9. Unneeded Self JOIN
Can't remember discussions of this.
I can't get very excited about that one either.
In the end, what the article fails to consider is that all of these are
tradeoffs, not unalloyed goods. If you spend planner cycles on every
query to look for cases that only the most unabashedly brain-dead ORMs
ever generate, you're not really doing your users a favor on balance.
regards, tom lane
On 2017-10-06 22:19:54 -0400, Tom Lane wrote:
The impression I have in a quick scan is that probably hardly any of these
are cases that any of the DB designers think are important in
themselves.
In the end, what the article fails to consider is that all of these are
tradeoffs, not unalloyed goods. If you spend planner cycles on every
query to look for cases that only the most unabashedly brain-dead ORMs
ever generate, you're not really doing your users a favor on balance.
Partially agreed. A comment to the article also mentions that some other
database performs more optimizations depending on the cost of the
plan. That's not easy to do in our current plan structure, but I think
it's quite a worthwhile concept.
Rather, they fall out of more general optimization attempts, or not,
depending on the optimization mechanisms in use in a particular DB.
For example, reducing "WHERE 1=1" to "WHERE TRUE" and then to nothing
comes out of a constant-subexpression-precalculation mechanism for us,
whereas "WHERE column=column" doesn't fall to that approach. ISTM it
would be really dumb to expend planner cycles looking specifically for
that case, so I guess that DB2 et al are finding it as a side-effect of
some more general optimization ... I wonder what that is?
(edit: a few minutes later, I seem to remember that equivclass.c has
to do something special with the X=X case, so maybe it could do
something else special instead, with little new overhead.)
Yea, I think this should be inferrable from information we essentially
already compute for equivclasses.
(i.e. combining a IN (a,b,c) and a IN (c,d,f) into a IN (c), similar
with OR)
I can't remember proposals about adding this, but it seems worthwhile to
consider. I think we should be able to check for this without a lot of
planner overhead.
It would be enormously expensive to check that in the general case with
a bunch of entries in each IN list. Perhaps it would be OK to add on
the presumption that few queries would contain multiple IN's on the same
column in the first place, though. Not sure.
Merging of ORs should be near free if you leave duplicates in there, and
the duplicates aren't a huge concern if your alternative is to have
them, but also have two clauses to evaluate.
I think the IN AND IN case should usually end up a clear win as
well. Both lists are going to be evaluated for each row anyway - you
don't need a lot of rows where clauses are evaluated to make it worth
the O(n*log(n)) of sorting the lists. Sorting them would be beneficial
for other reasons as well, e.g. it improves access patterns for SAO
index scans.
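A toy sketch of that sorted-merge idea in Python (illustration only, not proposed planner code): after both IN lists are sorted, a single linear pass finds the intersection, so the total cost is dominated by the O(n log n) sorts.

```python
# Merge "a IN (...) AND a IN (...)" into one IN list via sorted intersection.
def merge_in_lists(xs, ys):
    xs, ys = sorted(xs), sorted(ys)
    i = j = 0
    out = []
    while i < len(xs) and j < len(ys):
        if xs[i] == ys[j]:          # common element survives the merge
            out.append(xs[i])
            i += 1
            j += 1
        elif xs[i] < ys[j]:
            i += 1
        else:
            j += 1
    return out

# a IN (a, b, c) AND a IN (c, d, f)  simplifies to  a IN (c)
assert merge_in_lists(["a", "b", "c"], ["c", "d", "f"]) == ["c"]
```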
Greetings,
Andres Freund
On 07/10/17 04:19, Tom Lane wrote:
Andres Freund <andres@anarazel.de> writes:
On 2017-10-06 21:33:16 -0400, Adam Brusselback wrote:
The article in question is here:
https://blog.jooq.org/2017/09/28/10-cool-sql-optimisations-that-do-not-depend-on-the-cost-model/
That's interesting.
The impression I have in a quick scan is that probably hardly any of these
are cases that any of the DB designers think are important in themselves.
Rather, they fall out of more general optimization attempts, or not,
depending on the optimization mechanisms in use in a particular DB.
For example, reducing "WHERE 1=1" to "WHERE TRUE" and then to nothing
comes out of a constant-subexpression-precalculation mechanism for us,
whereas "WHERE column=column" doesn't fall to that approach. ISTM it
would be really dumb to expend planner cycles looking specifically for
that case, so I guess that DB2 et al are finding it as a side-effect of
some more general optimization ... I wonder what that is?
(edit: a few minutes later, I seem to remember that equivclass.c has
to do something special with the X=X case, so maybe it could do
something else special instead, with little new overhead.)
What it actually does is to specifically skip the processing for X=X
(the const expression will be simplified by
estimate_expression_value/eval_const_expressions separately). There is
a comment there that specifically states that it's not worth processing
this, as it's a rare clause which is equal to X IS NOT NULL.
I don't actually agree with the argument of the comment there, since in
practice, if the "silly" equality is not there, we'll just waste an
equal() call, and if it is there, the optimization seems worth it as it
will lead to orders of magnitude better estimation in many cases.
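The claimed equivalence is easy to see with SQL's three-valued logic; a toy Python model (None standing in for NULL, not real code from the patch):

```python
# With a strict equality operator, NULL = NULL yields NULL (unknown), not
# TRUE, so "x = x" passes exactly the rows where x is not null.
def strict_eq(x, y):
    if x is None or y is None:   # strict operator: NULL input -> NULL output
        return None              # None models SQL's "unknown"
    return x == y

# WHERE x = x keeps a row iff x IS NOT NULL, for every value of x.
for v in [0, 1, "a", None]:
    assert (strict_eq(v, v) is True) == (v is not None)
```

A non-strict operator could return TRUE for NULL inputs, which is why the transformation is only valid for strict operators.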
So I wrote a prototype achieving this optimization, and it seems to be
quite simple code-wise (see attached). I did only limited manual
testing of this, but I don't see any negative impact on planning time.
Thoughts?
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachment: 0001-Transform-X-X-expressions-into-X-IS-NOT-NULL.patch (text/x-patch, +11 -13)
On 7 October 2017 at 15:19, Tom Lane <tgl@sss.pgh.pa.us> wrote:
9. Unneeded Self JOIN
Can't remember discussions of this.
I can't get very excited about that one either.
In the end, what the article fails to consider is that all of these are
tradeoffs, not unalloyed goods. If you spend planner cycles on every
query to look for cases that only the most unabashedly brain-dead ORMs
ever generate, you're not really doing your users a favor on balance.
I think that final sentence lacks imagination.
I've seen plenty of views being called where some column is
unavailable, but the caller joins the very same table again on the
primary key to add the column. There was no brain-dead ORM involved,
just perhaps questionable design. This was very common in my last job,
where we had some rat's nest of views several layers deep, the core of
which often had some complex logic that nobody dared to try and
replicate.
It would be fairly cheap to check if any of the rtekind==RTE_RELATION
joinlist items have more than one RangeTblEntry with the same relid. The
joinlist is never that big anyway, and if it was the join search would
be slow. The more expensive part would be to build the join clauses,
check if the expressions on either side of all OpExpr matches and that
nothing else will cause a non-match, then perform the uniqueness check
on those OpExpr operands. We do have some infrastructure to do the
unique checks. Likely the slowdown in planning would be just for
cases with a legitimately useful self-join, I doubt checking for a
duplicate RangeTblEntry->relid would cause much of a concern.
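That cheap pre-check could look something like this toy Python sketch (the joinlist representation here is a hypothetical stand-in, not the real parse-tree structures):

```python
# Scan the joinlist for two base relations with the same relid. Only when a
# duplicate shows up would the planner pay for the expensive clause-matching
# and uniqueness proofs that follow.
def has_duplicate_relid(joinlist):
    seen = set()
    for kind, relid in joinlist:     # items modeled as (rtekind, relid)
        if kind == "RTE_RELATION":
            if relid in seen:
                return True          # candidate for self-join removal
            seen.add(relid)
    return False

# Same table referenced twice -> worth the deeper check.
assert has_duplicate_relid([("RTE_RELATION", 16384), ("RTE_RELATION", 16384)])
# Distinct tables -> bail out immediately, near-zero planning overhead.
assert not has_duplicate_relid([("RTE_RELATION", 16384), ("RTE_RELATION", 16385)])
```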
Anyway, this is giving me some feeling of déjà vu. Perhaps we need
some pg_stats view that shows us planning time and execution time so
that we can get a better idea on how much these things matter in the
average case. We tend to never fare so well in these sorts of
comparisons with commercial databases. It's not hard to imagine
someone with migration plans loading some rat's nest of views into
Postgres and taking it for a spin and finding performance is not quite
what they need. It's a bit sad that often the people with the loudest
voices are always so fast to stomp on the ideas for improvements. It
would be much nicer if you'd at least wait for benchmarks before
shooting.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Petr Jelinek <petr.jelinek@2ndquadrant.com> writes:
On 07/10/17 04:19, Tom Lane wrote:
(edit: a few minutes later, I seem to remember that equivclass.c has
to do something special with the X=X case, so maybe it could do
something else special instead, with little new overhead.)
So I wrote prototype of achieving this optimization and it seems to be
really quite simple code-wise (see attached). I did only a limited
manual testing of this but I don't see any negative impact on planning time.
No, I'm afraid you didn't read that comment closely enough. This will
flat out fail for cases like "select ... where x=x order by x", because
there will already be a single-element EC for x and so the clause will
just disappear entirely. If that doesn't happen, then instead you're
creating an EC with duplicate entries, which is an idea I seriously
dislike; the semantics of that don't seem clear at all to me.
What I was imagining was that having detected X=X, instead of "throwing
back" the clause as-is we could throw back an X IS NOT NULL clause,
along the lines of the attached.
This passes the smell test for me in the sense of not adding any
significant number of planner cycles except when the weird case occurs.
It leaves something on the table in that there are some situations
where X=X could be converted, but we won't do it because we don't see
the clause as being a potential EC (because it's not at top level),
as in the second new regression test case below. I think that's
probably all right; I don't see any way to be more thorough without
adding a lot of new cycles all the time, and I don't believe this is
worth that.
regards, tom lane
Attachment: convert-x-equals-x.patch (text/x-diff, charset=us-ascii, +100 -72)
David Rowley <david.rowley@2ndquadrant.com> writes:
It would be much nicer if you'd at least wait for benchmarks before
shooting.
Benchmarks of what? We'd have to expend quite a bit of effort just
to get to a place where we'd have something to benchmark. I do not
think it's unreasonable of me to express an opinion that that would
likely be wasted effort. If you disagree, and are willing to expend
such effort speculatively, I'm not stopping you.
regards, tom lane
On 07/10/17 18:15, Tom Lane wrote:
Petr Jelinek <petr.jelinek@2ndquadrant.com> writes:
On 07/10/17 04:19, Tom Lane wrote:
(edit: a few minutes later, I seem to remember that equivclass.c has
to do something special with the X=X case, so maybe it could do
something else special instead, with little new overhead.)
So I wrote a prototype achieving this optimization, and it seems to be
quite simple code-wise (see attached). I did only limited manual
testing of this, but I don't see any negative impact on planning time.
No, I'm afraid you didn't read that comment closely enough. This will
flat out fail for cases like "select ... where x=x order by x", because
there will already be a single-element EC for x and so the clause will
just disappear entirely. If that doesn't happen, then instead you're
creating an EC with duplicate entries, which is an idea I seriously
dislike; the semantics of that don't seem clear at all to me.
Hmm it did not disappear (and worked fine in SQL level tests). I don't
think I fully understand the "EC with duplicate entries" part and what's
the problem with it so I'll defer to your wisdom there.
What I was imagining was that having detected X=X, instead of "throwing
back" the clause as-is we could throw back an X IS NOT NULL clause,
along the lines of the attached.
Right, I wanted to avoid messing with the incoming result info, but if
we don't want to call the code below or write tons of code for this, I
guess it's the best we can do.
This passes the smell test for me in the sense of not adding any
significant number of planner cycles except when the weird case occurs.
It leaves something on the table in that there are some situations
where X=X could be converted, but we won't do it because we don't see
the clause as being a potential EC (because it's not at top level),
as in the second new regression test case below. I think that's
probably all right; I don't see any way to be more thorough without
adding a lot of new cycles all the time, and I don't believe this is
worth that.
My code had the same issue. I think going deeper would require quite a bit
of new code (and cycles) for something that's even less likely to happen
than simple X=X, while the current patch is quite a cheap win.
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Petr Jelinek <petr.jelinek@2ndquadrant.com> writes:
On 07/10/17 18:15, Tom Lane wrote:
No, I'm afraid you didn't read that comment closely enough. This will
flat out fail for cases like "select ... where x=x order by x", because
there will already be a single-element EC for x and so the clause will
just disappear entirely. If that doesn't happen, then instead you're
creating an EC with duplicate entries, which is an idea I seriously
dislike; the semantics of that don't seem clear at all to me.
Hmm it did not disappear (and worked fine in SQL level tests).
I may not be correctly remembering what the query would need to look
like for there to be single-element ECs in existence at this point; but
I surely would not like this code to assume that none will exist till
later.
I don't
think I fully understand the "EC with duplicate entries" part and what's
the problem with it so I'll defer to your wisdom there.
Well, as one example, assume that we use your patch and consider what
happens with
where x = y and x = x
vs
where x = x and x = y
In the first case we end up with an EC that's just {x,y}, because the
second process_equivalence() will find both sides in the same EC and
conclude that the second clause is redundant. (Which it is, if the
equality operators have the same notion of what to do with nulls.)
In the second case we end up with an EC containing {x,x,y}, which
at minimum will result in emitting redundant generated quals. I'm
not sure if it could have any worse consequences than that, but I'm
not sure it couldn't either. But this is bogus in any case, because
those WHERE clauses surely should produce identical results.
Looking further ahead, if ECs have to support being multisets rather
than pure sets, that would put a crimp in ever improving this logic to
use a smarter UNION-FIND algorithm. (I've not yet seen queries where the
number of EC members is large enough to make that a serious issue, but
I think it could happen, particularly with inheritance/partitioning.)
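For illustration, a toy union-find in Python (not the equivclass.c code) showing how merging by representative keeps each member unique, so that x = x is naturally a no-op rather than a source of duplicate entries:

```python
# Minimal union-find over equivalence-class members with path halving.
parent = {}

def find(v):
    parent.setdefault(v, v)
    while parent[v] != v:
        parent[v] = parent[parent[v]]   # path halving keeps trees shallow
        v = parent[v]
    return v

def union(a, b):
    ra, rb = find(a), find(b)
    if ra != rb:                        # already-equal members: do nothing
        parent[ra] = rb

union("x", "y")   # from the clause x = y
union("x", "x")   # from x = x: a no-op, no second copy of x appears
assert find("x") == find("y")
assert sorted(parent) == ["x", "y"]     # the EC is a pure set, not a multiset
```

An EC that allowed duplicate members would break the invariant this structure relies on, which is the crimp mentioned above.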
This passes the smell test for me in the sense of not adding any
significant number of planner cycles except when the weird case occurs.
It leaves something on the table in that there are some situations
where X=X could be converted, but we won't do it because we don't see
the clause as being a potential EC (because it's not at top level),
as in the second new regression test case below. I think that's
probably all right; I don't see any way to be more thorough without
adding a lot of new cycles all the time, and I don't believe this is
worth that.
My code had the same issue. I think going deeper would require quite a bit
of new code (and cycles) for something that's even less likely to happen
than simple X=X, while the current patch is quite a cheap win.
Yeah. I'm not really convinced it's a big win, but it's so cheap we
might as well do it. The only case where it would expend cycles and
fail to give an improvement is if we have X=X with a non-strict operator,
which I think is a case that never arises in practice at present,
because btree effectively assumes that all btree operators are strict.
(Someday we might want to relax that, though, which is why I don't
want to let the planner just assume it.)
regards, tom lane
On Fri, Oct 06, 2017 at 10:19:54PM -0400, Tom Lane wrote:
Andres Freund <andres@anarazel.de> writes:
On 2017-10-06 21:33:16 -0400, Adam Brusselback wrote:
The article in question is here:
https://blog.jooq.org/2017/09/28/10-cool-sql-optimisations-that-do-not-depend-on-the-cost-model/
That's interesting.
The impression I have in a quick scan is that probably hardly any of these
are cases that any of the DB designers think are important in themselves.
That's true for some of those. But some of them might become important
when you start pushing WHERE constraints from outside into inner table
sources and subqueries, as dumb-looking constraints can simply appear
from pushing non-dumb-looking constraints.
More than the above optimizations, these would make a big difference for me:
- turning subqueries into joins
- turning ORs into UNIONs
It is easy enough to work around the lack of this optimization in
many cases, but it does make queries more verbose.
- pushing WHERE constraints from outer queries into the table source
queries (_including_ VIEWs)
- determining that some table in a query that had WHERE constraints
pushed into it... now has a very well-filled out lookup key,
therefore it's the one that should be the table source to start
the plan with, i.e., that it should be first in the outermost loop
of a nested loop join
For me these two would be huge wins. I have to resort to
functions with roughly the same body as views just so that I can
have the optimizer pick the correct plan. This causes a lot of
code duplication in my schemas.
- pushing WHERE constraints from outer queries into HAVING thence WHERE
constraints on GROUP BY queries where the outer constraints are on
columns used to GROUP BY
I find myself making two versions of views that do aggregation: one
that does not, and one that does. This allows me to use the
non-aggregating view in contexts where I need this optimization, but
then I have to re-code the aggregation at that layer. Again, lots of
duplication.
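The equivalence behind that last pushdown, as a toy Python sketch (made-up data; the outer predicate mentions only a grouping column): filtering rows before aggregating gives the same result as filtering the aggregated output afterwards.

```python
from itertools import groupby

rows = [("a", 1), ("a", 2), ("b", 5), ("c", 7)]   # (group_key, value)

def sum_by_key(rows):
    # stand-in for SELECT key, sum(value) ... GROUP BY key
    rows = sorted(rows)                            # groupby needs sorted input
    return {k: sum(v for _, v in grp)
            for k, grp in groupby(rows, key=lambda r: r[0])}

keep = lambda k: k != "b"      # outer WHERE predicate on the grouping column

# Filter after aggregation (what the naive plan does) ...
filtered_late = {k: v for k, v in sum_by_key(rows).items() if keep(k)}
# ... vs. push the predicate below the GROUP BY (the desired optimization).
filtered_early = sum_by_key([r for r in rows if keep(r[0])])

assert filtered_late == filtered_early == {"a": 3, "c": 7}
```

The pushed-down form can skip aggregating the excluded groups entirely, which is where the win comes from.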
These sorts of optimizations are huge.
Rather, they fall out of more general optimization attempts, or not,
depending on the optimization mechanisms in use in a particular DB.
For example, reducing "WHERE 1=1" to "WHERE TRUE" and then to nothing
comes out of a constant-subexpression-precalculation mechanism for us,
whereas "WHERE column=column" doesn't fall to that approach. ISTM it
would be really dumb to expend planner cycles looking specifically for
that case, so I guess that DB2 et al are finding it as a side-effect of
some more general optimization ... I wonder what that is?
If you can reduce the number of compilations / optimization passes for
statements, then spending more time in the optimizer is not a big deal.
So, when invoked via PREPARE I would say spending more cycles looking
for this sort of thing is OK, but in many other cases it's not.
Also, sometimes these cases crop up due to pushing constraints into VIEWs
and sub-queries. In those cases, constant sub-expression
elimination can be a win.
(edit: a few minutes later, I seem to remember that equivclass.c has
to do something special with the X=X case, so maybe it could do
something else special instead, with little new overhead.)
I'd expect that column = column is not trivial to turn into TRUE, not
unless those columns are NOT NULLable.
9. Unneeded Self JOIN
Can't remember discussions of this.
I can't get very excited about that one either.
In the end, what the article fails to consider is that all of these are
tradeoffs, not unalloyed goods. If you spend planner cycles on every
query to look for cases that only the most unabashedly brain-dead ORMs
ever generate, you're not really doing your users a favor on balance.
I can't get very excited about this one either, though I do believe it
can arise as the author says, "when you build complex views and JOIN
them to each other". Maybe I'm not excited about it because I've not
needed it :)
Nico
On 07/10/17 19:59, Tom Lane wrote:
Petr Jelinek <petr.jelinek@2ndquadrant.com> writes:
On 07/10/17 18:15, Tom Lane wrote:
No, I'm afraid you didn't read that comment closely enough. This will
flat out fail for cases like "select ... where x=x order by x", because
there will already be a single-element EC for x and so the clause will
just disappear entirely. If that doesn't happen, then instead you're
creating an EC with duplicate entries, which is an idea I seriously
dislike; the semantics of that don't seem clear at all to me.
Hmm it did not disappear (and worked fine in SQL level tests).
I may not be correctly remembering what the query would need to look
like for there to be single-element ECs in existence at this point; but
I surely would not like this code to assume that none will exist till
later.
I don't
think I fully understand the "EC with duplicate entries" part and what's
the problem with it so I'll defer to your wisdom there.
Well, as one example, assume that we use your patch and consider what
happens with
where x = y and x = x
vs
where x = x and x = y
In the first case we end up with an EC that's just {x,y}, because the
second process_equivalence() will find both sides in the same EC and
conclude that the second clause is redundant. (Which it is, if the
equality operators have the same notion of what to do with nulls.)
In the second case we end up with an EC containing {x,x,y}, which
at minimum will result in emitting redundant generated quals. I'm
not sure if it could have any worse consequences than that, but I'm
not sure it couldn't either. But this is bogus in any case, because
those WHERE clauses surely should produce identical results.
Looking further ahead, if ECs have to support being multisets rather
than pure sets, that would put a crimp in ever improving this logic to
use a smarter UNION-FIND algorithm. (I've not yet seen queries where the
number of EC members is large enough to make that a serious issue, but
I think it could happen, particularly with inheritance/partitioning.)
Okay, that makes sense, thanks for the explanation. Your patch is the way
to go then.
This passes the smell test for me in the sense of not adding any
significant number of planner cycles except when the weird case occurs.
It leaves something on the table in that there are some situations
where X=X could be converted, but we won't do it because we don't see
the clause as being a potential EC (because it's not at top level),
as in the second new regression test case below. I think that's
probably all right; I don't see any way to be more thorough without
adding a lot of new cycles all the time, and I don't believe this is
worth that.
My code had the same issue. I think going deeper would require quite a bit
of new code (and cycles) for something that's even less likely to happen
than simple X=X, while the current patch is quite a cheap win.
Yeah. I'm not really convinced it's a big win, but it's so cheap we
might as well do it. The only case where it would expend cycles and
fail to give an improvement is if we have X=X with a non-strict operator,
which I think is a case that never arises in practice at present,
because btree effectively assumes that all btree operators are strict.
(Someday we might want to relax that, though, which is why I don't
want to let the planner just assume it.)
In real life probably not, but it seems like these small bits can be useful
marketing-wise, given articles like the one which started this. And it
should not hurt anything either.
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
I can't get very excited about this one either, though I do believe it
can arise as the author says, "when you build complex views and JOIN
them to each other". Maybe I'm not excited about it because I've not
needed it :)
This is one that I know would help with my database. There is a ton
of logic stored in views, which get joined to the original table to
filter the set rather than imposing that set of conditions in every
separate query.
It would be really nice if the optimizer could simplify those to
eliminate the self join. It's almost always on the primary key of a
table that the join would happen on, and if not, it'd be a
non-nullable column for sure.
On another note:
turning ORs into UNIONs
This is another one which would be incredibly useful for me. I've had
to do this manually for performance
reasons far too often.
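The rewrite in question, modeled as a toy Python sketch (made-up rows): the two arms of the OR can be evaluated separately (and each can use its own index), but a plain UNION must also deduplicate rows satisfying both predicates, which is part of what makes the transform subtle.

```python
rows = [(1, "x", 10), (2, "y", 10), (3, "x", 20)]   # (id, a, b)

p = lambda r: r[1] == "x"        # WHERE a = 'x'
q = lambda r: r[2] == 10         #    OR b = 10

# Original form: one pass, one combined predicate.
with_or = [r for r in rows if p(r) or q(r)]

# Rewritten form: two independent scans, deduplicated like UNION.
as_union = sorted(set([r for r in rows if p(r)] +
                      [r for r in rows if q(r)]))

# Row 1 satisfies both arms but must appear exactly once.
assert sorted(with_or) == as_union
```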
Partially agreed. A comment to the article also mentions that some other
database performs more optimizations depending on the cost of the
plan. That's not easy to do in our current plan structure, but I think
it's quite a worthwhile concept.
I would love to see this in Postgres. It would allow the planner to
not waste cycles unnecessarily on queries where it's just not needed,
and to potentially spend a few more cycles planning on very costly
queries to save a ton while executing.
Adam Brusselback <adambrusselback@gmail.com> writes:
On another note:
turning ORs into UNIONs
This is another one which would be incredibly useful for me. I've had
to do this manually for performance reasons far too often.
Well, maybe you could sign up to help review the open patch for that then:
https://commitfest.postgresql.org/15/1001/
The reason that's not in v10 is we haven't been able to convince
ourselves whether it's 100% correct.
regards, tom lane
On 2017-10-08 11:28:09 -0400, Tom Lane wrote:
Adam Brusselback <adambrusselback@gmail.com> writes:
On another note:
turning ORs into UNIONs
This is another one which would be incredibly useful for me. I've had
to do this manually for performance reasons far too often.
Well, maybe you could sign up to help review the open patch for that then:
https://commitfest.postgresql.org/15/1001/
The reason that's not in v10 is we haven't been able to convince
ourselves whether it's 100% correct.
Unfortunately it won't help in this specific case (no support for UNION,
just UNION ALL), but I thought it might be interesting to reference
https://medium.com/@uwdb/introducing-cosette-527898504bd6
here.
Greetings,
Andres Freund
Petr Jelinek <petr.jelinek@2ndquadrant.com> writes:
Okay, that makes sense, thanks for explanation. Your patch is the way to
go then.
Hearing no further comment, pushed. Thanks for reviewing it.
regards, tom lane
Andres Freund <andres@anarazel.de> writes:
On 2017-10-08 11:28:09 -0400, Tom Lane wrote:
https://commitfest.postgresql.org/15/1001/
The reason that's not in v10 is we haven't been able to convince
ourselves whether it's 100% correct.
Unfortunately it won't help in this specific case (no support for UNION,
just UNION ALL), but I thought it might be interesting to reference
https://medium.com/@uwdb/introducing-cosette-527898504bd6
here.
Huh, that is an interesting project indeed. Although I'm not sure that
it quite addresses the question of whether an optimization transform
is valid. IIUC, it could prove that a particular query having been fed
through the transform didn't change semantics, but that offers only
limited insight into whether some other query fed through the transform
might change.
regards, tom lane
On 2017-10-08 17:11:44 -0400, Tom Lane wrote:
Andres Freund <andres@anarazel.de> writes:
On 2017-10-08 11:28:09 -0400, Tom Lane wrote:
https://commitfest.postgresql.org/15/1001/
The reason that's not in v10 is we haven't been able to convince
ourselves whether it's 100% correct.
Unfortunately it won't help in this specific case (no support for UNION,
just UNION ALL), but I thought it might be interesting to reference
https://medium.com/@uwdb/introducing-cosette-527898504bd6
here.
Huh, that is an interesting project indeed. Although I'm not sure that
it quite addresses the question of whether an optimization transform
is valid. IIUC, it could prove that a particular query having been fed
through the transform didn't change semantics, but that offers only
limited insight into whether some other query fed through the transform
might change.
According to the guide it offers some support for more general
transformations:
http://cosette.cs.washington.edu/guide#24-symbolic-predicates
That's
still only going to be sufficient for some of the interesting cases, but
still...
I wonder about pinging them about the OR -> UNION case; they've been
responsive to problems in some threads I found online.
Greetings,
Andres Freund
On 7 October 2017 at 14:48, Andres Freund <andres@anarazel.de> wrote:
3. JOIN Elimination
There's been a lot of discussion and several patches. There's a bunch of
problems here, one being that there's cases (during trigger firing,
before the constraint checks) where foreign keys don't hold true, so we
can't quite make this optimization in general. Another concern is
whether the added plan time is actually worthwhile.
I looked over this and it seems there's some pretty low-hanging fruit
in there that we can add with just a handful of lines of new code.
This is the case for LEFT JOINs with a DISTINCT clause. Normally we
can only remove an unused LEFT JOINed relation if there are some
unique properties that ensure that the join does not duplicate any
outer row. We don't need to worry about this when there's a DISTINCT
clause, as the DISTINCT would remove any duplicates anyway. If I'm not
mistaken, we also don't need to bother looking at the actual distinct
clause's exprs since we'll already know that nothing is in there
regarding the relation being removed. The benefit to this could be
two-fold, as 1) we don't need to join to the unused relation and 2) we
don't need to remove any row duplication caused by the join.
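A minimal sketch of the equivalence described above, again using SQLite only to show that the results match (the schema and data are invented for illustration, and SQLite does not perform this join elimination itself). Without a unique index on items.order_id the join can duplicate outer rows, but the DISTINCT collapses them, so dropping the unused LEFT JOIN does not change the result:

```python
# Illustrative only: shows why DISTINCT makes an unused LEFT JOIN
# removable even without a uniqueness proof on the joined relation.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INT, customer TEXT)")
conn.execute("CREATE TABLE items (order_id INT, sku TEXT)")  # no unique index
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 'a'), (2, 'b')])
# Order 1 has two items, so the join duplicates its outer row.
conn.executemany("INSERT INTO items VALUES (?, ?)", [(1, 'x'), (1, 'y')])

with_join = conn.execute(
    "SELECT DISTINCT o.customer FROM orders o "
    "LEFT JOIN items i ON i.order_id = o.id ORDER BY 1").fetchall()

# The items relation contributes no output columns, and DISTINCT
# absorbs any row duplication the join introduced.
without_join = conn.execute(
    "SELECT DISTINCT customer FROM orders ORDER BY 1").fetchall()

print(with_join == without_join)  # True
```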
While someone might argue that this is not going to be that common a
case to hit, if we consider how cheap this is to implement, it does
seem to be worth doing a couple of NULL checks in the planner for it.
The only slight downside I can see is that we're letting a few more
cases through rel_supports_distinctness(), which is also used for
unique joins, where these proofs are not valid. However, it may not be
worth the trouble doing anything about that, as relations without
unique indexes are pretty uncommon (at least in my world).
Thoughts?
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachment: remove_left_join_distinct.patch (application/octet-stream)
On 9 October 2017 at 17:41, David Rowley <david.rowley@2ndquadrant.com> wrote:
Thoughts?
Actually, I was a little inconsistent with my List NULL/NIL checks in
that last one.
I've attached an updated patch.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services