Removing INNER JOINs

Started by Andreas Joseph Kroghover 8 years ago13 messagesgeneral

andreas@visena.com

over 8 years ago

Hi all.

I came across this from
Oracle: https://oracle-base.com/articles/misc/join-elimination#basic-join-elimination

Needless to say, this would be very cool to have in PG:-)

It seems this has been discussed before [1], [2], [3], and the consesus at the
time was that the proposted implementation introduced way too much
planning-overhead to be worth it. Given that other RDBMS-vendors provides this,
and it's on the "Cool feactures other have that we don't"-list [4], is anyone
interessted in working on improving this?

1. Patch to support SEMI and ANTI join removal:

/messages/by-id/CAApHDvpCBEfuc5tD=vniepAv0pU5m=q=fOQZcOdMHeei7OQPgQ@mail.gmail.com
2. Patch to Remove INNER JOINs:

/messages/by-id/CAApHDvp4SsyQq5r+j5iUA1rF1SuWGD5QrhoVLqOqOxVXe=Njxw@mail.gmail.com
3. Removing INNER JOINs
http://www.postgresql-archive.org/Removing-INNER-JOINs-td5828650i40.html

4. https://blog.jooq.org/2017/09/28/10-cool-sql-optimisations-that-do-not-depend-on-the-cost-model/#top3

-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andreas@visena.com <mailto:andreas@visena.com>
www.visena.com <https://www.visena.com>
<https://www.visena.com>

David Rowley

dgrowleyml@gmail.com

over 8 years ago

In reply to: Andreas Joseph Krogh (#1)

Re: Removing INNER JOINs

On 1 December 2017 at 02:52, Andreas Joseph Krogh <andreas@visena.com> wrote:

I came across this from Oracle: https://oracle-base.com/articles/misc/join-elimination#basic-join-elimination

Needless to say, this would be very cool to have in PG:-)

It would be nice, I agree.

It seems this has been discussed before [1], [2], [3], and the consesus at the time was that the proposted implementation introduced way too much planning-overhead to be worth it. Given that other RDBMS-vendors provides this, and it's on the "Cool feactures other have that we don't"-list [4], is anyone interessted in working on improving this?

The large hurdle which a good workaround was never really found for
was the fact that foreign key triggers only update the referenced rows
at the end of the statement, or end of query when the foreign key
constraint is deferred. I don't recall much concern about planner
overhead. It's likely not going to be too big a concern since we're
already checking for foreign keys nowadays during selectivity
estimation.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Andreas Joseph Krogh

andreas@visena.com

over 8 years ago

In reply to: David Rowley (#2)

Sv: Re: Removing INNER JOINs

På fredag 01. desember 2017 kl. 02:20:19, skrev David Rowley <
david.rowley@2ndquadrant.com <mailto:david.rowley@2ndquadrant.com>>:
On 1 December 2017 at 02:52, Andreas Joseph Krogh <andreas@visena.com> wrote:

I came across this from Oracle:

https://oracle-base.com/articles/misc/join-elimination#basic-join-elimination

Needless to say, this would be very cool to have in PG:-)

It would be nice, I agree.

It seems this has been discussed before [1], [2], [3], and the consesus at

the time was that the proposted implementation introduced way too much
planning-overhead to be worth it. Given that other RDBMS-vendors provides this,
and it's on the "Cool feactures other have that we don't"-list [4], is anyone
interessted in working on improving this?

I do still have all the code I wrote all those years ago, and possibly
it will still apply to master as I rebased it just several months ago.
I've just not yet come up with any bright ideas on how to solve the
foreign key trigger timing problem.

Thanks for answer.

I guess I'm back to hoping someone will spend time thinking about those
challenges. In our app we have more and more dynamic queries which join lots of
tables but not necessarily SELECTs from them. Maintaining the list of
join-tables when constructing the dynamic queries is a pain...

Especially when running "paging-queries", like "Give me top 100 of 899 345
totals". We fire two queries, one with the SELECT-list and one with count(*),
but with the same FROM-clause. I see LEFT OUTER JOINs are being removed when
issuing the count(*), which helps, but remove more JOINs would (probably) help
even more:-)

-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andreas@visena.com <mailto:andreas@visena.com>
www.visena.com <https://www.visena.com>
<https://www.visena.com>

Simon Riggs

simon@2ndQuadrant.com

over 8 years ago

In reply to: David Rowley (#2)

Re: Removing INNER JOINs

On 1 December 2017 at 12:20, David Rowley <david.rowley@2ndquadrant.com> wrote:

On 1 December 2017 at 02:52, Andreas Joseph Krogh <andreas@visena.com> wrote:

I came across this from Oracle: https://oracle-base.com/articles/misc/join-elimination#basic-join-elimination

Needless to say, this would be very cool to have in PG:-)

It would be nice, I agree.

It seems this has been discussed before [1], [2], [3], and the consesus at the time was that the proposted implementation introduced way too much planning-overhead to be worth it. Given that other RDBMS-vendors provides this, and it's on the "Cool feactures other have that we don't"-list [4], is anyone interessted in working on improving this?

The large hurdle which a good workaround was never really found for
was the fact that foreign key triggers only update the referenced rows
at the end of the statement, or end of query when the foreign key
constraint is deferred. I don't recall much concern about planner
overhead. It's likely not going to be too big a concern since we're
already checking for foreign keys nowadays during selectivity
estimation.

I do still have all the code I wrote all those years ago, and possibly
it will still apply to master as I rebased it just several months ago.
I've just not yet come up with any bright ideas on how to solve the
foreign key trigger timing problem.

So it would work if the Foreign Keys are marked NOT DEFERRABLE?

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Andreas Joseph Krogh

andreas@visena.com

over 8 years ago

In reply to: Simon Riggs (#4)

Sv: Re: Removing INNER JOINs

På fredag 01. desember 2017 kl. 03:30:21, skrev Simon Riggs <
simon@2ndquadrant.com <mailto:simon@2ndquadrant.com>>:
On 1 December 2017 at 12:20, David Rowley <david.rowley@2ndquadrant.com> wrote:

On 1 December 2017 at 02:52, Andreas Joseph Krogh <andreas@visena.com>

wrote:

I came across this from Oracle:

https://oracle-base.com/articles/misc/join-elimination#basic-join-elimination

Needless to say, this would be very cool to have in PG:-)

It would be nice, I agree.

It seems this has been discussed before [1], [2], [3], and the consesus at

The large hurdle which a good workaround was never really found for
was the fact that foreign key triggers only update the referenced rows
at the end of the statement, or end of query when the foreign key
constraint is deferred. I don't recall much concern about planner
overhead. It's likely not going to be too big a concern since we're
already checking for foreign keys nowadays during selectivity
estimation.

I do still have all the code I wrote all those years ago, and possibly
it will still apply to master as I rebased it just several months ago.
I've just not yet come up with any bright ideas on how to solve the
foreign key trigger timing problem.

So it would work if the Foreign Keys are marked NOT DEFERRABLE?

Can someone please explain, in layman-terms, what the problems with FKs are
related to JOIN-removal?

-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andreas@visena.com <mailto:andreas@visena.com>
www.visena.com <https://www.visena.com>
<https://www.visena.com>

David Rowley

dgrowleyml@gmail.com

over 8 years ago

In reply to: Simon Riggs (#4)

Re: Removing INNER JOINs

On 1 December 2017 at 15:30, Simon Riggs <simon@2ndquadrant.com> wrote:

On 1 December 2017 at 12:20, David Rowley <david.rowley@2ndquadrant.com> wrote:

The large hurdle which a good workaround was never really found for
was the fact that foreign key triggers only update the referenced rows
at the end of the statement, or end of query when the foreign key
constraint is deferred. I don't recall much concern about planner
overhead. It's likely not going to be too big a concern since we're
already checking for foreign keys nowadays during selectivity
estimation.

I do still have all the code I wrote all those years ago, and possibly
it will still apply to master as I rebased it just several months ago.
I've just not yet come up with any bright ideas on how to solve the
foreign key trigger timing problem.

So it would work if the Foreign Keys are marked NOT DEFERRABLE?

Unfortunately not, since a query may call some volatile function which
makes changes to referenced rows which could cause the removed join to
falsely "match" to referenced rows which no longer exist due to the
changes made by the volatile function not having been CASCADEd to the
referenced table yet.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

David Rowley

dgrowleyml@gmail.com

over 8 years ago

In reply to: Andreas Joseph Krogh (#5)

Re: Re: Removing INNER JOINs

On 1 December 2017 at 15:34, Andreas Joseph Krogh <andreas@visena.com>
wrote:

Can someone please explain, in layman-terms, what the problems with FKs
are related to JOIN-removal?

Pretty much what I just wrote after "Unfortunately not," above, although
you asked a few seconds before I sent.

We're able to use UNIQUE INDEXes as proofs to remove LEFT JOINs as (with
the exception of deferred unique indexes) these are updated right away,
rather than deferred until the end of the statement as is the case with NOT
DEFERRABLE and not DEFERRED foreign keys. The fact that the foreign keys do
not update the referenced rows right away means that there is a non-zero
window of time that the constraint is violated, therefore, if a query which
is run, or is running during that time, we could return the incorrect
results if we were to remove an INNER JOIN during the planning of that
query.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Andreas Joseph Krogh

andreas@visena.com

over 8 years ago

In reply to: David Rowley (#7)

Sv: Re: Re: Removing INNER JOINs

På fredag 01. desember 2017 kl. 03:53:29, skrev David Rowley <
david.rowley@2ndquadrant.com <mailto:david.rowley@2ndquadrant.com>>:
On 1 December 2017 at 15:34, Andreas Joseph Krogh <andreas@visena.com
<mailto:andreas@visena.com>> wrote: Can someone please explain, in
layman-terms, what the problems with FKs are related to JOIN-removal?

Pretty much what I just wrote after "Unfortunately not," above, although you
asked a few seconds before I sent.

We're able to use UNIQUE INDEXes as proofs to remove LEFT JOINs as (with the
exception of deferred unique indexes) these are updated right away, rather than
deferred until the end of the statement as is the case with NOT DEFERRABLE and
not DEFERRED foreign keys. The fact that the foreign keys do not update the
referenced rows right away means that there is a non-zero window of time that
the constraint is violated, therefore, if a query which is run, or is running
during that time, we could return the incorrect results if we were to remove an
INNER JOIN during the planning of that query.

Hm...
The fact that Oracle has solved this makes me think you guys can solve this
too if you put enough brain-power to it:-)

-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andreas@visena.com <mailto:andreas@visena.com>
www.visena.com <https://www.visena.com>
<https://www.visena.com>

Jim Finnerty

jfinnert@amazon.com

over 8 years ago

In reply to: David Rowley (#6)

Re: Removing INNER JOINs

FK constraint enforcement may be deferred, but only to the end of the DML
transaction. Can someone provide an example where a SELECT statement that
is not in that DML transaction and that can view only committed data can see
a FK-side row that doesn't join with any PK-side row?

I don't think I understand why the volatile function scenario. Can a
volatile function somehow circumvent FK enforcement? If not, then why is
the volatile function scenario different than any other DML with respect to
the enforcement of FK constraints?

thank you,

/Jim F

--
Sent from: http://www.postgresql-archive.org/PostgreSQL-general-f1843780.html

#10

David Rowley

dgrowleyml@gmail.com

over 8 years ago

In reply to: Jim Finnerty (#9)

Re: Removing INNER JOINs

On 11 December 2017 at 17:33, Jim Finnerty <jfinnert@amazon.com> wrote:

FK constraint enforcement may be deferred, but only to the end of the DML
transaction. Can someone provide an example where a SELECT statement that
is not in that DML transaction and that can view only committed data can see
a FK-side row that doesn't join with any PK-side row?

I don't think I understand why the volatile function scenario. Can a
volatile function somehow circumvent FK enforcement? If not, then why is
the volatile function scenario different than any other DML with respect to
the enforcement of FK constraints?

It would have to be in the same transaction as the DML to get wrong
results, other transactions won't see the changes until the DML
transaction commits.

Here's an example of finding unmatched rows when a join is removed in
the same transaction as the DML.

create table t1 (id int primary key);
create table t2 (value int not null);

alter table t2 add constraint t2_t1_id_fkey foreign key (value)
references t1 match full on update cascasde on delete cascade;

create index on t2 (value);

create or replace function update_t1(p_id int, p_newid int) returns bigint as $$
declare c bigint;
begin
update t1 set id = p_newid where id = p_id;
-- count would always be 0 if foreign key was cascaded immediately
select count(*) into c from t2 where not exists(select 1 from t1 where
t1.id = t2.value);
return c;
end;
$$ language plpgsql volatile;

insert into t1 values(1),(2);
insert into t2 values(1),(2);

select *,update_t1(id, id + 2) from t1 order by id;

If we removed the anti-join in the count(*) into c query, we'd incorrectly get:

# select *,update_t1(id, id + 2) from t1 order by id;
id | update_t1
----+-----------
1 | 0
2 | 0
(2 rows)

when we should get:

# select *,update_t1(id, id + 2) from t1 order by id;
id | update_t1
----+-----------
1 | 1
2 | 2
(2 rows)

Yes, this is an ANTI-JOIN and not an INNER JOIN as per $subject, but
the same applies. This is just a more simple example to give.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#11

Jim Finnerty

jfinnert@amazon.com

over 8 years ago

In reply to: David Rowley (#10)

Re: Removing INNER JOINs

Great example, David. The planner can detect whether a SELECT statement
contains a volatile function, and can disable the proposed redundant
inner-join optimization in that case.

If necessary, the planner could also check that the FK constraint is not
DEFERRED, but if there are no volatile functions and the SELECT statement
can't see an inconsistent state created by any other transaction, I think
that just checking for volatile functions and not being inside a DML
transaction would be sufficient.

A further opportunity would be to apply this to any SELECT statement in a
DML transaction, provided that there was no prior DML statement or statement
containing a volatile function in the same transaction.

We already have a redundant outer join optimization, and I've implemented
the redundant inner join optimization in two other products before, so
adding the additional logic to support the inner join case(s) sounds
straightforward to me. Can anyone think of any other problem scenarios?

/Jim F

--
Sent from: http://www.postgresql-archive.org/PostgreSQL-general-f1843780.html

#12

Tom Lane

tgl@sss.pgh.pa.us

over 8 years ago

In reply to: Jim Finnerty (#11)

Re: Removing INNER JOINs

Jim Finnerty <jfinnert@amazon.com> writes:

Great example, David. The planner can detect whether a SELECT statement
contains a volatile function, and can disable the proposed redundant
inner-join optimization in that case.

If necessary, the planner could also check that the FK constraint is not
DEFERRED, but if there are no volatile functions and the SELECT statement
can't see an inconsistent state created by any other transaction, I think
that just checking for volatile functions and not being inside a DML
transaction would be sufficient.

I don't think you're thinking nearly hard enough about what would break
this. The planner does not have much insight into the context a
statement is being used in (e.g. whether we're inside some kind of
PL function). Nor does it get to make assumptions about whether the plan
will be used inside a transaction block or not.

regards, tom lane

#13

David Rowley

dgrowleyml@gmail.com

over 8 years ago

In reply to: Jim Finnerty (#11)

Re: Removing INNER JOINs

On 12 December 2017 at 02:38, Jim Finnerty <jfinnert@amazon.com> wrote:

If necessary, the planner could also check that the FK constraint is not
DEFERRED, but if there are no volatile functions and the SELECT statement
can't see an inconsistent state created by any other transaction, I think
that just checking for volatile functions and not being inside a DML
transaction would be sufficient.

A further opportunity would be to apply this to any SELECT statement in a
DML transaction, provided that there was no prior DML statement or statement
containing a volatile function in the same transaction.

We already have a redundant outer join optimization, and I've implemented
the redundant inner join optimization in two other products before, so
adding the additional logic to support the inner join case(s) sounds
straightforward to me. Can anyone think of any other problem scenarios?

You should read over [1]/messages/by-id/CAApHDvpCBEfuc5tD=vniepAv0pU5m=q=fOQZcOdMHeei7OQPgQ@mail.gmail.com. This was my attempt at doing this over 3
years ago. The thread might save you from going down some of the dead
ends that I ended up going down.

[1]: /messages/by-id/CAApHDvpCBEfuc5tD=vniepAv0pU5m=q=fOQZcOdMHeei7OQPgQ@mail.gmail.com

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services