Postgres_fdw join pushdown - INNER - FULL OUTER join combination generating wrong result

Started by Rajkumar Raghuwanshiabout 10 years ago8 messageshackers
Jump to latest
#1Rajkumar Raghuwanshi
rajkumar.raghuwanshi@enterprisedb.com

Hi,

I am testing postgres_fdw join pushdown feature for PostgreSQL 9.6 DB, and
I observed below issue.

*Observation:* Inner join and full outer join combination on a table
generating wrong result.

SELECT * FROM lt;
c1
----
1
2
(2 rows)

SELECT * FROM ft;
c1
----
1
2
(2 rows)

\d+ ft
Foreign table "public.ft"
Column | Type | Modifiers | FDW Options | Storage | Stats target |
Description
--------+---------+-----------+-------------+---------+--------------+-------------
c1 | integer | | | plain | |
Server: link_server
FDW Options: (table_name 'lt')

--inner join and full outer join on local tables
SELECT t1.c1,t2.c1,t3.c1 FROM lt t1 INNER JOIN lt t2 ON (t1.c1 = t2.c1)
FULL JOIN lt t3 ON (t2.c1 = t3.c1);
c1 | c1 | c1
----+----+----
1 | 1 | 1
2 | 2 | 2
(2 rows)

--inner join and full outer join on corresponding foreign tables
SELECT t1.c1,t2.c1,t3.c1 FROM ft t1 INNER JOIN ft t2 ON (t1.c1 = t2.c1)
FULL JOIN ft t3 ON (t2.c1 = t3.c1);
c1 | c1 | c1
----+----+----
1 | 1 | 1
1 | 2 |
2 | 1 |
2 | 2 | 2
(4 rows)

Thanks & Regards,
Rajkumar Raghuwanshi
QMG, EnterpriseDB Corporation

Show quoted text
#2Etsuro Fujita
fujita.etsuro@lab.ntt.co.jp
In reply to: Rajkumar Raghuwanshi (#1)
Re: Postgres_fdw join pushdown - INNER - FULL OUTER join combination generating wrong result

On 2016/03/28 18:17, Rajkumar Raghuwanshi wrote:

I am testing postgres_fdw join pushdown feature for PostgreSQL 9.6 DB,
and I observed below issue._

Observation:_ Inner join and full outer join combination on a table
generating wrong result.

SELECT * FROM lt;
c1
----
1
2
(2 rows)

SELECT * FROM ft;
c1
----
1
2
(2 rows)

\d+ ft
Foreign table "public.ft"
Column | Type | Modifiers | FDW Options | Storage | Stats target |
Description
--------+---------+-----------+-------------+---------+--------------+-------------
c1 | integer | | | plain | |
Server: link_server
FDW Options: (table_name 'lt')

--inner join and full outer join on local tables
SELECT t1.c1,t2.c1,t3.c1 FROM lt t1 INNER JOIN lt t2 ON (t1.c1 = t2.c1)
FULL JOIN lt t3 ON (t2.c1 = t3.c1);
c1 | c1 | c1
----+----+----
1 | 1 | 1
2 | 2 | 2
(2 rows)

--inner join and full outer join on corresponding foreign tables
SELECT t1.c1,t2.c1,t3.c1 FROM ft t1 INNER JOIN ft t2 ON (t1.c1 = t2.c1)
FULL JOIN ft t3 ON (t2.c1 = t3.c1);
c1 | c1 | c1
----+----+----
1 | 1 | 1
1 | 2 |
2 | 1 |
2 | 2 | 2
(4 rows)

I think the reason for that is in foreign_join_ok. This in that function:

switch (jointype)
{
case JOIN_INNER:
fpinfo->remote_conds = list_concat(fpinfo->remote_conds,

list_copy(fpinfo_i->remote_conds));
fpinfo->remote_conds = list_concat(fpinfo->remote_conds,

list_copy(fpinfo_o->remote_conds));
break;

case JOIN_LEFT:
fpinfo->joinclauses = list_concat(fpinfo->joinclauses,

list_copy(fpinfo_i->remote_conds));
fpinfo->remote_conds = list_concat(fpinfo->remote_conds,

list_copy(fpinfo_o->remote_conds));
break;

case JOIN_RIGHT:
fpinfo->joinclauses = list_concat(fpinfo->joinclauses,

list_copy(fpinfo_o->remote_conds));
fpinfo->remote_conds = list_concat(fpinfo->remote_conds,

list_copy(fpinfo_i->remote_conds));
break;

case JOIN_FULL:
fpinfo->joinclauses = list_concat(fpinfo->joinclauses,

list_copy(fpinfo_i->remote_conds));
fpinfo->joinclauses = list_concat(fpinfo->joinclauses,

list_copy(fpinfo_o->remote_conds));
break;

default:
/* Should not happen, we have just check this above */
elog(ERROR, "unsupported join type %d", jointype);
}

wrongly pulls up remote_conds from joining relations in the FULL JOIN
case. I think we should not pull up such conditions in the FULL JOIN case.

Best regards,
Etsuro Fujita

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Etsuro Fujita (#2)
Re: Postgres_fdw join pushdown - INNER - FULL OUTER join combination generating wrong result

Observation:_ Inner join and full outer join combination on a table

generating wrong result.

SELECT * FROM lt;
c1
----
1
2
(2 rows)

SELECT * FROM ft;
c1
----
1
2
(2 rows)

\d+ ft
Foreign table "public.ft"
Column | Type | Modifiers | FDW Options | Storage | Stats target |
Description

--------+---------+-----------+-------------+---------+--------------+-------------
c1 | integer | | | plain | |
Server: link_server
FDW Options: (table_name 'lt')

--inner join and full outer join on local tables
SELECT t1.c1,t2.c1,t3.c1 FROM lt t1 INNER JOIN lt t2 ON (t1.c1 = t2.c1)
FULL JOIN lt t3 ON (t2.c1 = t3.c1);
c1 | c1 | c1
----+----+----
1 | 1 | 1
2 | 2 | 2
(2 rows)

--inner join and full outer join on corresponding foreign tables
SELECT t1.c1,t2.c1,t3.c1 FROM ft t1 INNER JOIN ft t2 ON (t1.c1 = t2.c1)
FULL JOIN ft t3 ON (t2.c1 = t3.c1);
c1 | c1 | c1
----+----+----
1 | 1 | 1
1 | 2 |
2 | 1 |
2 | 2 | 2
(4 rows)

Thanks Rajkumar for the detailed report.

I think the reason for that is in foreign_join_ok. This in that function:

wrongly pulls up remote_conds from joining relations in the FULL JOIN
case. I think we should not pull up such conditions in the FULL JOIN case.

Right. For a full outer join, since each joining relation acts as outer for
the other, we can not pull up the quals to either join clauses or other
clauses. So, in such a case, we will need to encapsulate the joining
relation with conditions into a subquery. Unfortunately, the current
deparse logic does not handle this encapsulation. Adding that functionality
so close to the feature freeze might be risky given the amount of code
changes required.

PFA patch with a quick fix. A full outer join with either of the joining
relations having WHERE conditions (or other clauses) is not pushed down. In
the particular case that was reported, the bug triggered because of the way
conditions are handled for an inner join. For an inner join, all the
conditions in ON as well as WHERE clause are treated like they are part of
WHERE clause. This allows pushing down a join even if there are unpushable
join clauses. But the pushable conditions can be put back into the ON
clause. This avoids using subqueries while deparsing.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Attachments:

pg_fj_cond.patchapplication/x-patch; name=pg_fj_cond.patchDownload+184-81
#4Rajkumar Raghuwanshi
rajkumar.raghuwanshi@enterprisedb.com
In reply to: Ashutosh Bapat (#3)
Re: Postgres_fdw join pushdown - INNER - FULL OUTER join combination generating wrong result

Thanks Ashutosh for the patch. I have applied and tested it. Now getting
proper result for reported issue.

Thanks & Regards,
Rajkumar Raghuwanshi
QMG, EnterpriseDB Corporation

On Tue, Mar 29, 2016 at 7:50 PM, Ashutosh Bapat <
ashutosh.bapat@enterprisedb.com> wrote:

Show quoted text

Observation:_ Inner join and full outer join combination on a table

generating wrong result.

SELECT * FROM lt;
c1
----
1
2
(2 rows)

SELECT * FROM ft;
c1
----
1
2
(2 rows)

\d+ ft
Foreign table "public.ft"
Column | Type | Modifiers | FDW Options | Storage | Stats target |
Description

--------+---------+-----------+-------------+---------+--------------+-------------
c1 | integer | | | plain | |
Server: link_server
FDW Options: (table_name 'lt')

--inner join and full outer join on local tables
SELECT t1.c1,t2.c1,t3.c1 FROM lt t1 INNER JOIN lt t2 ON (t1.c1 = t2.c1)
FULL JOIN lt t3 ON (t2.c1 = t3.c1);
c1 | c1 | c1
----+----+----
1 | 1 | 1
2 | 2 | 2
(2 rows)

--inner join and full outer join on corresponding foreign tables
SELECT t1.c1,t2.c1,t3.c1 FROM ft t1 INNER JOIN ft t2 ON (t1.c1 = t2.c1)
FULL JOIN ft t3 ON (t2.c1 = t3.c1);
c1 | c1 | c1
----+----+----
1 | 1 | 1
1 | 2 |
2 | 1 |
2 | 2 | 2
(4 rows)

Thanks Rajkumar for the detailed report.

I think the reason for that is in foreign_join_ok. This in that function:

wrongly pulls up remote_conds from joining relations in the FULL JOIN
case. I think we should not pull up such conditions in the FULL JOIN case.

Right. For a full outer join, since each joining relation acts as outer
for the other, we can not pull up the quals to either join clauses or other
clauses. So, in such a case, we will need to encapsulate the joining
relation with conditions into a subquery. Unfortunately, the current
deparse logic does not handle this encapsulation. Adding that functionality
so close to the feature freeze might be risky given the amount of code
changes required.

PFA patch with a quick fix. A full outer join with either of the joining
relations having WHERE conditions (or other clauses) is not pushed down. In
the particular case that was reported, the bug triggered because of the way
conditions are handled for an inner join. For an inner join, all the
conditions in ON as well as WHERE clause are treated like they are part of
WHERE clause. This allows pushing down a join even if there are unpushable
join clauses. But the pushable conditions can be put back into the ON
clause. This avoids using subqueries while deparsing.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

#5Etsuro Fujita
fujita.etsuro@lab.ntt.co.jp
In reply to: Ashutosh Bapat (#3)
Re: Postgres_fdw join pushdown - INNER - FULL OUTER join combination generating wrong result

On 2016/03/29 23:20, Ashutosh Bapat wrote:

I think the reason for that is in foreign_join_ok. This in that
function:

wrongly pulls up remote_conds from joining relations in the FULL
JOIN case. I think we should not pull up such conditions in the
FULL JOIN case.

Right. For a full outer join, since each joining relation acts as outer
for the other, we can not pull up the quals to either join clauses or
other clauses. So, in such a case, we will need to encapsulate the
joining relation with conditions into a subquery. Unfortunately, the
current deparse logic does not handle this encapsulation. Adding that
functionality so close to the feature freeze might be risky given the
amount of code changes required.

PFA patch with a quick fix. A full outer join with either of the joining
relations having WHERE conditions (or other clauses) is not pushed down.
In the particular case that was reported, the bug triggered because of
the way conditions are handled for an inner join. For an inner join, all
the conditions in ON as well as WHERE clause are treated like they are
part of WHERE clause. This allows pushing down a join even if there are
unpushable join clauses. But the pushable conditions can be put back
into the ON clause. This avoids using subqueries while deparsing.

I'm not sure that is a good idea. As you mentioned, we could support
FULL JOIN fully, by encapsulating a joining relation with conditions
into a subquery. And ISTM that it is relatively easy to do that, by
borrowing some ideas from Hanada-san's original join pushdown patch. If
it's OK, I'll create a patch for that in a few days.

Sorry for speaking up late.

Best regards,
Etsuro Fujita

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Etsuro Fujita (#5)
Re: Postgres_fdw join pushdown - INNER - FULL OUTER join combination generating wrong result

On Thu, Apr 14, 2016 at 8:42 AM, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>
wrote:

On 2016/03/29 23:20, Ashutosh Bapat wrote:

I think the reason for that is in foreign_join_ok. This in that
function:

wrongly pulls up remote_conds from joining relations in the FULL
JOIN case. I think we should not pull up such conditions in the
FULL JOIN case.

Right. For a full outer join, since each joining relation acts as outer

for the other, we can not pull up the quals to either join clauses or
other clauses. So, in such a case, we will need to encapsulate the
joining relation with conditions into a subquery. Unfortunately, the
current deparse logic does not handle this encapsulation. Adding that
functionality so close to the feature freeze might be risky given the
amount of code changes required.

PFA patch with a quick fix. A full outer join with either of the joining
relations having WHERE conditions (or other clauses) is not pushed down.
In the particular case that was reported, the bug triggered because of
the way conditions are handled for an inner join. For an inner join, all
the conditions in ON as well as WHERE clause are treated like they are
part of WHERE clause. This allows pushing down a join even if there are
unpushable join clauses. But the pushable conditions can be put back
into the ON clause. This avoids using subqueries while deparsing.

I'm not sure that is a good idea. As you mentioned, we could support FULL
JOIN fully, by encapsulating a joining relation with conditions into a
subquery. And ISTM that it is relatively easy to do that, by borrowing
some ideas from Hanada-san's original join pushdown patch. If it's OK,
I'll create a patch for that in a few days.

In his patch the deparsing targetlist and conditions required that the FROM
clause entries were ready with the columns from base relations and joins
realiased. That's no more true. We deparse every Var node as <relation
alias>.<column name> where relation alias is nothing but rN; N being index
of RangeTblEntry. So, Hanada-san's method to deparse recursively can not be
applied as such now.

Here's what needs to be done:
When we identify that certain relation (base or join) needs a subquery to
be deparsed (because the join relation above it could not pull the quals
up), we remember it in the upper join relation. Before deparsing 1. we walk
the join tree and collect targetlists of all such relations, 2. associate
column aliases with those targetlists (save the column alises in resname?)
and craft a relation alias 3. associate the relations alias, column
aliases and targetlists with the base relations involved in such relations
(may be creating a list similar to rtable). While deparsing a Var node, we
check if corresponding base relation is itself or part of a relation
deparsed as a subquery. If it is then we lookup that Var in the targetlist
associated with the base relation and use corresponding relation and column
alias for deparsing it. Otherwise, deparse it as <relation alias>.<column
name> usually.

That looks like a big code change to go after feature freeze. So, we will
leave it for 9.7, unless RMT allows us to introduce that change.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

#7Etsuro Fujita
fujita.etsuro@lab.ntt.co.jp
In reply to: Ashutosh Bapat (#6)
Re: Postgres_fdw join pushdown - INNER - FULL OUTER join combination generating wrong result

On 2016/04/14 15:20, Ashutosh Bapat wrote:

On Thu, Apr 14, 2016 at 8:42 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp <mailto:fujita.etsuro@lab.ntt.co.jp>> wrote:

As you mentioned, we could
support FULL JOIN fully, by encapsulating a joining relation with
conditions into a subquery. And ISTM that it is relatively easy to
do that, by borrowing some ideas from Hanada-san's original join
pushdown patch. If it's OK, I'll create a patch for that in a few days.

In his patch the deparsing targetlist and conditions required that the
FROM clause entries were ready with the columns from base relations and
joins realiased. That's no more true. We deparse every Var node as
<relation alias>.<column name> where relation alias is nothing but rN; N
being index of RangeTblEntry. So, Hanada-san's method to deparse
recursively can not be applied as such now.

I think so, too. I don't think his ideas could be applied as is.

Here's what needs to be done:
When we identify that certain relation (base or join) needs a subquery
to be deparsed (because the join relation above it could not pull the
quals up), we remember it in the upper join relation. Before deparsing
1. we walk the join tree and collect targetlists of all such relations,
2. associate column aliases with those targetlists (save the column
alises in resname?) and craft a relation alias 3. associate the
relations alias, column aliases and targetlists with the base relations
involved in such relations (may be creating a list similar to rtable).
While deparsing a Var node, we check if corresponding base relation is
itself or part of a relation deparsed as a subquery. If it is then we
lookup that Var in the targetlist associated with the base relation and
use corresponding relation and column alias for deparsing it. Otherwise,
deparse it as <relation alias>.<column name> usually.

Good to know. That is what I have in mind, except for the way of
collecting subqueries' columns and associating those columns with
relation and column aliases, which I think can be done more easily.
Please find attached a WIP patch. That patch works well at least for
queries in your patch. Maybe I'm missing something, though.

That looks like a big code change to go after feature freeze. So, we
will leave it for 9.7, unless RMT allows us to introduce that change.

OK

Best regards,
Etsuro Fujita

Attachments:

pg_fj_cond_efujita.patchtext/x-patch; name=pg_fj_cond_efujita.patchDownload+308-199
#8Robert Haas
robertmhaas@gmail.com
In reply to: Ashutosh Bapat (#3)
Re: Postgres_fdw join pushdown - INNER - FULL OUTER join combination generating wrong result

On Tue, Mar 29, 2016 at 10:20 AM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:

I think the reason for that is in foreign_join_ok. This in that function:

wrongly pulls up remote_conds from joining relations in the FULL JOIN
case. I think we should not pull up such conditions in the FULL JOIN case.

Right. For a full outer join, since each joining relation acts as outer for
the other, we can not pull up the quals to either join clauses or other
clauses. So, in such a case, we will need to encapsulate the joining
relation with conditions into a subquery. Unfortunately, the current deparse
logic does not handle this encapsulation. Adding that functionality so close
to the feature freeze might be risky given the amount of code changes
required.

PFA patch with a quick fix. A full outer join with either of the joining
relations having WHERE conditions (or other clauses) is not pushed down. In
the particular case that was reported, the bug triggered because of the way
conditions are handled for an inner join. For an inner join, all the
conditions in ON as well as WHERE clause are treated like they are part of
WHERE clause. This allows pushing down a join even if there are unpushable
join clauses. But the pushable conditions can be put back into the ON
clause. This avoids using subqueries while deparsing.

Committed.

I think we should introduce subquery-based deparsing for 9.7, but I
agree it's better not to do it now. I think we should try to handle
SEMI and ANTI joins that way, too.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers