postgres_fdw: perform UPDATE/DELETE .. RETURNING on a join directly
(2017/04/08 10:28), Robert Haas wrote:
On Wed, Mar 22, 2017 at 6:20 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:
On 2017/02/22 19:57, Rushabh Lathia wrote:
Marked this as Ready for Committer.
I noticed that this item in the CF app was incorrectly marked as
Committed.
This patch isn't committed, so I returned it to the previous status.
I also
rebased the patch. Attached is a new version of the patch.
Sorry, I marked the wrong patch as committed. Apologies for that.
No problem. My apologies for the long long delay.
This doesn't apply any more because of recent changes.
git diff --check complains:
contrib/postgres_fdw/postgres_fdw.c:3653: space before tab in indent.
I rebased the patch.
+ /* Shouldn't contain the target relation. */
+ Assert(target_rel == 0);
This comment should give a reason.
Done.
void
deparseDirectUpdateSql(StringInfo buf, PlannerInfo *root,
Index rtindex, Relation rel,
+ RelOptInfo *foreignrel,
List *targetlist,
List *targetAttrs,
List *remote_conds,
Could you add a comment explaining the meaning of these various
arguments? It takes rtindex, rel, and foreignrel, which apparently
are all different things, but the meaning is not explained.
Done.
+/*
+ * Add a RETURNING clause, if needed, to an UPDATE/DELETE on a join.
+ */
+static void
+deparseExplicitReturningList(List *rlist,
+                             List **retrieved_attrs,
+                             deparse_expr_cxt *context)
+{
+    deparseExplicitTargetList(rlist, true, retrieved_attrs, context);
+}
Do we really want to add a function for one line of code?
I don't have any strong opinion about that, so I removed that function
and modified deparseDirectUpdateSql/deparseDirectDeleteSql so they call
deparseExplicitTargetList directly.
+/*
+ * Look for conditions mentioning the target relation in the given join tree,
+ * which will be pulled up into the WHERE clause. Note that this is safe due
+ * to the same reason stated in comments in deparseFromExprForRel.
+ */
The comments for deparseFromExprForRel do not seem to address the
topic of why this is safe. Also, the answer to the question "safe
from what?" is not clear.
Maybe my explanation was not enough, but I think the reason why
this is safe can be derived from the comments for
deparseFromExprForRel. BUT: on reflection, I think I made the deparsing
logic more complex than necessary. So I simplified the logic, which
no longer uses pull_up_target_conditions, and added comments about that.
- deparseReturningList(buf, root, rtindex, rel, false,
-                      returningList, retrieved_attrs);
+ if (foreignrel->reloptkind == RELOPT_JOINREL)
+     deparseExplicitReturningList(returningList, retrieved_attrs, &context);
+ else
+     deparseReturningList(buf, root, rtindex, rel, false,
+                          returningList, retrieved_attrs);
Why do these cases need to be handled differently? Maybe add a brief
comment?
The reason for that is 1)
+ /*
+  * When performing an UPDATE/DELETE .. RETURNING on a join directly,
+  * we fetch from the foreign server any Vars specified in RETURNING
+  * that refer not only to the target relation but to non-target
+  * relations. So we'll deparse them into the RETURNING clause of the
+  * remote query;
and 2) deparseReturningList can retrieve Vars of the target relation,
but can't retrieve Vars of non-target relations.
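For readers following along, here is a minimal sketch of the case being described. The table and column names (ft1, ft2, c1..c3) are hypothetical, not taken from the patch or its tests; ft1 and ft2 are assumed to be postgres_fdw foreign tables on the same remote server:

```sql
-- The target relation is ft1, but the RETURNING list also references the
-- non-target relation ft2, so the remote query's RETURNING clause must
-- fetch ft2's columns as well -- which is why an explicit target list is
-- deparsed for the join case.
UPDATE ft1 SET c2 = ft1.c2 + 100
FROM ft2
WHERE ft1.c1 = ft2.c1
RETURNING ft1.c1, ft1.c2, ft2.c3;
```

When the whole join is pushed down, the remote statement sent by postgres_fdw would take roughly the shape `UPDATE t1 ... FROM t2 ... RETURNING t1.c1, t1.c2, t2.c3` against the remote tables.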
+ if ((outerrel->reloptkind == RELOPT_BASEREL &&
+      outerrel->relid == target_rel) ||
+     (innerrel->reloptkind == RELOPT_BASEREL &&
+      innerrel->relid == target_rel))
1. Surely it's redundant to check the RelOptKind if the RTI matches?
Good catch! Revised.
2. Generally, the tests in this patch against various RelOptKind
values should be adapted to use the new macros introduced in
7a39b5e4d11229ece930a51fd7cb29e535db4494.
I left some of the tests alone because I think that's more strict.
The regression tests remove every remaining case where an update or
delete fails to get pushed to the remote side. I think we should
still test that path, because we've still got that code. Maybe use a
non-pushable function in the join clause, or something.
Done.
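Robert's suggestion above could look something like the following sketch. The names (local_f1, ft1, c1, c2) are illustrative assumptions, not from the patch; the idea is simply that a locally defined function is not shippable to the remote server by default, so a qual using it keeps the update on the non-pushed-down path:

```sql
-- A locally defined function is not sent to the remote server by default.
CREATE FUNCTION local_f1(int) RETURNS int AS 'SELECT $1 + 1' LANGUAGE sql;

-- With local_f1() in the qual, the plan should show a Foreign Scan feeding
-- a local Update node, rather than a single pushed-down remote UPDATE.
EXPLAIN (VERBOSE, COSTS OFF)
UPDATE ft1 SET c2 = c2 + 1 WHERE local_f1(c1) = 2;
```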
The new test cases could use some brief comments explaining their purpose.
Done. Also, I think some of the tests in the previous version are
redundant, so I simplified the tests a bit.
if (plan->returningLists)
+ {
returningList = (List *) list_nth(plan->returningLists,
subplan_index);
+ /*
+  * If UPDATE/DELETE on a join, create a RETURNING list used in the
+  * remote query.
+  */
+ if (fscan->scan.scanrelid == 0)
+     returningList = make_explicit_returning_list(resultRelation,
+                                                  rel,
+                                                  returningList);
+ }
Again, the comment doesn't really explain why we're doing this. And
initializing returningList twice in a row seems strange, too.
I don't think that's strange because the second one is actually
re-computation of that list. Note that make_explicit_returning_list
takes that list as the 3rd argument. I added more comments explaining
why. (I also changed the name of make_explicit_returning_list.)
I am unfortunately too tired to finish properly reviewing this
tonight. :-(
Thanks for the review!
Attached is an updated version of the patch. I'll add this to the next CF.
Sorry for creating a new thread. I changed my mail client.
Best regards,
Etsuro Fujita
Attachments:
postgres-fdw-more-update-pushdown-v5.patch (text/x-diff, +937/-269)
(2017/12/27 20:55), Etsuro Fujita wrote:
Attached is an updated version of the patch.
I revised code/comments a little bit. PFA new version.
Best regards,
Etsuro Fujita
Attachments:
postgres-fdw-more-update-pushdown-v6.patch (text/x-diff, +943/-275)
On Thu, Jan 11, 2018 at 2:58 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:
(2017/12/27 20:55), Etsuro Fujita wrote:
Attached is an updated version of the patch.
I revised code/comments a little bit. PFA new version.
I spent a while reading through this today. I see a few decisions
here or there that are debatable, in the sense that somebody else
might have chosen to do it differently, but I don't see anything that
actually looks wrong. So, committed.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
I spent a while reading through this today. I see a few decisions
here or there that are debatable, in the sense that somebody else
might have chosen to do it differently, but I don't see anything that
actually looks wrong. So, committed.
The buildfarm's opinion of it is lower than yours. Just eyeballing
the failures, I'd say there was some naivete about the reproducibility
of tuple CTIDs across different platforms. Is there a good reason
these test cases need to print CTID?
regards, tom lane
On Wed, Feb 7, 2018 at 6:01 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
I spent a while reading through this today. I see a few decisions
here or there that are debatable, in the sense that somebody else
might have chosen to do it differently, but I don't see anything that
actually looks wrong. So, committed.
The buildfarm's opinion of it is lower than yours. Just eyeballing
the failures, I'd say there was some naivete about the reproducibility
of tuple CTIDs across different platforms. Is there a good reason
these test cases need to print CTID?
Uggh, I missed the fact that they were doing that. It's probably
actually useful test coverage, but it's not surprising that it isn't
stable.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
(2018/02/08 5:39), Robert Haas wrote:
On Thu, Jan 11, 2018 at 2:58 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:
(2017/12/27 20:55), Etsuro Fujita wrote:
Attached is an updated version of the patch.
I revised code/comments a little bit. PFA new version.
I spent a while reading through this today. I see a few decisions
here or there that are debatable, in the sense that somebody else
might have chosen to do it differently, but I don't see anything that
actually looks wrong. So, committed.
Thanks for committing, Robert! Thanks for reviewing, Rushabh and Ashutosh!
Best regards,
Etsuro Fujita
(2018/02/08 10:40), Robert Haas wrote:
On Wed, Feb 7, 2018 at 6:01 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
The buildfarm's opinion of it is lower than yours. Just eyeballing
the failures, I'd say there was some naivete about the reproducibility
of tuple CTIDs across different platforms. Is there a good reason
these test cases need to print CTID?
Uggh, I missed the fact that they were doing that. It's probably
actually useful test coverage, but it's not surprising that it isn't
stable.
That was my purpose, but I agree about the instability. Thanks again,
Robert!
Best regards,
Etsuro Fujita
Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> writes:
(2018/02/08 10:40), Robert Haas wrote:
Uggh, I missed the fact that they were doing that. It's probably
actually useful test coverage, but it's not surprising that it isn't
stable.
That was my purpose, but I agree about the instability. Thanks again,
Robert!
According to
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=rhinoceros&dt=2018-02-08%2001%3A45%3A01
there's still an intermittent issue. I ran "make installcheck" in
contrib/postgres_fdw in a loop, and got a similar failure on the
47th try --- my result duplicates the second plan change shown by
rhinoceros, but not the first one. I speculate that the plan change
is a result of autovacuum kicking in partway through the run.
regards, tom lane
On Thu, Feb 8, 2018 at 11:05 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> writes:
(2018/02/08 10:40), Robert Haas wrote:
Uggh, I missed the fact that they were doing that. It's probably
actually useful test coverage, but it's not surprising that it isn't
stable.
That was my purpose, but I agree about the instability. Thanks again,
Robert!
According to
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=rhinoceros&dt=2018-02-08%2001%3A45%3A01
there's still an intermittent issue. I ran "make installcheck" in
contrib/postgres_fdw in a loop, and got a similar failure on the
47th try --- my result duplicates the second plan change shown by
rhinoceros, but not the first one. I speculate that the plan change
is a result of autovacuum kicking in partway through the run.
Hmm. Maybe inserting an ANALYZE command in the right place would fix it?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
(2018/02/09 4:32), Robert Haas wrote:
On Thu, Feb 8, 2018 at 11:05 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
According to
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=rhinoceros&dt=2018-02-08%2001%3A45%3A01
there's still an intermittent issue. I ran "make installcheck" in
contrib/postgres_fdw in a loop, and got a similar failure on the
47th try --- my result duplicates the second plan change shown by
rhinoceros, but not the first one. I speculate that the plan change
is a result of autovacuum kicking in partway through the run.
Will look into this.
Hmm. Maybe inserting an ANALYZE command in the right place would fix it?
VACUUM on the right tables in the right place might fix it better?
Another idea would be to modify the test cases added by that commit so
that they don't modify the existing tables, to avoid breaking the
existing test cases.
Best regards,
Etsuro Fujita
(2018/02/09 10:48), Etsuro Fujita wrote:
(2018/02/09 4:32), Robert Haas wrote:
On Thu, Feb 8, 2018 at 11:05 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
there's still an intermittent issue. I ran "make installcheck" in
contrib/postgres_fdw in a loop, and got a similar failure on the
47th try --- my result duplicates the second plan change shown by
rhinoceros, but not the first one. I speculate that the plan change
is a result of autovacuum kicking in partway through the run.
Will look into this.
I tried to reproduce that in my environment, but I couldn't. On
reflection I think an easy and reliable way to address that concern is
to use local stats on foreign tables. Attached is a patch for that.
Best regards,
Etsuro Fujita
Attachments:
postgres-fdw-regress.patch (text/x-diff, +52/-36)
On Fri, Feb 9, 2018 at 5:24 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:
I tried to reproduce that in my environment, but I couldn't. On reflection
I think an easy and reliable way to address that concern is to use local
stats on foreign tables. Attached is a patch for that.
Me neither. I just ran the postgres_fdw regression tests 713 times in
a row without a failure. Tom, since you seem to be able to reproduce
the problem locally, could you have a look at this proposed fix?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
Me neither. I just ran the postgres_fdw regression tests 713 times in
a row without a failure. Tom, since you seem to be able to reproduce
the problem locally, could you have a look at this proposed fix?
I'm a bit busy, but AFAICS it's just a timing thing, so try inserting
a sleep. The attached is enough to reproduce rhinoceros' results
for me.
regards, tom lane
Attachments:
add-sleep.patch (text/x-diff, +1/-0)
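Tom's one-line attachment presumably just injects a delay into the regression script to widen the window for autovacuum to kick in mid-run; something along these lines (the exact placement and duration are assumptions, not taken from the patch):

```sql
-- Pause mid-script so autovacuum has a chance to run and update the
-- statistics the expected plans were captured under.
SELECT pg_sleep(10);
```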
On Fri, Feb 9, 2018 at 1:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
Me neither. I just ran the postgres_fdw regression tests 713 times in
a row without a failure. Tom, since you seem to be able to reproduce
the problem locally, could you have a look at this proposed fix?
I'm a bit busy, but AFAICS it's just a timing thing, so try inserting
a sleep. The attached is enough to reproduce rhinoceros' results
for me.
Not for me, but when I pushed the pg_sleep up to 180 seconds, then it failed.
With the proposed patch, it passes repeatedly for me with no sleep,
and also passes for me with the sleep. So I guess I'll commit this
and see what the buildfarm thinks.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
On Fri, Feb 9, 2018 at 1:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
Me neither. I just ran the postgres_fdw regression tests 713 times in
a row without a failure. Tom, since you seem to be able to reproduce
the problem locally, could you have a look at this proposed fix?
I'm a bit busy, but AFAICS it's just a timing thing, so try inserting
a sleep. The attached is enough to reproduce rhinoceros' results
for me.
Not for me, but when I pushed the pg_sleep up to 180 seconds, then it failed.
With the proposed patch, it passes repeatedly for me with no sleep,
and also passes for me with the sleep. So I guess I'll commit this
and see what the buildfarm thinks.
FWIW, I ran a thousand cycles of postgres_fdw installcheck without seeing
further problems. So this fixes it at least for my configuration.
However, jaguarundi still shows a problem:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=jaguarundi&dt=2018-02-10%2008%3A41%3A32
(previous run similar, so it's semi-reproducible even after this patch).
jaguarundi uses -DCLOBBER_CACHE_ALWAYS, so you might try a few repetitions
with that.
regards, tom lane
2018-02-11 6:24 GMT+09:00 Tom Lane <tgl@sss.pgh.pa.us>:
Robert Haas <robertmhaas@gmail.com> writes:
On Fri, Feb 9, 2018 at 1:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
Me neither. I just ran the postgres_fdw regression tests 713 times in
a row without a failure. Tom, since you seem to be able to reproduce
the problem locally, could you have a look at this proposed fix?
I'm a bit busy, but AFAICS it's just a timing thing, so try inserting
a sleep. The attached is enough to reproduce rhinoceros' results
for me.
Not for me, but when I pushed the pg_sleep up to 180 seconds, then it
failed.
With the proposed patch, it passes repeatedly for me with no sleep,
and also passes for me with the sleep. So I guess I'll commit this
and see what the buildfarm thinks.
FWIW, I ran a thousand cycles of postgres_fdw installcheck without seeing
further problems. So this fixes it at least for my configuration.
Thank you both for working on this issue!
However, jaguarundi still shows a problem:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=jaguarundi&dt=2018-02-10%2008%3A41%3A32
(previous run similar, so it's semi-reproducible even after this patch).
jaguarundi uses -DCLOBBER_CACHE_ALWAYS, so you might try a few repetitions
with that.
I'll look into this and send a patch by Tuesday.
Best regards,
Etsuro Fujita
(2018/02/11 6:24), Tom Lane wrote:
However, jaguarundi still shows a problem:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=jaguarundi&dt=2018-02-10%2008%3A41%3A32
(previous run similar, so it's semi-reproducible even after this patch).
jaguarundi uses -DCLOBBER_CACHE_ALWAYS, so you might try a few repetitions
with that.
I ran the postgres_fdw regression test with no sleep two times in a
CLOBBER_CACHE_ALWAYS-enabled build, and then the regression test with
the sleep (60 seconds) two times, but I couldn't reproduce that in either
case. I suspect the changes in the order of the RETURNING output there
were still caused by autovacuum kicking in partway through the run. So
to make the regression test more stable against autovacuum, I'd propose
to modify the regression test to disable autovacuum for the remote table
(ie "S 1"."T 1") (and perform VACUUM ANALYZE to that table manually
instead) in hopes of getting that fixed. Attached is a patch for that.
I think changes added by the previous patch wouldn't make sense
anymore, so I removed those changes.
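The stabilization described above amounts to something like the following sketch. The table name "S 1"."T 1" is from the test schema discussed in the thread; the exact statements are an assumed shape, not quoted from the patch:

```sql
-- Disable autovacuum for the remote-side table backing the foreign tables,
-- so background stats updates can't change plans partway through the run.
ALTER TABLE "S 1"."T 1" SET (autovacuum_enabled = false);

-- Gather statistics explicitly at a deterministic point in the test instead.
VACUUM ANALYZE "S 1"."T 1";
```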
Best regards,
Etsuro Fujita
Attachments:
postgres-fdw-regress-2.patch (text/x-diff, +42/-49)
Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> writes:
(2018/02/11 6:24), Tom Lane wrote:
However, jaguarundi still shows a problem:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=jaguarundi&dt=2018-02-10%2008%3A41%3A32
I ran the postgres_fdw regression test with no sleep two times in a
CLOBBER_CACHE_ALWAYS-enabled build, and then the regression test with
the sleep (60 seconds) two times, but I couldn't reproduce that in either
case. I suspect the changes in the order of the RETURNING output there
were still caused by autovacuum kicking in partway through the run. So
to make the regression test more stable against autovacuum, I'd propose
to modify the regression test to disable autovacuum for the remote table
(ie "S 1"."T 1") (and perform VACUUM ANALYZE to that table manually
instead) in hopes of getting that fixed. Attached is a patch for that.
I think changes added by the previous patch wouldn't make sense
anymore, so I removed those changes.
Ping? We're still seeing those failures on jaguarundi.
regards, tom lane
On Tue, Feb 27, 2018 at 5:14 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I ran the postgres_fdw regression test with no sleep two times in a
CLOBBER_CACHE_ALWAYS-enabled build, and then the regression test with
the sleep (60 seconds) two times, but I couldn't reproduce that in either
case. I suspect the changes in the order of the RETURNING output there
were still caused by autovacuum kicking in partway through the run. So
to make the regression test more stable against autovacuum, I'd propose
to modify the regression test to disable autovacuum for the remote table
(ie "S 1"."T 1") (and perform VACUUM ANALYZE to that table manually
instead) in hopes of getting that fixed. Attached is a patch for that.
I think changes added by the previous patch wouldn't make sense
anymore, so I removed those changes.
Ping? We're still seeing those failures on jaguarundi.
This is another patch that got past me. Committed now.
I'd be lying if I said I was filled with confidence that this was
going to fix it, but I don't know what to do other than keep tinkering
with it until we find something that works.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
On Tue, Feb 27, 2018 at 5:14 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Ping? We're still seeing those failures on jaguarundi.
This is another patch that got past me. Committed now.
I'd be lying if I said I was filled with confidence that this was
going to fix it, but I don't know what to do other than keep tinkering
with it until we find something that works.
The buildfarm news on this isn't very good; jaguarundi seems happier,
but we've seen intermittent failures on four other critters since
this went in.
The idea of disabling autovacuum seems reasonable to me, but perhaps
it needs to be done to more of the tables.
regards, tom lane