Optimization for updating foreign tables in Postgres FDW
Attached is a WIP patch for the following:
/*
* postgresPlanForeignModify
* Plan an insert/update/delete operation on a foreign table
*
* Note: currently, the plan tree generated for UPDATE/DELETE will always
* include a ForeignScan that retrieves ctids (using SELECT FOR UPDATE)
* and then the ModifyTable node will have to execute individual remote
* UPDATE/DELETE commands. If there are no local conditions or joins
* needed, it'd be better to let the scan node do UPDATE/DELETE RETURNING
* and then do nothing at ModifyTable. Room for future optimization ...
*/
In the patch postgresPlanForeignModify has been modified so that if, in
addition to the above condition, the followings are satisfied, then the
ForeignScan and ModifyTable node will work that way.
- There are no local BEFORE/AFTER triggers.
- In UPDATE it's safe to evaluate expressions to assign to the target
columns on the remote server.
Here is a simple performance test.
On remote side:
postgres=# create table t (id serial primary key, inserted timestamp
default clock_timestamp(), data text);
CREATE TABLE
postgres=# insert into t(data) select random() from generate_series(0,
99999);
INSERT 0 100000
postgres=# vacuum t;
VACUUM
On local side:
postgres=# create foreign table ft (id integer, inserted timestamp, data
text) server myserver options (table_name 't');
CREATE FOREIGN TABLE
Unpatched:
postgres=# explain analyze verbose delete from ft where id < 10000;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
Delete on public.ft (cost=100.00..162.32 rows=910 width=6) (actual
time=1275.255..1275.255 rows=0 loops=1)
Remote SQL: DELETE FROM public.t WHERE ctid = $1
-> Foreign Scan on public.ft (cost=100.00..162.32 rows=910 width=6)
(actual time=1.180..52.095 rows=9999 loops=1)
Output: ctid
Remote SQL: SELECT ctid FROM public.t WHERE ((id < 10000)) FOR
UPDATE
Planning time: 0.112 ms
Execution time: 1275.733 ms
(7 rows)
Patched (Note that the DELETE command has been pushed down.):
postgres=# explain analyze verbose delete from ft where id < 10000;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
Delete on public.ft (cost=100.00..162.32 rows=910 width=6) (actual
time=0.006..0.006 rows=0 loops=1)
-> Foreign Scan on public.ft (cost=100.00..162.32 rows=910 width=6)
(actual time=0.001..0.001 rows=0 loops=1)
Output: ctid
Remote SQL: DELETE FROM public.t WHERE ((id < 10000))
Planning time: 0.101 ms
Execution time: 8.808 ms
(6 rows)
I'll add this to the next CF. Comments are welcome.
Thanks,
Best regards,
Etsuro Fujita
Attachments:
postgres_fdw-update-v1.patchtext/plain; charset=Shift_JIS; name=postgres_fdw-update-v1.patchDownload+593-128
On Tue, Jul 8, 2014 at 3:07 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:
Attached is a WIP patch for the following:
/*
* postgresPlanForeignModify
* Plan an insert/update/delete operation on a foreign table
*
* Note: currently, the plan tree generated for UPDATE/DELETE will always
* include a ForeignScan that retrieves ctids (using SELECT FOR UPDATE)
* and then the ModifyTable node will have to execute individual remote
* UPDATE/DELETE commands. If there are no local conditions or joins
* needed, it'd be better to let the scan node do UPDATE/DELETE RETURNING
* and then do nothing at ModifyTable. Room for future optimization ...
*/In the patch postgresPlanForeignModify has been modified so that if, in
addition to the above condition, the followings are satisfied, then the
ForeignScan and ModifyTable node will work that way.- There are no local BEFORE/AFTER triggers.
- In UPDATE it's safe to evaluate expressions to assign to the target
columns on the remote server.Here is a simple performance test.
On remote side:
postgres=# create table t (id serial primary key, inserted timestamp
default clock_timestamp(), data text);
CREATE TABLE
postgres=# insert into t(data) select random() from generate_series(0,
99999);
INSERT 0 100000
postgres=# vacuum t;
VACUUMOn local side:
postgres=# create foreign table ft (id integer, inserted timestamp, data
text) server myserver options (table_name 't');
CREATE FOREIGN TABLEUnpatched:
postgres=# explain analyze verbose delete from ft where id < 10000;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
Delete on public.ft (cost=100.00..162.32 rows=910 width=6) (actual
time=1275.255..1275.255 rows=0 loops=1)
Remote SQL: DELETE FROM public.t WHERE ctid = $1
-> Foreign Scan on public.ft (cost=100.00..162.32 rows=910 width=6)
(actual time=1.180..52.095 rows=9999 loops=1)
Output: ctid
Remote SQL: SELECT ctid FROM public.t WHERE ((id < 10000)) FOR
UPDATE
Planning time: 0.112 ms
Execution time: 1275.733 ms
(7 rows)Patched (Note that the DELETE command has been pushed down.):
postgres=# explain analyze verbose delete from ft where id < 10000;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
Delete on public.ft (cost=100.00..162.32 rows=910 width=6) (actual
time=0.006..0.006 rows=0 loops=1)
-> Foreign Scan on public.ft (cost=100.00..162.32 rows=910 width=6)
(actual time=0.001..0.001 rows=0 loops=1)
Output: ctid
Remote SQL: DELETE FROM public.t WHERE ((id < 10000))
Planning time: 0.101 ms
Execution time: 8.808 ms
(6 rows)I'll add this to the next CF. Comments are welcome.
I haven't looked at the code, but +1 for the general idea. The
concept seems good to me, and that's a very large performance
improvement.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Hi Fujita-san,
My coworker Takaaki EITOKU reviewed the patch, and here are some
comments from him.
2014-07-08 16:07 GMT+09:00 Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>:
...
In the patch postgresPlanForeignModify has been modified so that if, in
addition to the above condition, the followings are satisfied, then the
ForeignScan and ModifyTable node will work that way.- There are no local BEFORE/AFTER triggers.
The reason the check ignores INSTEAD OF triggers here is that INSTEAD
OF trigger would prevent executing UPDATE/DELETE statement against a
foreign tables at all, right?
- In UPDATE it's safe to evaluate expressions to assign to the target
columns on the remote server.
IIUC, in addition to target expressions, whole WHERE clause should be
safe to be pushed-down. Though that is obviously required, but
mentioning that in documents and source comments would help users and
developers.
Here is a simple performance test.
On remote side:
postgres=# create table t (id serial primary key, inserted timestamp
default clock_timestamp(), data text);
CREATE TABLE
postgres=# insert into t(data) select random() from generate_series(0,
99999);
INSERT 0 100000
postgres=# vacuum t;
VACUUMOn local side:
postgres=# create foreign table ft (id integer, inserted timestamp, data
text) server myserver options (table_name 't');
CREATE FOREIGN TABLEUnpatched:
postgres=# explain analyze verbose delete from ft where id < 10000;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
Delete on public.ft (cost=100.00..162.32 rows=910 width=6) (actual
time=1275.255..1275.255 rows=0 loops=1)
Remote SQL: DELETE FROM public.t WHERE ctid = $1
-> Foreign Scan on public.ft (cost=100.00..162.32 rows=910 width=6)
(actual time=1.180..52.095 rows=9999 loops=1)
Output: ctid
Remote SQL: SELECT ctid FROM public.t WHERE ((id < 10000)) FOR
UPDATE
Planning time: 0.112 ms
Execution time: 1275.733 ms
(7 rows)Patched (Note that the DELETE command has been pushed down.):
postgres=# explain analyze verbose delete from ft where id < 10000;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
Delete on public.ft (cost=100.00..162.32 rows=910 width=6) (actual
time=0.006..0.006 rows=0 loops=1)
-> Foreign Scan on public.ft (cost=100.00..162.32 rows=910 width=6)
(actual time=0.001..0.001 rows=0 loops=1)
Output: ctid
Remote SQL: DELETE FROM public.t WHERE ((id < 10000))
Planning time: 0.101 ms
Execution time: 8.808 ms
(6 rows)
We found that this patch speeds up DELETE case remarkably, as you
describe above, but we saw only less than 2x speed on UPDATE cases.
Do you have any numbers of UPDATE cases?
Some more random thoughts:
* Naming of new behavior
You named this optimization "Direct Update", but I'm not sure that
this is intuitive enough to express this behavior. I would like to
hear opinions of native speakers.
* Macros for # of fdw_private elements
In postgres_fdw.c you defined MaxFdwScanFdwPrivateLength and
MinFdwScanFdwPrivateLength for determining the mode, but number of
fdw_private elements is not a ranged value but an enum value (I mean
that fdw_private holds 3 or 7, not 4~6, so Max and Min seems
inappropriate for prefix. IMO those macros should be named so that
the names represent the behavior, or define a macro to determine the
mode, say IS_SHIPPABLE(foo).
* Comparison of Macros
Comparison against MaxFdwScanFdwPrivateLength and
MinFdwScanFdwPrivateLength should be == instead of >= or <= to detect
unexpected value. Adding assert macro seems good too.
--
Shigeru HANADA
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
(2014/07/24 18:30), Shigeru Hanada wrote:
My coworker Takaaki EITOKU reviewed the patch, and here are some
comments from him.
Thank you for the review, Eitoku-san!
2014-07-08 16:07 GMT+09:00 Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>:
In the patch postgresPlanForeignModify has been modified so that if, in
addition to the above condition, the followings are satisfied, then the
ForeignScan and ModifyTable node will work that way.- There are no local BEFORE/AFTER triggers.
The reason the check ignores INSTEAD OF triggers here is that INSTEAD
OF trigger would prevent executing UPDATE/DELETE statement against a
foreign tables at all, right?
I'm not sure that I understand your question correctly, but the reason
for that is because foreign tables cannot have INSTEAD OF triggers.
- In UPDATE it's safe to evaluate expressions to assign to the target
columns on the remote server.IIUC, in addition to target expressions, whole WHERE clause should be
safe to be pushed-down. Though that is obviously required, but
mentioning that in documents and source comments would help users and
developers.
OK, I'll add the comments and documentation notes.
(I intentionaly didn't mention that because I think the comment for
postgresPlanForeignModify has already said the equivalent condition, ie,
"there are no local conditions", which means all conditions are executed
remotely.)
We found that this patch speeds up DELETE case remarkably, as you
describe above, but we saw only less than 2x speed on UPDATE cases.
Do you have any numbers of UPDATE cases?
Here is the result for an UPDATE case.
On remote side:
postgres=# create table t (id serial primary key, inserted timestamp
default clock_timestamp(), data text);
CREATE TABLE
postgres=# insert into t(data) select random() from generate_series(0,
99999);
INSERT 0 100000
postgres=# vacuum t;
VACUUMOn local side:
postgres=# create foreign table ft (id integer, inserted timestamp, data
text) server myserver options (table_name 't');
CREATE FOREIGN TABLE
Unpatched:
postgres=# explain analyze verbose update ft set data = 'notsolongtext'
where id < 10000;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
Update on public.ft (cost=100.00..147.38 rows=650 width=18) (actual
time=1463.944..1463.944 rows=0 loops=1)
Remote SQL: UPDATE public.t SET data = $2 WHERE ctid = $1
-> Foreign Scan on public.ft (cost=100.00..147.38 rows=650
width=18) (actual time=2.069..83.220 rows=9999 loops=1)
Output: id, inserted, 'notsolongtext'::text, ctid
Remote SQL: SELECT id, inserted, ctid FROM public.t WHERE ((id
< 10000)) FOR UPDATE
Planning time: 0.355 ms
Execution time: 1470.032 ms
(7 rows)
Patched:
postgres=# explain analyze verbose update ft set data = 'notsolongtext'
where id < 10000;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
Update on public.ft (cost=100.00..147.38 rows=650 width=18) (actual
time=0.005..0.005 rows=0 loops=1)
-> Foreign Scan on public.ft (cost=100.00..147.38 rows=650
width=18) (actual time=0.001..0.001 rows=0 loops=1)
Output: id, inserted, data, ctid
Remote SQL: UPDATE public.t SET data = 'notsolongtext'::text
WHERE ((id < 10000))
Planning time: 0.197 ms
Execution time: 61.519 ms
(6 rows)
I think that the precise effect of this optimization for DELETE/UPDATE
would depend on eg, data, queries (inc. w/ or w/o RETRUNING clauses) and
server/network performance. Could you tell me these information about
the UPDATE evaluation?
Some more random thoughts:
* Naming of new behavior
You named this optimization "Direct Update", but I'm not sure that
this is intuitive enough to express this behavior. I would like to
hear opinions of native speakers.
+1
* Macros for # of fdw_private elements
In postgres_fdw.c you defined MaxFdwScanFdwPrivateLength and
MinFdwScanFdwPrivateLength for determining the mode, but number of
fdw_private elements is not a ranged value but an enum value (I mean
that fdw_private holds 3 or 7, not 4~6, so Max and Min seems
inappropriate for prefix. IMO those macros should be named so that
the names represent the behavior, or define a macro to determine the
mode, say IS_SHIPPABLE(foo).
OK, Will fix.
* Comparison of Macros
Comparison against MaxFdwScanFdwPrivateLength and
MinFdwScanFdwPrivateLength should be == instead of >= or <= to detect
unexpected value. Adding assert macro seems good too.
Will fix this too.
Thanks,
Best regards,
Etsuro Fujita
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Shigeru Hanada wrote:
* Naming of new behavior
You named this optimization "Direct Update", but I'm not sure that
this is intuitive enough to express this behavior. I would like to
hear opinions of native speakers.
How about "batch foreign update" or "batch foreign modification"?
(Disclaimer: I'm not a native speaker either.)
Yours,
Laurenz Albe
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Fri, Jul 25, 2014 at 3:39 AM, Albe Laurenz <laurenz.albe@wien.gv.at> wrote:
Shigeru Hanada wrote:
* Naming of new behavior
You named this optimization "Direct Update", but I'm not sure that
this is intuitive enough to express this behavior. I would like to
hear opinions of native speakers.How about "batch foreign update" or "batch foreign modification"?
(Disclaimer: I'm not a native speaker either.)
I think direct update sounds pretty good. "Batch" does not sound as
good to me, since it doesn't clearly describe what makes this patch
special as opposed to some other grouping of updates that happens to
produce a speedup.
Another term that might be used is "update pushdown", since we are
pushing the whole update to the remote server instead of having the
local server participate. Without looking at the patch, I don't have
a strong opinion on whether that's better than "direct update" in this
context.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
(2014/07/29 0:58), Robert Haas wrote:
On Fri, Jul 25, 2014 at 3:39 AM, Albe Laurenz <laurenz.albe@wien.gv.at> wrote:
Shigeru Hanada wrote:
* Naming of new behavior
You named this optimization "Direct Update", but I'm not sure that
this is intuitive enough to express this behavior. I would like to
hear opinions of native speakers.How about "batch foreign update" or "batch foreign modification"?
(Disclaimer: I'm not a native speaker either.)I think direct update sounds pretty good. "Batch" does not sound as
good to me, since it doesn't clearly describe what makes this patch
special as opposed to some other grouping of updates that happens to
produce a speedup.
I agree with Robert on that point.
Another term that might be used is "update pushdown", since we are
pushing the whole update to the remote server instead of having the
local server participate. Without looking at the patch, I don't have
a strong opinion on whether that's better than "direct update" in this
context.
"Update Pushdown" is fine with me.
If there are no objections of others, I'll change the name from "Direct
Update" to "Update Pushdown".
Thanks,
Best regards,
Etsuro Fujita
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Hi
2014-07-30 10:22 GMT+02:00 Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>:
(2014/07/29 0:58), Robert Haas wrote:
On Fri, Jul 25, 2014 at 3:39 AM, Albe Laurenz <laurenz.albe@wien.gv.at>
wrote:Shigeru Hanada wrote:
* Naming of new behavior
You named this optimization "Direct Update", but I'm not sure that
this is intuitive enough to express this behavior. I would like to
hear opinions of native speakers.How about "batch foreign update" or "batch foreign modification"?
(Disclaimer: I'm not a native speaker either.)I think direct update sounds pretty good. "Batch" does not sound as
good to me, since it doesn't clearly describe what makes this patch
special as opposed to some other grouping of updates that happens to
produce a speedup.I agree with Robert on that point.
Another term that might be used is "update pushdown", since we are
pushing the whole update to the remote server instead of having the
local server participate. Without looking at the patch, I don't have
a strong opinion on whether that's better than "direct update" in this
context."Update Pushdown" is fine with me.
If there are no objections of others, I'll change the name from "Direct
Update" to "Update Pushdown".
I like "Update Pushdown" - it is simple without other semantic
Regards
Pavel
Show quoted text
Thanks,
Best regards,
Etsuro Fujita--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Hi Fujita-san,
Here is a new review result from Eitoku-san.
2014-07-25 16:30 GMT+09:00 Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>:
(2014/07/24 18:30), Shigeru Hanada wrote:
I'm not sure that I understand your question correctly, but the reason for
that is because foreign tables cannot have INSTEAD OF triggers.
Now I see the reason, but then I worry (though it unlikely happens) a
case that new trigger type might be added in future. The code says
that "only BEFORE and AFTER triggers are unsafe for direct update",
but it would be more safe to code that "any trigger other than event
trigger is unsafe for direct update".
We found that this patch speeds up DELETE case remarkably, as you
describe above, but we saw only less than 2x speed on UPDATE cases.
Do you have any numbers of UPDATE cases?Here is the result for an UPDATE case.
Hmm, performance gain on UPDATE cases seems similar to our results,
except planning times. In your environment the patch reduces planning
time too, but we got longer planning times with your patch (in only
once in six trial, we got shorter planning time than average of
patched version). Could you try multiple times on your environment?
I think that the precise effect of this optimization for DELETE/UPDATE would
depend on eg, data, queries (inc. w/ or w/o RETRUNING clauses) and
server/network performance. Could you tell me these information about the
UPDATE evaluation?
I tried on a CentOS 6.5 on VMware on a Note PC with Core i3 1.17GHz,
2.0GB memory and single HDD, so the performance is poor.
The SQLs used for performance test are quite simple, update 10
thousands rows at a time, and repeat it for different section of the
table for six times. The definition of foreign table ft is same as
the one in your case.
EXPLAIN ANALYZE VERBOSE UPDATe ft SET data = 'abcdefg' WHERE id >= 0
AND id < 10000;
EXPLAIN ANALYZE VERBOSE UPDATe ft SET data = 'abcdefg' WHERE id >=
10000 AND id < 20000;
EXPLAIN ANALYZE VERBOSE UPDATe ft SET data = 'abcdefg' WHERE id >=
20000 AND id < 30000;
EXPLAIN ANALYZE VERBOSE UPDATe ft SET data = 'abcdefg' WHERE id >=
30000 AND id < 40000;
EXPLAIN ANALYZE VERBOSE UPDATe ft SET data = 'abcdefg' WHERE id >=
40000 AND id < 50000;
EXPLAIN ANALYZE VERBOSE UPDATe ft SET data = 'abcdefg' WHERE id >=
50000 AND id < 60000;
Some more random thoughts:
* Naming of new behavior
You named this optimization "Direct Update", but I'm not sure that
this is intuitive enough to express this behavior. I would like to
hear opinions of native speakers.
Update push-down seems nice with according to others.
--
Shigeru HANADA
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
(2014/07/30 17:22), Etsuro Fujita wrote:
(2014/07/29 0:58), Robert Haas wrote:
On Fri, Jul 25, 2014 at 3:39 AM, Albe Laurenz
<laurenz.albe@wien.gv.at> wrote:Shigeru Hanada wrote:
* Naming of new behavior
You named this optimization "Direct Update", but I'm not sure that
this is intuitive enough to express this behavior. I would like to
hear opinions of native speakers.How about "batch foreign update" or "batch foreign modification"?
(Disclaimer: I'm not a native speaker either.)I think direct update sounds pretty good. "Batch" does not sound as
good to me, since it doesn't clearly describe what makes this patch
special as opposed to some other grouping of updates that happens to
produce a speedup.I agree with Robert on that point.
Another term that might be used is "update pushdown", since we are
pushing the whole update to the remote server instead of having the
local server participate. Without looking at the patch, I don't have
a strong opinion on whether that's better than "direct update" in this
context."Update Pushdown" is fine with me.
If there are no objections of others, I'll change the name from "Direct
Update" to "Update Pushdown".
Done. (I've left deparseDirectUpdateSql/deparseDirectDeleteSql as-is,
though.)
Other changes:
* Address the comments from Eitoku-san.
* Add regression tests.
* Fix a bug, which fails to show the actual row counts in EXPLAIN
ANALYZE for UPDATE/DELETE without a RETURNING clause.
* Rebase to HEAD.
Please find attached an updated version of the patch.
Thanks,
Best regards,
Etsuro Fujita
Attachments:
postgres_fdw-update-v2.patchtext/x-diff; name=postgres_fdw-update-v2.patchDownload+688-132
Hi Hanada-san,
Thank you for the answer.
(2014/08/04 19:36), Shigeru Hanada wrote:
2014-07-25 16:30 GMT+09:00 Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>:
(2014/07/24 18:30), Shigeru Hanada wrote:
I'm not sure that I understand your question correctly, but the reason for
that is because foreign tables cannot have INSTEAD OF triggers.
Now I see the reason, but then I worry (though it unlikely happens) a
case that new trigger type might be added in future. The code says
that "only BEFORE and AFTER triggers are unsafe for direct update",
but it would be more safe to code that "any trigger other than event
trigger is unsafe for direct update".
Yeah, I've revised the comment for that in the updated version of the
patch I sent in just now. Could you check it?
We found that this patch speeds up DELETE case remarkably, as you
describe above, but we saw only less than 2x speed on UPDATE cases.
Do you have any numbers of UPDATE cases?
Hmm, performance gain on UPDATE cases seems similar to our results,
except planning times. In your environment the patch reduces planning
time too, but we got longer planning times with your patch (in only
once in six trial, we got shorter planning time than average of
patched version). Could you try multiple times on your environment?
No. Is the overhead so large that it cannot be ignored?
I think that the precise effect of this optimization for DELETE/UPDATE would
depend on eg, data, queries (inc. w/ or w/o RETRUNING clauses) and
server/network performance. Could you tell me these information about the
UPDATE evaluation?
I tried on a CentOS 6.5 on VMware on a Note PC with Core i3 1.17GHz,
2.0GB memory and single HDD, so the performance is poor.The SQLs used for performance test are quite simple, update 10
thousands rows at a time, and repeat it for different section of the
table for six times. The definition of foreign table ft is same as
the one in your case.EXPLAIN ANALYZE VERBOSE UPDATe ft SET data = 'abcdefg' WHERE id >= 0
AND id < 10000;
EXPLAIN ANALYZE VERBOSE UPDATe ft SET data = 'abcdefg' WHERE id >=
10000 AND id < 20000;
EXPLAIN ANALYZE VERBOSE UPDATe ft SET data = 'abcdefg' WHERE id >=
20000 AND id < 30000;
EXPLAIN ANALYZE VERBOSE UPDATe ft SET data = 'abcdefg' WHERE id >=
30000 AND id < 40000;
EXPLAIN ANALYZE VERBOSE UPDATe ft SET data = 'abcdefg' WHERE id >=
40000 AND id < 50000;
EXPLAIN ANALYZE VERBOSE UPDATe ft SET data = 'abcdefg' WHERE id >=
50000 AND id < 60000;
OK I also will evaluate the performance under the same workloads.
Some more random thoughts:
* Naming of new behavior
You named this optimization "Direct Update", but I'm not sure that
this is intuitive enough to express this behavior. I would like to
hear opinions of native speakers.
Update push-down seems nice with according to others.
The name has been changed in the updated version.
Thanks,
Best regards,
Etsuro Fujita
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Hi Fujita-san,
Issues addressed by Eitoku-san were fixed properly, but he found a bug
and a possible enhancement in the v2 patch.
* push-down check misses delete triggers
update_is_pushdown_safe() seems to have a bug that it misses the
existence of row-level delete trigger. DELETE statement executed
against a foreign table which has row-level delete trigger is pushed
down to remote, and consequently no row-level delete trigger is fired.
* further optimization
Is there any chance to consider further optimization by passing the
operation type (UPDATE|DELETE) of undergoing statement to
update_is_pushdown_safe()? It seems safe to push down UPDATE
statement when the target foreign table has no update trigger even it
has a delete trigger (of course the opposite combination would be also
fine).
* Documentation
The requirement of pushing down UPDATE/DELETE statements would not be
easy to understand for non-expert users, so it seems that there is a
room to enhance documentation. An idea is to define which expression
is safe to send to remote first (it might need to mention the
difference of semantics), and refer the definition from the place
describing the requirement of pushing-down for SELECT, UPDATE and
DELETE.
2014-08-04 20:30 GMT+09:00 Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>:
(2014/07/30 17:22), Etsuro Fujita wrote:
(2014/07/29 0:58), Robert Haas wrote:
On Fri, Jul 25, 2014 at 3:39 AM, Albe Laurenz
<laurenz.albe@wien.gv.at> wrote:Shigeru Hanada wrote:
* Naming of new behavior
You named this optimization "Direct Update", but I'm not sure that
this is intuitive enough to express this behavior. I would like to
hear opinions of native speakers.How about "batch foreign update" or "batch foreign modification"?
(Disclaimer: I'm not a native speaker either.)I think direct update sounds pretty good. "Batch" does not sound as
good to me, since it doesn't clearly describe what makes this patch
special as opposed to some other grouping of updates that happens to
produce a speedup.I agree with Robert on that point.
Another term that might be used is "update pushdown", since we are
pushing the whole update to the remote server instead of having the
local server participate. Without looking at the patch, I don't have
a strong opinion on whether that's better than "direct update" in this
context."Update Pushdown" is fine with me.
If there are no objections of others, I'll change the name from "Direct
Update" to "Update Pushdown".Done. (I've left deparseDirectUpdateSql/deparseDirectDeleteSql as-is,
though.)Other changes:
* Address the comments from Eitoku-san.
* Add regression tests.
* Fix a bug, which fails to show the actual row counts in EXPLAIN ANALYZE
for UPDATE/DELETE without a RETURNING clause.
* Rebase to HEAD.Please find attached an updated version of the patch.
Thanks,
Best regards,
Etsuro Fujita
--
Shigeru HANADA
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
(2014/08/12 18:34), Shigeru Hanada wrote:
Issues addressed by Eitoku-san were fixed properly, but he found a bug
and a possible enhancement in the v2 patch.
Thank you for the review, Hanada-san and Eitoku-san!
* push-down check misses delete triggers
update_is_pushdown_safe() seems to have a bug that it misses the
existence of row-level delete trigger. DELETE statement executed
against a foreign table which has row-level delete trigger is pushed
down to remote, and consequently no row-level delete trigger is fired.
Ah, I noticed that the current code for that is not correct. Will fix.
* further optimization
Is there any chance to consider further optimization by passing the
operation type (UPDATE|DELETE) of undergoing statement to
update_is_pushdown_safe()? It seems safe to push down UPDATE
statement when the target foreign table has no update trigger even it
has a delete trigger (of course the opposite combination would be also
fine).
Good idea! Will improve that too.
* Documentation
The requirement of pushing down UPDATE/DELETE statements would not be
easy to understand for non-expert users, so it seems that there is a
room to enhance documentation. An idea is to define which expression
is safe to send to remote first (it might need to mention the
difference of semantics), and refer the definition from the place
describing the requirement of pushing-down for SELECT, UPDATE and
DELETE.
Yeah, I also think that it would not necessarily easy for the users to
understand which expression is safe to send. So I agree with that
enhancement, but ISTM that it would be better to do that as a separate
patch.
Thanks,
Best regards,
Etsuro Fujita
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Etsuro Fujita wrote:
Done. (I've left deparseDirectUpdateSql/deparseDirectDeleteSql as-is,
though.)Other changes:
* Address the comments from Eitoku-san.
* Add regression tests.
* Fix a bug, which fails to show the actual row counts in EXPLAIN
ANALYZE for UPDATE/DELETE without a RETURNING clause.
* Rebase to HEAD.Please find attached an updated version of the patch.
Here is my review:
The patch Applies fine, Builds without warning and passes make Check,
so the ABC of patch reviewing is fine.
I played with it, and apart from Hanada's comments I have found the following:
test=> EXPLAIN (ANALYZE, VERBOSE) UPDATE rtest SET val=NULL WHERE id > 3;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
Update on laurenz.rtest (cost=100.00..14134.40 rows=299970 width=10) (actual time=0.005..0.005 rows=0 loops=1)
-> Foreign Scan on laurenz.rtest (cost=100.00..14134.40 rows=299970 width=10) (actual time=0.002..0.002 rows=299997 loops=1)
Output: id, val, ctid
Remote SQL: UPDATE laurenz.test SET val = NULL::text WHERE ((id > 3))
Planning time: 0.179 ms
Execution time: 3706.919 ms
(6 rows)
Time: 3708.272 ms
The "actual time" readings are surprising.
Shouldn't these similar to the actual execution time, since most of the time is spent
in the foreign scan node?
Reading the code, I noticed that the pushed down UPDATE or DELETE statement is executed
during postgresBeginForeignScan rather than during postgresIterateForeignScan.
It probably does not matter, but is there a reason to do it different from the normal scan?
It is not expected that postgresReScanForeignScan is called when the UPDATE/DELETE
is pushed down, right? Maybe it would make sense to add an assertion for that.
I ran a simple performance test and found that performance is improved as expected;
updating 100000 rows took 1000 rather than 8000 ms, and DELETING the same amount
took 200 instead of 6500 ms.
Yours,
Laurenz Albe
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
(2014/08/25 21:58), Albe Laurenz wrote:
Here is my review:
Thank you for the review!
I played with it, and apart from Hanada's comments I have found the following:
test=> EXPLAIN (ANALYZE, VERBOSE) UPDATE rtest SET val=NULL WHERE id > 3;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
Update on laurenz.rtest (cost=100.00..14134.40 rows=299970 width=10) (actual time=0.005..0.005 rows=0 loops=1)
-> Foreign Scan on laurenz.rtest (cost=100.00..14134.40 rows=299970 width=10) (actual time=0.002..0.002 rows=299997 loops=1)
Output: id, val, ctid
Remote SQL: UPDATE laurenz.test SET val = NULL::text WHERE ((id > 3))
Planning time: 0.179 ms
Execution time: 3706.919 ms
(6 rows)Time: 3708.272 ms
The "actual time" readings are surprising.
Shouldn't these similar to the actual execution time, since most of the time is spent
in the foreign scan node?
I was also thinkng that this is confusing to the users. I think this is
because the patch executes the UPDATE/DELETE statement during
postgresBeginForeignScan, not postgresIterateForeignScan, as you
mentioned below:
Reading the code, I noticed that the pushed down UPDATE or DELETE statement is executed
during postgresBeginForeignScan rather than during postgresIterateForeignScan.
It probably does not matter, but is there a reason to do it different from the normal scan?
I'll modify the patch so as to execute the statement during
postgresIterateForeignScan.
It is not expected that postgresReScanForeignScan is called when the UPDATE/DELETE
is pushed down, right? Maybe it would make sense to add an assertion for that.
IIUC, that is right. As ModifyTable doesn't support rescan currently,
postgresReScanForeignScan needn't to be called in the update pushdown
case. The assertion is a good idea. I'll add it.
Thanks,
Best regards,
Etsuro Fujita
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Mon, Aug 25, 2014 at 8:58 AM, Albe Laurenz <laurenz.albe@wien.gv.at> wrote:
Reading the code, I noticed that the pushed down UPDATE or DELETE statement is executed
during postgresBeginForeignScan rather than during postgresIterateForeignScan.
It probably does not matter, but is there a reason to do it different from the normal scan?
Hmm, I'm worried that may be an API contract violation. ISTM that we
might initialize nodes that we never read from - they can show up in
the EXPLAIN-plan as (never executed) - and things that aren't executed
shouldn't do work, especially work that permanently modifies data.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Robert Haas <robertmhaas@gmail.com> writes:
On Mon, Aug 25, 2014 at 8:58 AM, Albe Laurenz <laurenz.albe@wien.gv.at> wrote:
Reading the code, I noticed that the pushed down UPDATE or DELETE statement is executed
during postgresBeginForeignScan rather than during postgresIterateForeignScan.
It probably does not matter, but is there a reason to do it different from the normal scan?
Hmm, I'm worried that may be an API contract violation.
Indeed it is. You could get away with it if you check the
EXEC_FLAG_EXPLAIN_ONLY flag before doing anything with visible
side-effects, but it's still pretty ugly.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
I wrote:
Robert Haas <robertmhaas@gmail.com> writes:
Hmm, I'm worried that may be an API contract violation.
Indeed it is. You could get away with it if you check the
EXEC_FLAG_EXPLAIN_ONLY flag before doing anything with visible
side-effects, but it's still pretty ugly.
Actually, there's another problem there. What of UPDATE or DELETE with a
LIMIT clause, which is something that seems to be coming down the pike:
https://commitfest.postgresql.org/action/patch_view?id=1550
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
(2014/08/27 22:56), Robert Haas wrote:
On Mon, Aug 25, 2014 at 8:58 AM, Albe Laurenz <laurenz.albe@wien.gv.at> wrote:
Reading the code, I noticed that the pushed down UPDATE or DELETE statement is executed
during postgresBeginForeignScan rather than during postgresIterateForeignScan.
It probably does not matter, but is there a reason to do it different from the normal scan?Hmm, I'm worried that may be an API contract violation.
Will fix.
Thanks,
Best regards,
Etsuro Fujita
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
(2014/08/27 23:05), Tom Lane wrote:
I wrote:
Robert Haas <robertmhaas@gmail.com> writes:
Hmm, I'm worried that may be an API contract violation.
Actually, there's another problem there. What of UPDATE or DELETE with a
LIMIT clause, which is something that seems to be coming down the pike:
https://commitfest.postgresql.org/action/patch_view?id=1550
I'd like to try to extend the functionality so as to push
UPDATE/DELETE-with-LIMIT down into the remote server if it's safe. But
I don't yet know if it's possible, because I started looking into the
UPDATE/DELETE-with-LIMIT patch just now ...
Thanks,
Best regards,
Etsuro Fujita
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers