Optimization for updating foreign tables in Postgres FDW

Started by Etsuro Fujitaalmost 12 years ago174 messageshackers

fujita.etsuro@lab.ntt.co.jp

almost 12 years ago

Attached is a WIP patch for the following:

/*
* postgresPlanForeignModify
* Plan an insert/update/delete operation on a foreign table
*
* Note: currently, the plan tree generated for UPDATE/DELETE will always
* include a ForeignScan that retrieves ctids (using SELECT FOR UPDATE)
* and then the ModifyTable node will have to execute individual remote
* UPDATE/DELETE commands. If there are no local conditions or joins
* needed, it'd be better to let the scan node do UPDATE/DELETE RETURNING
* and then do nothing at ModifyTable. Room for future optimization ...
*/

In the patch postgresPlanForeignModify has been modified so that if, in
addition to the above condition, the followings are satisfied, then the
ForeignScan and ModifyTable node will work that way.

- There are no local BEFORE/AFTER triggers.
- In UPDATE it's safe to evaluate expressions to assign to the target
columns on the remote server.

Here is a simple performance test.

On remote side:
postgres=# create table t (id serial primary key, inserted timestamp
default clock_timestamp(), data text);
CREATE TABLE
postgres=# insert into t(data) select random() from generate_series(0,
99999);
INSERT 0 100000
postgres=# vacuum t;
VACUUM

On local side:
postgres=# create foreign table ft (id integer, inserted timestamp, data
text) server myserver options (table_name 't');
CREATE FOREIGN TABLE

Unpatched:
postgres=# explain analyze verbose delete from ft where id < 10000;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
Delete on public.ft (cost=100.00..162.32 rows=910 width=6) (actual
time=1275.255..1275.255 rows=0 loops=1)
Remote SQL: DELETE FROM public.t WHERE ctid = $1
-> Foreign Scan on public.ft (cost=100.00..162.32 rows=910 width=6)
(actual time=1.180..52.095 rows=9999 loops=1)
Output: ctid
Remote SQL: SELECT ctid FROM public.t WHERE ((id < 10000)) FOR
UPDATE
Planning time: 0.112 ms
Execution time: 1275.733 ms
(7 rows)

Patched (Note that the DELETE command has been pushed down.):
postgres=# explain analyze verbose delete from ft where id < 10000;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
Delete on public.ft (cost=100.00..162.32 rows=910 width=6) (actual
time=0.006..0.006 rows=0 loops=1)
-> Foreign Scan on public.ft (cost=100.00..162.32 rows=910 width=6)
(actual time=0.001..0.001 rows=0 loops=1)
Output: ctid
Remote SQL: DELETE FROM public.t WHERE ((id < 10000))
Planning time: 0.101 ms
Execution time: 8.808 ms
(6 rows)

I'll add this to the next CF. Comments are welcome.

Thanks,

Best regards,
Etsuro Fujita

Robert Haas

robertmhaas@gmail.com

almost 12 years ago

In reply to: Etsuro Fujita (#1)

Re: Optimization for updating foreign tables in Postgres FDW

On Tue, Jul 8, 2014 at 3:07 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:

Attached is a WIP patch for the following:

/*
* postgresPlanForeignModify
* Plan an insert/update/delete operation on a foreign table
*
* Note: currently, the plan tree generated for UPDATE/DELETE will always
* include a ForeignScan that retrieves ctids (using SELECT FOR UPDATE)
* and then the ModifyTable node will have to execute individual remote
* UPDATE/DELETE commands. If there are no local conditions or joins
* needed, it'd be better to let the scan node do UPDATE/DELETE RETURNING
* and then do nothing at ModifyTable. Room for future optimization ...
*/

In the patch postgresPlanForeignModify has been modified so that if, in
addition to the above condition, the followings are satisfied, then the
ForeignScan and ModifyTable node will work that way.

- There are no local BEFORE/AFTER triggers.
- In UPDATE it's safe to evaluate expressions to assign to the target
columns on the remote server.

Here is a simple performance test.

On remote side:
postgres=# create table t (id serial primary key, inserted timestamp
default clock_timestamp(), data text);
CREATE TABLE
postgres=# insert into t(data) select random() from generate_series(0,
99999);
INSERT 0 100000
postgres=# vacuum t;
VACUUM

On local side:
postgres=# create foreign table ft (id integer, inserted timestamp, data
text) server myserver options (table_name 't');
CREATE FOREIGN TABLE

Unpatched:
postgres=# explain analyze verbose delete from ft where id < 10000;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
Delete on public.ft (cost=100.00..162.32 rows=910 width=6) (actual
time=1275.255..1275.255 rows=0 loops=1)
Remote SQL: DELETE FROM public.t WHERE ctid = $1
-> Foreign Scan on public.ft (cost=100.00..162.32 rows=910 width=6)
(actual time=1.180..52.095 rows=9999 loops=1)
Output: ctid
Remote SQL: SELECT ctid FROM public.t WHERE ((id < 10000)) FOR
UPDATE
Planning time: 0.112 ms
Execution time: 1275.733 ms
(7 rows)

Patched (Note that the DELETE command has been pushed down.):
postgres=# explain analyze verbose delete from ft where id < 10000;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
Delete on public.ft (cost=100.00..162.32 rows=910 width=6) (actual
time=0.006..0.006 rows=0 loops=1)
-> Foreign Scan on public.ft (cost=100.00..162.32 rows=910 width=6)
(actual time=0.001..0.001 rows=0 loops=1)
Output: ctid
Remote SQL: DELETE FROM public.t WHERE ((id < 10000))
Planning time: 0.101 ms
Execution time: 8.808 ms
(6 rows)

I'll add this to the next CF. Comments are welcome.

I haven't looked at the code, but +1 for the general idea. The
concept seems good to me, and that's a very large performance
improvement.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Shigeru Hanada

shigeru.hanada@gmail.com

almost 12 years ago

In reply to: Etsuro Fujita (#1)

Re: Optimization for updating foreign tables in Postgres FDW

Hi Fujita-san,

My coworker Takaaki EITOKU reviewed the patch, and here are some
comments from him.

2014-07-08 16:07 GMT+09:00 Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>:
...

In the patch postgresPlanForeignModify has been modified so that if, in
addition to the above condition, the followings are satisfied, then the
ForeignScan and ModifyTable node will work that way.

- There are no local BEFORE/AFTER triggers.

The reason the check ignores INSTEAD OF triggers here is that INSTEAD
OF trigger would prevent executing UPDATE/DELETE statement against a
foreign tables at all, right?

- In UPDATE it's safe to evaluate expressions to assign to the target
columns on the remote server.

IIUC, in addition to target expressions, whole WHERE clause should be
safe to be pushed-down. Though that is obviously required, but
mentioning that in documents and source comments would help users and
developers.

Here is a simple performance test.

On remote side:
postgres=# create table t (id serial primary key, inserted timestamp
default clock_timestamp(), data text);
CREATE TABLE
postgres=# insert into t(data) select random() from generate_series(0,
99999);
INSERT 0 100000
postgres=# vacuum t;
VACUUM

On local side:
postgres=# create foreign table ft (id integer, inserted timestamp, data
text) server myserver options (table_name 't');
CREATE FOREIGN TABLE

Unpatched:
postgres=# explain analyze verbose delete from ft where id < 10000;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
Delete on public.ft (cost=100.00..162.32 rows=910 width=6) (actual
time=1275.255..1275.255 rows=0 loops=1)
Remote SQL: DELETE FROM public.t WHERE ctid = $1
-> Foreign Scan on public.ft (cost=100.00..162.32 rows=910 width=6)
(actual time=1.180..52.095 rows=9999 loops=1)
Output: ctid
Remote SQL: SELECT ctid FROM public.t WHERE ((id < 10000)) FOR
UPDATE
Planning time: 0.112 ms
Execution time: 1275.733 ms
(7 rows)

Patched (Note that the DELETE command has been pushed down.):
postgres=# explain analyze verbose delete from ft where id < 10000;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
Delete on public.ft (cost=100.00..162.32 rows=910 width=6) (actual
time=0.006..0.006 rows=0 loops=1)
-> Foreign Scan on public.ft (cost=100.00..162.32 rows=910 width=6)
(actual time=0.001..0.001 rows=0 loops=1)
Output: ctid
Remote SQL: DELETE FROM public.t WHERE ((id < 10000))
Planning time: 0.101 ms
Execution time: 8.808 ms
(6 rows)

We found that this patch speeds up DELETE case remarkably, as you
describe above, but we saw only less than 2x speed on UPDATE cases.
Do you have any numbers of UPDATE cases?

Some more random thoughts:

* Naming of new behavior
You named this optimization "Direct Update", but I'm not sure that
this is intuitive enough to express this behavior. I would like to
hear opinions of native speakers.

* Macros for # of fdw_private elements
In postgres_fdw.c you defined MaxFdwScanFdwPrivateLength and
MinFdwScanFdwPrivateLength for determining the mode, but number of
fdw_private elements is not a ranged value but an enum value (I mean
that fdw_private holds 3 or 7, not 4~6, so Max and Min seems
inappropriate for prefix. IMO those macros should be named so that
the names represent the behavior, or define a macro to determine the
mode, say IS_SHIPPABLE(foo).

* Comparison of Macros
Comparison against MaxFdwScanFdwPrivateLength and
MinFdwScanFdwPrivateLength should be == instead of >= or <= to detect
unexpected value. Adding assert macro seems good too.

--
Shigeru HANADA

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Optimization for updating foreign tables in Postgres FDW

Attachments:

Attachments: