[v9.3] writable foreign tables

Started by KaiGai Koheiover 13 years ago102 messageshackers
Jump to latest
#1KaiGai Kohei
kaigai@ak.jp.nec.com

Hello,

The attached patch is just a proof-of-concept of writable foreign table
feature; thus, I don't expect it getting merged at the upcoming commit
fest. The purpose of this patch is to find out the best way to support
"write stuff" in FDW.

Basic idea of this patch is to utilize "ctid" field to track an identifier of
a particular foreign-row; to be updated or deleted. It shall be moved
to the modify-stage from the scan-stage as regular table doing.
Then, newly added methods being invoked at ExecUpdate or
ExecDelete with the "pseudo ctid", so FDW shall be able to modify
the target foreign-row.
It is a responsibility of FDW extension (and DBA) to ensure each
foreign-row has a unique identifier that has 48-bits width integer
data type in maximum. In case of pgsql_fdw, "ctid" of remote table
can perform as "pseudo ctid", as is. For other RDBMS, DBA will
need to indicate which column should perform.
INSERT is simple enough; all we need to do it to carry every field
of new tuple into the remote side.

This patch adds five new methods of FdwRoutine structure.
The BeginForeignModify and EndForeignModify give a chance
to initialize / destruct a private state that can be allocated on
ResultRelInfo. As literal, ExecForeignInsert, ExecForeignDelete
and ExecForeignUpdate are invoked for each new tuple, instead
of heap_*() for regular tables. If NULL was set on them, it means
this FDW does not support these operations.

I intend FDW to set up a prepared statement that modifies
a particular remote-row being identified with pseudo-ctid,
at the BeginForeignModify(). Then, ExecForeign*() kicks
the prepared statement with given pseudo-ctid.

The patched portion at contrib/file_fdw.c does not make sense
actually. It just prints messages for each invocation.
It is just a proof-of-concept to show possibility of implementation
based on real RDBMS.

In case when file_fdw performs behalf on "ftbl".
--------------------------------
postgres=# SELECT ctid, * FROM ftbl;
ctid | a | b
--------+-----+-----
(0,1) | 101 | aaa
(0,2) | 102 | bbb
(0,3) | 103 | ccc
(0,4) | 104 | ddd
(0,5) | 105 | eee
(0,6) | 106 | fff
(0,7) | 107 | ggg
(0,8) | 108 | hhh
(0,9) | 109 | iii
(0,10) | 110 | jjj
(10 rows)
==> ctid is used to carray identifier of row; line number in this example.

postgres=# UPDATE ftbl SET a = a + 1 WHERE a > 107;
INFO: ftbl is the target relation of UPDATE
INFO: fdw_file: BeginForeignModify method
INFO: fdw_file: UPDATE (lineno = 8)
INFO: fdw_file: UPDATE (lineno = 9)
INFO: fdw_file: UPDATE (lineno = 10)
INFO: fdw_file: EndForeignModify method
UPDATE 3
postgres=# DELETE FROM ftbl WHERE a BETWEEN 103 AND 106;
INFO: ftbl is the target relation of DELETE
INFO: fdw_file: BeginForeignModify method
INFO: fdw_file: DELETE (lineno = 3)
INFO: fdw_file: DELETE (lineno = 4)
INFO: fdw_file: DELETE (lineno = 5)
INFO: fdw_file: DELETE (lineno = 6)
INFO: fdw_file: EndForeignModify method
DELETE 4
--------------------------------

This patch does not care about transaction control anyway.
According to the discussion in developer meeting at Ottawa,
I didn't include such a complex stuff in the first step.
(Probably, we can implement using XactCallback...)

Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>

Attachments:

pgsql-v9.3-writable-fdw-poc.v1.patchapplication/octet-stream; name=pgsql-v9.3-writable-fdw-poc.v1.patchDownload+237-10
#2Robert Haas
robertmhaas@gmail.com
In reply to: KaiGai Kohei (#1)
Re: [v9.3] writable foreign tables

On Thu, Aug 23, 2012 at 1:10 AM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:

It is a responsibility of FDW extension (and DBA) to ensure each
foreign-row has a unique identifier that has 48-bits width integer
data type in maximum.

It strikes me as incredibly short-sighted to decide that the row
identifier has to have the same format as what our existing heap AM
happens to have. I think we need to allow the row identifier to be of
any data type, and even compound. For example, the foreign side might
have no equivalent of CTID, and thus use primary key. And the primary
key might consist of an integer and a string, or some such.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#3KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: Robert Haas (#2)
Re: [v9.3] writable foreign tables

2012/8/25 Robert Haas <robertmhaas@gmail.com>:

On Thu, Aug 23, 2012 at 1:10 AM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:

It is a responsibility of FDW extension (and DBA) to ensure each
foreign-row has a unique identifier that has 48-bits width integer
data type in maximum.

It strikes me as incredibly short-sighted to decide that the row
identifier has to have the same format as what our existing heap AM
happens to have. I think we need to allow the row identifier to be of
any data type, and even compound. For example, the foreign side might
have no equivalent of CTID, and thus use primary key. And the primary
key might consist of an integer and a string, or some such.

I assume it is a task of FDW extension to translate between the pseudo
ctid and the primary key in remote side.

For example, if primary key of the remote table is Text data type, an idea
is to use a hash table to track the text-formed primary being associated
with a particular 48-bits integer. The pseudo ctid shall be utilized to track
the tuple to be modified on the scan-stage, then FDW can reference the
hash table to pull-out the primary key to be provided on the prepared
statement.

Do we have some other reasonable ideas?

Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>

#4Laurenz Albe
laurenz.albe@cybertec.at
In reply to: KaiGai Kohei (#3)
Re: [v9.3] writable foreign tables

Kohei KaiGai wrote:

2012/8/25 Robert Haas <robertmhaas@gmail.com>:

On Thu, Aug 23, 2012 at 1:10 AM, Kohei KaiGai <kaigai@kaigai.gr.jp>

wrote:

It is a responsibility of FDW extension (and DBA) to ensure each
foreign-row has a unique identifier that has 48-bits width integer
data type in maximum.

It strikes me as incredibly short-sighted to decide that the row
identifier has to have the same format as what our existing heap AM
happens to have. I think we need to allow the row identifier to be

of

any data type, and even compound. For example, the foreign side

might

have no equivalent of CTID, and thus use primary key. And the

primary

key might consist of an integer and a string, or some such.

I assume it is a task of FDW extension to translate between the pseudo
ctid and the primary key in remote side.

For example, if primary key of the remote table is Text data type, an

idea

is to use a hash table to track the text-formed primary being

associated

with a particular 48-bits integer. The pseudo ctid shall be utilized

to track

the tuple to be modified on the scan-stage, then FDW can reference the
hash table to pull-out the primary key to be provided on the prepared
statement.

And what if there is a hash collision? Then you would not be able to
determine which row is meant.

I agree with Robert that this should be flexible enough to cater for
all kinds of row identifiers. Oracle, for example, uses ten byte
identifiers which would give me a headache with your suggested design.

Do we have some other reasonable ideas?

Would it be too invasive to introduce a new pointer in TupleTableSlot
that is NULL for anything but virtual tuples from foreign tables?

Yours,
Laurenz Albe

#5Shigeru Hanada
shigeru.hanada@gmail.com
In reply to: KaiGai Kohei (#1)
Re: [v9.3] writable foreign tables

Kaigai-san,

On Thu, Aug 23, 2012 at 2:10 PM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:

The patched portion at contrib/file_fdw.c does not make sense
actually. It just prints messages for each invocation.
It is just a proof-of-concept to show possibility of implementation
based on real RDBMS.

Attached is a tar ball of pgsql_fdw. It's WIP and contains no
document, but it would be enough for your PoC purpose. Usage and
features are same as the last version posted for 9.2 cycle.
# I'll post finished patch in the CF-Sep.

Here are random comments for your PoC patch:

+ As Robert says, using CTID as virtual tuple identifier doesn't seem
nice when considering various FDWs for NoSQL or RDBMS. Having abstract
layer between FDWs and tuple sounds better, but implementing it by each
FDW seems useless effort. Do yo have any idea of generic mechanism for
tuple mapping?

+ Do you have any plan about deparsing local qualifiers into remote
query to avoid repeated query submission? This would improve
performance of big UPDATE, but its use case might be limited to
statements which consist of one foreign table. For this case, we can
consider pass-through mode as second way.

+ I have not read your patch closely yet, but I wonder how we can know
which column is actually updated. If we have only updated image of
tuple, we have to update all remote columns by "new" values?

--
Shigeru Hanada

Attachments:

pgsql_fdw_93.tar.gzapplication/gzip; name=pgsql_fdw_93.tar.gzDownload+1-0
#6KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: Laurenz Albe (#4)
Re: [v9.3] writable foreign tables

2012/8/27 Albe Laurenz <laurenz.albe@wien.gv.at>:

Kohei KaiGai wrote:

2012/8/25 Robert Haas <robertmhaas@gmail.com>:

On Thu, Aug 23, 2012 at 1:10 AM, Kohei KaiGai <kaigai@kaigai.gr.jp>

wrote:

It is a responsibility of FDW extension (and DBA) to ensure each
foreign-row has a unique identifier that has 48-bits width integer
data type in maximum.

It strikes me as incredibly short-sighted to decide that the row
identifier has to have the same format as what our existing heap AM
happens to have. I think we need to allow the row identifier to be

of

any data type, and even compound. For example, the foreign side

might

have no equivalent of CTID, and thus use primary key. And the

primary

key might consist of an integer and a string, or some such.

I assume it is a task of FDW extension to translate between the pseudo
ctid and the primary key in remote side.

For example, if primary key of the remote table is Text data type, an

idea

is to use a hash table to track the text-formed primary being

associated

with a particular 48-bits integer. The pseudo ctid shall be utilized

to track

the tuple to be modified on the scan-stage, then FDW can reference the
hash table to pull-out the primary key to be provided on the prepared
statement.

And what if there is a hash collision? Then you would not be able to
determine which row is meant.

Even if we had a hash collision, each hash entry can have the original
key itself to be compared. But anyway, I love the idea to support
an opaque pointer to track particular remote-row rather.

I agree with Robert that this should be flexible enough to cater for
all kinds of row identifiers. Oracle, for example, uses ten byte
identifiers which would give me a headache with your suggested design.

Do we have some other reasonable ideas?

Would it be too invasive to introduce a new pointer in TupleTableSlot
that is NULL for anything but virtual tuples from foreign tables?

I'm not certain whether the duration of TupleTableSlot is enough to
carry a private datum between scan and modify stage.
For example, the TupleTableSlot shall be cleared at ExecNestLoop
prior to the slot being delivered to ExecModifyTuple.

postgres=# EXPLAIN UPDATE t1 SET b = 'abcd' WHERE a IN (SELECT x FROM
t2 WHERE x % 2 = 0);
QUERY PLAN
-------------------------------------------------------------------------------
Update on t1 (cost=0.00..54.13 rows=6 width=16)
-> Nested Loop (cost=0.00..54.13 rows=6 width=16)
-> Seq Scan on t2 (cost=0.00..28.45 rows=6 width=10)
Filter: ((x % 2) = 0)
-> Index Scan using t1_pkey on t1 (cost=0.00..4.27 rows=1 width=10)
Index Cond: (a = t2.x)
(6 rows)

Is it possible to utilize ctid field to move a private pointer?
TID data type is internally represented as a pointer to ItemPointerData,
so it has enough width to track an opaque formed remote-row identifier;
including string, int64 or others.

One disadvantage is "ctid" system column shows a nonsense value
when user explicitly references this system column. But it does not
seems to me a fundamental problem, because we didn't give any
special meaning on the "ctid" field of foreign table.

Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>

#7KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: Shigeru Hanada (#5)
Re: [v9.3] writable foreign tables

2012/8/27 Shigeru HANADA <shigeru.hanada@gmail.com>:

Kaigai-san,

On Thu, Aug 23, 2012 at 2:10 PM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:

The patched portion at contrib/file_fdw.c does not make sense
actually. It just prints messages for each invocation.
It is just a proof-of-concept to show possibility of implementation
based on real RDBMS.

Attached is a tar ball of pgsql_fdw. It's WIP and contains no
document, but it would be enough for your PoC purpose. Usage and
features are same as the last version posted for 9.2 cycle.
# I'll post finished patch in the CF-Sep.

Thanks, it is helpful to work on.

Here are random comments for your PoC patch:

+ As Robert says, using CTID as virtual tuple identifier doesn't seem
nice when considering various FDWs for NoSQL or RDBMS. Having abstract
layer between FDWs and tuple sounds better, but implementing it by each
FDW seems useless effort. Do yo have any idea of generic mechanism for
tuple mapping?

As I wrote in the previous message, isn't it a reasonable idea to move
a private datum (instead of alternate key) on the "ctid" field which has
been internally represented as a pointer to indicate ItemPointerData?

+ Do you have any plan about deparsing local qualifiers into remote
query to avoid repeated query submission? This would improve
performance of big UPDATE, but its use case might be limited to
statements which consist of one foreign table. For this case, we can
consider pass-through mode as second way.

I think, FDW should run UPDATE or DELETE statement at the scan
stage on remote-side, then return a pseudo result to scanner, in case
of the statement is "enough simple", like no qualifier, no returning, etc...
The callback on ExecUpdate/ExecDelete will perform just a stub; that
does not actually work except for increment of affected rows.

+ I have not read your patch closely yet, but I wonder how we can know
which column is actually updated. If we have only updated image of
tuple, we have to update all remote columns by "new" values?

It seems to me TargetEntry of the parse tree can inform us which column
should be modified on UPDATE or INSERT. If it has just a Var element
that reference original table as-is, it means here is no change.

Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>

#8Laurenz Albe
laurenz.albe@cybertec.at
In reply to: KaiGai Kohei (#6)
Re: [v9.3] writable foreign tables

Kohei KaiGai wrote:

It is a responsibility of FDW extension (and DBA) to ensure each
foreign-row has a unique identifier that has 48-bits width integer
data type in maximum.

For example, if primary key of the remote table is Text data type,
an idea is to use a hash table to track the text-formed primary
being associated with a particular 48-bits integer.

Even if we had a hash collision, each hash entry can have the original
key itself to be compared. But anyway, I love the idea to support
an opaque pointer to track particular remote-row rather.

Me too.

Do we have some other reasonable ideas?

I'm not certain whether the duration of TupleTableSlot is enough to
carry a private datum between scan and modify stage.

Is it possible to utilize ctid field to move a private pointer?
TID data type is internally represented as a pointer to

ItemPointerData,

so it has enough width to track an opaque formed remote-row

identifier;

including string, int64 or others.

One disadvantage is "ctid" system column shows a nonsense value
when user explicitly references this system column. But it does not
seems to me a fundamental problem, because we didn't give any
special meaning on the "ctid" field of foreign table.

I can't say if (ab)using the field that way would cause other
problems, but I don't think that "nonsense values" are a problem.
The pointer would stay the same for the duration of the foreign
scan, which I think is as good a ctid for a foreign table as
anybody should reasonably ask.

BTW, I see the following comment in htup.h:

* t_self and t_tableOid should be valid if the HeapTupleData points to
* a disk buffer, or if it represents a copy of a tuple on disk. They
* should be explicitly set invalid in manufactured tuples.

I don't know if "invalid" means "zero" in that case.

Yours,
Laurenz Albe

#9KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: Laurenz Albe (#8)
Re: [v9.3] writable foreign tables

2012/8/28 Albe Laurenz <laurenz.albe@wien.gv.at>:

Kohei KaiGai wrote:

It is a responsibility of FDW extension (and DBA) to ensure each
foreign-row has a unique identifier that has 48-bits width integer
data type in maximum.

For example, if primary key of the remote table is Text data type,
an idea is to use a hash table to track the text-formed primary
being associated with a particular 48-bits integer.

Even if we had a hash collision, each hash entry can have the original
key itself to be compared. But anyway, I love the idea to support
an opaque pointer to track particular remote-row rather.

Me too.

Do we have some other reasonable ideas?

I'm not certain whether the duration of TupleTableSlot is enough to
carry a private datum between scan and modify stage.

Is it possible to utilize ctid field to move a private pointer?
TID data type is internally represented as a pointer to

ItemPointerData,

so it has enough width to track an opaque formed remote-row

identifier;

including string, int64 or others.

One disadvantage is "ctid" system column shows a nonsense value
when user explicitly references this system column. But it does not
seems to me a fundamental problem, because we didn't give any
special meaning on the "ctid" field of foreign table.

I can't say if (ab)using the field that way would cause other
problems, but I don't think that "nonsense values" are a problem.
The pointer would stay the same for the duration of the foreign
scan, which I think is as good a ctid for a foreign table as
anybody should reasonably ask.

BTW, I see the following comment in htup.h:

* t_self and t_tableOid should be valid if the HeapTupleData points to
* a disk buffer, or if it represents a copy of a tuple on disk. They
* should be explicitly set invalid in manufactured tuples.

I don't know if "invalid" means "zero" in that case.

ItemPointerSetInvalid is declared as follows:

/*
* ItemPointerSetInvalid
* Sets a disk item pointer to be invalid.
*/
#define ItemPointerSetInvalid(pointer) \
( \
AssertMacro(PointerIsValid(pointer)), \
BlockIdSet(&((pointer)->ip_blkid), InvalidBlockNumber), \
(pointer)->ip_posid = InvalidOffsetNumber \
)

Since ItemPointerGetBlockNumber() and ItemPointerGetOffsetNumber()
checks whether the given ItemPointer is valid, FDWs may have to put
a dummy ItemPointerData on head of their private datum to avoid
the first 6-bytes having zero.

For example, the following data structure is safe to carry an opaque
datum without false-positive of invalid ctid.

typedef struct {
ItemPointerData dumm
char *pk_of_remote_table;
} my_pseudo_rowid;

Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: KaiGai Kohei (#6)
Re: [v9.3] writable foreign tables

Kohei KaiGai <kaigai@kaigai.gr.jp> writes:

Would it be too invasive to introduce a new pointer in TupleTableSlot
that is NULL for anything but virtual tuples from foreign tables?

I'm not certain whether the duration of TupleTableSlot is enough to
carry a private datum between scan and modify stage.

It's not.

Is it possible to utilize ctid field to move a private pointer?

UPDATEs and DELETEs do not rely on the ctid field of tuples to carry the
TID from scan to modify --- in fact, most of the time what the modify
step is going to get is a "virtual" TupleTableSlot that hasn't even
*got* a physical CTID field.

Instead, the planner arranges for the TID to be carried up as an
explicit resjunk column named ctid. (Currently this is done in
rewriteTargetListUD(), but see also preptlist.c which does some related
things for SELECT FOR UPDATE.)

I'm inclined to think that what we need here is for FDWs to be able to
modify the details of that behavior, at least to the extent of being
able to specify a different data type than TID for the row
identification column.

regards, tom lane

#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: KaiGai Kohei (#7)
Re: [v9.3] writable foreign tables

Kohei KaiGai <kaigai@kaigai.gr.jp> writes:

It seems to me TargetEntry of the parse tree can inform us which column
should be modified on UPDATE or INSERT. If it has just a Var element
that reference original table as-is, it means here is no change.

Only if you're not going to support BEFORE triggers modifying the row...

regards, tom lane

#12David Fetter
david@fetter.org
In reply to: Tom Lane (#11)
Re: [v9.3] writable foreign tables

On Tue, Aug 28, 2012 at 10:58:25AM -0400, Tom Lane wrote:

Kohei KaiGai <kaigai@kaigai.gr.jp> writes:

It seems to me TargetEntry of the parse tree can inform us which column
should be modified on UPDATE or INSERT. If it has just a Var element
that reference original table as-is, it means here is no change.

Only if you're not going to support BEFORE triggers modifying the row...

+1 for supporting these.

Speaking of triggers on foreign tables, what's needed to support them
independent of support at the FDW level for writing on foreign tables,
or does that even make sense?

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

#13KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: David Fetter (#12)
Re: [v9.3] writable foreign tables

2012/8/28 David Fetter <david@fetter.org>:

On Tue, Aug 28, 2012 at 10:58:25AM -0400, Tom Lane wrote:

Kohei KaiGai <kaigai@kaigai.gr.jp> writes:

It seems to me TargetEntry of the parse tree can inform us which column
should be modified on UPDATE or INSERT. If it has just a Var element
that reference original table as-is, it means here is no change.

Only if you're not going to support BEFORE triggers modifying the row...

+1 for supporting these.

Speaking of triggers on foreign tables, what's needed to support them
independent of support at the FDW level for writing on foreign tables,
or does that even make sense?

I agree with trigger support on foreign tables is definitely useful feature,
even though it does not have capability to replace the writable foreign
table functionality.

In case when foreign-table definition does not contain a column mapped
with primary-key column in remote-side, the trigger function cannot
determine which row should be updated / deleted.
It is a situation that FDW driver should track a particular remote-row using
its identifier.

Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>

#14David Fetter
david@fetter.org
In reply to: KaiGai Kohei (#13)
Re: [v9.3] writable foreign tables

On Tue, Aug 28, 2012 at 05:18:34PM +0200, Kohei KaiGai wrote:

2012/8/28 David Fetter <david@fetter.org>:

On Tue, Aug 28, 2012 at 10:58:25AM -0400, Tom Lane wrote:

Kohei KaiGai <kaigai@kaigai.gr.jp> writes:

It seems to me TargetEntry of the parse tree can inform us
which column should be modified on UPDATE or INSERT. If it has
just a Var element that reference original table as-is, it
means here is no change.

Only if you're not going to support BEFORE triggers modifying the
row...

+1 for supporting these.

Speaking of triggers on foreign tables, what's needed to support
them independent of support at the FDW level for writing on
foreign tables, or does that even make sense?

I agree with trigger support on foreign tables is definitely useful
feature, even though it does not have capability to replace the
writable foreign table functionality.

With utmost respect, trigger support does make it possible to write to
foreign tables using a whole-row comparison with the effect that all
whole-row matches would be affected. This is how DBI-Link does it
currently.

In case when foreign-table definition does not contain a column
mapped with primary-key column in remote-side, the trigger function
cannot determine which row should be updated / deleted. It is a
situation that FDW driver should track a particular remote-row using
its identifier.

Generated identifiers and whole-row matching are two ways to approach
this. There are likely others, especially in cases where people have
special knowledge of the remote source.

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

#15KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: Tom Lane (#10)
Re: [v9.3] writable foreign tables

2012/8/28 Tom Lane <tgl@sss.pgh.pa.us>:

Kohei KaiGai <kaigai@kaigai.gr.jp> writes:

Would it be too invasive to introduce a new pointer in TupleTableSlot
that is NULL for anything but virtual tuples from foreign tables?

I'm not certain whether the duration of TupleTableSlot is enough to
carry a private datum between scan and modify stage.

It's not.

Is it possible to utilize ctid field to move a private pointer?

UPDATEs and DELETEs do not rely on the ctid field of tuples to carry the
TID from scan to modify --- in fact, most of the time what the modify
step is going to get is a "virtual" TupleTableSlot that hasn't even
*got* a physical CTID field.

Instead, the planner arranges for the TID to be carried up as an
explicit resjunk column named ctid. (Currently this is done in
rewriteTargetListUD(), but see also preptlist.c which does some related
things for SELECT FOR UPDATE.)

I'm inclined to think that what we need here is for FDWs to be able to
modify the details of that behavior, at least to the extent of being
able to specify a different data type than TID for the row
identification column.

Hmm. It seems to me a straight-forward solution rather than ab-use
of ctid system column. Probably, cstring data type is more suitable
to carry a private datum between scan and modify stage.

One problem I noticed is how FDW driver returns an extra field that
is in neither system nor regular column.
Number of columns and its data type are defined with TupleDesc of
the target foreign-table, so we also need a feature to extend it on
run-time. For example, FDW driver may have to be able to extend
a "virtual" column with cstring data type, even though the target
foreign table does not have such a column.

Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>

#16KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: David Fetter (#14)
Re: [v9.3] writable foreign tables

2012/8/28 David Fetter <david@fetter.org>:

On Tue, Aug 28, 2012 at 05:18:34PM +0200, Kohei KaiGai wrote:

2012/8/28 David Fetter <david@fetter.org>:

On Tue, Aug 28, 2012 at 10:58:25AM -0400, Tom Lane wrote:

Kohei KaiGai <kaigai@kaigai.gr.jp> writes:

It seems to me TargetEntry of the parse tree can inform us
which column should be modified on UPDATE or INSERT. If it has
just a Var element that reference original table as-is, it
means here is no change.

Only if you're not going to support BEFORE triggers modifying the
row...

+1 for supporting these.

Speaking of triggers on foreign tables, what's needed to support
them independent of support at the FDW level for writing on
foreign tables, or does that even make sense?

I agree with trigger support on foreign tables is definitely useful
feature, even though it does not have capability to replace the
writable foreign table functionality.

With utmost respect, trigger support does make it possible to write to
foreign tables using a whole-row comparison with the effect that all
whole-row matches would be affected. This is how DBI-Link does it
currently.

In case when foreign-table definition does not contain a column
mapped with primary-key column in remote-side, the trigger function
cannot determine which row should be updated / deleted. It is a
situation that FDW driver should track a particular remote-row using
its identifier.

Generated identifiers and whole-row matching are two ways to approach
this. There are likely others, especially in cases where people have
special knowledge of the remote source.

One major problem is how to carry the generated identifiers on run-time,
even though we have no slot except for system and regular columns
defined in TupleDesc of the target foreign tables.
It may need a feature to expand TupleDesc on demand.

Of course, I don't deny the benefit of trigger support on foreign-tables.
Both writable-feature and trigger-support can be supported simultaneously.

Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>

#17David Fetter
david@fetter.org
In reply to: KaiGai Kohei (#16)
Re: [v9.3] writable foreign tables

On Tue, Aug 28, 2012 at 06:08:59PM +0200, Kohei KaiGai wrote:

2012/8/28 David Fetter <david@fetter.org>:

On Tue, Aug 28, 2012 at 05:18:34PM +0200, Kohei KaiGai wrote:

2012/8/28 David Fetter <david@fetter.org>:

On Tue, Aug 28, 2012 at 10:58:25AM -0400, Tom Lane wrote:

Kohei KaiGai <kaigai@kaigai.gr.jp> writes:

It seems to me TargetEntry of the parse tree can inform us
which column should be modified on UPDATE or INSERT. If it has
just a Var element that reference original table as-is, it
means here is no change.

Only if you're not going to support BEFORE triggers modifying the
row...

+1 for supporting these.

Generated identifiers and whole-row matching are two ways to approach
this. There are likely others, especially in cases where people have
special knowledge of the remote source.

One major problem is how to carry the generated identifiers on run-time,
even though we have no slot except for system and regular columns
defined in TupleDesc of the target foreign tables.
It may need a feature to expand TupleDesc on demand.

Could be. You know a lot more about the implementation details than I do.

Of course, I don't deny the benefit of trigger support on foreign-tables.
Both writable-feature and trigger-support can be supported simultaneously.

Do you see these as independent features, or is there some essential
overlap?

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

#18KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: KaiGai Kohei (#15)
Re: [v9.3] writable foreign tables

2012/8/28 Kohei KaiGai <kaigai@kaigai.gr.jp>:

2012/8/28 Tom Lane <tgl@sss.pgh.pa.us>:

Kohei KaiGai <kaigai@kaigai.gr.jp> writes:

Would it be too invasive to introduce a new pointer in TupleTableSlot
that is NULL for anything but virtual tuples from foreign tables?

I'm not certain whether the duration of TupleTableSlot is enough to
carry a private datum between scan and modify stage.

It's not.

Is it possible to utilize ctid field to move a private pointer?

UPDATEs and DELETEs do not rely on the ctid field of tuples to carry the
TID from scan to modify --- in fact, most of the time what the modify
step is going to get is a "virtual" TupleTableSlot that hasn't even
*got* a physical CTID field.

Instead, the planner arranges for the TID to be carried up as an
explicit resjunk column named ctid. (Currently this is done in
rewriteTargetListUD(), but see also preptlist.c which does some related
things for SELECT FOR UPDATE.)

I'm inclined to think that what we need here is for FDWs to be able to
modify the details of that behavior, at least to the extent of being
able to specify a different data type than TID for the row
identification column.

Hmm. It seems to me a straight-forward solution rather than ab-use
of ctid system column. Probably, cstring data type is more suitable
to carry a private datum between scan and modify stage.

One problem I noticed is how FDW driver returns an extra field that
is in neither system nor regular column.
Number of columns and its data type are defined with TupleDesc of
the target foreign-table, so we also need a feature to extend it on
run-time. For example, FDW driver may have to be able to extend
a "virtual" column with cstring data type, even though the target
foreign table does not have such a column.

I tried to investigate the related routines.

TupleDesc of TupleTableSlot associated with ForeignScanState
is initialized at ExecInitForeignScan as literal.
ExecAssignScanType assigns TupleDesc of the target foreign-
table on tts_tupleDescriptor, "as-is".
It is the reason why IterateForeignScan cannot return a private
datum except for the columns being declared as regular ones.

Confrontation between ForeignScan and SubqueryScan tell us
the point to be extended. It assigns TupleDesc of the subplan
generated at run-time.
It seems to me ForeignScan will be able to adopt similar idea;
that allows to append "pseudo-column" onto TupleDesc to
carry identifier of remote-rows to be updated / deleted, if
FDW driver can control TupleDesc being set, instead of the
one come from relation's definition as-is.

Any comment please. Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>

#19KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: David Fetter (#17)
Re: [v9.3] writable foreign tables

2012/8/28 David Fetter <david@fetter.org>:

On Tue, Aug 28, 2012 at 06:08:59PM +0200, Kohei KaiGai wrote:

2012/8/28 David Fetter <david@fetter.org>:

On Tue, Aug 28, 2012 at 05:18:34PM +0200, Kohei KaiGai wrote:

2012/8/28 David Fetter <david@fetter.org>:

On Tue, Aug 28, 2012 at 10:58:25AM -0400, Tom Lane wrote:

Kohei KaiGai <kaigai@kaigai.gr.jp> writes:

It seems to me TargetEntry of the parse tree can inform us
which column should be modified on UPDATE or INSERT. If it has
just a Var element that reference original table as-is, it
means here is no change.

Only if you're not going to support BEFORE triggers modifying the
row...

+1 for supporting these.

Generated identifiers and whole-row matching are two ways to approach
this. There are likely others, especially in cases where people have
special knowledge of the remote source.

One major problem is how to carry the generated identifiers on run-time,
even though we have no slot except for system and regular columns
defined in TupleDesc of the target foreign tables.
It may need a feature to expand TupleDesc on demand.

Could be. You know a lot more about the implementation details than I do.

Of course, I don't deny the benefit of trigger support on foreign-tables.
Both writable-feature and trigger-support can be supported simultaneously.

Do you see these as independent features, or is there some essential
overlap?

If we stand on the viewpoint that foreign-tables should perform as if regular
tables, I don't think its writer feature should depend on trigger stuff.
They can work independently.

On the other hand, trigger feature gives users flexibility to control the data
to be written, as if regular tables. We shouldn't miss the point.
At least, I don't think we have some technical differences to support row-level
triggers on foreign tables.

Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>

#20Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Tom Lane (#10)
Re: [v9.3] writable foreign tables

Tom Lane wrote:

Kohei KaiGai <kaigai@kaigai.gr.jp> writes:

Laurenz Albe wrote:

Would it be too invasive to introduce a new pointer in

TupleTableSlot

that is NULL for anything but virtual tuples from foreign tables?

I'm not certain whether the duration of TupleTableSlot is enough to
carry a private datum between scan and modify stage.

It's not.

Is it possible to utilize ctid field to move a private pointer?

UPDATEs and DELETEs do not rely on the ctid field of tuples to carry

the

TID from scan to modify --- in fact, most of the time what the modify
step is going to get is a "virtual" TupleTableSlot that hasn't even
*got* a physical CTID field.

Instead, the planner arranges for the TID to be carried up as an
explicit resjunk column named ctid. (Currently this is done in
rewriteTargetListUD(), but see also preptlist.c which does some

related

things for SELECT FOR UPDATE.)

I'm inclined to think that what we need here is for FDWs to be able to
modify the details of that behavior, at least to the extent of being
able to specify a different data type than TID for the row
identification column.

Would that imply inventing a new system attribute for
"foreign tid"?

Yours,
Laurenz Albe

#21KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: Laurenz Albe (#20)
#22Tom Lane
tgl@sss.pgh.pa.us
In reply to: Laurenz Albe (#20)
#23KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: KaiGai Kohei (#18)
#24KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: KaiGai Kohei (#23)
#25Alexander Korotkov
aekorotkov@gmail.com
In reply to: KaiGai Kohei (#24)
#26KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: Alexander Korotkov (#25)
#27Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Alexander Korotkov (#25)
#28Atri Sharma
atri.jiit@gmail.com
In reply to: Laurenz Albe (#27)
#29KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: Atri Sharma (#28)
#30Laurenz Albe
laurenz.albe@cybertec.at
In reply to: KaiGai Kohei (#29)
#31Atri Sharma
atri.jiit@gmail.com
In reply to: Laurenz Albe (#30)
#32KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: Laurenz Albe (#30)
#33Laurenz Albe
laurenz.albe@cybertec.at
In reply to: KaiGai Kohei (#32)
#34KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: Laurenz Albe (#33)
#35Laurenz Albe
laurenz.albe@cybertec.at
In reply to: KaiGai Kohei (#34)
#36KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: Laurenz Albe (#35)
#37Laurenz Albe
laurenz.albe@cybertec.at
In reply to: KaiGai Kohei (#36)
#38KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: Laurenz Albe (#37)
#39Laurenz Albe
laurenz.albe@cybertec.at
In reply to: KaiGai Kohei (#38)
#40KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: Laurenz Albe (#39)
#41Laurenz Albe
laurenz.albe@cybertec.at
In reply to: KaiGai Kohei (#40)
#42KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: Laurenz Albe (#41)
#43Erik Rijkers
er@xs4all.nl
In reply to: KaiGai Kohei (#42)
#44Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Erik Rijkers (#43)
#45Laurenz Albe
laurenz.albe@cybertec.at
In reply to: KaiGai Kohei (#42)
#46Ronan Dunklau
rdunklau@gmail.com
In reply to: Laurenz Albe (#45)
#47KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: Ronan Dunklau (#46)
#48KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: KaiGai Kohei (#47)
#49KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: KaiGai Kohei (#48)
#50Daniel Farina
daniel@heroku.com
In reply to: KaiGai Kohei (#49)
#51KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: Daniel Farina (#50)
#52Daniel Farina
daniel@heroku.com
In reply to: Daniel Farina (#50)
#53KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: Daniel Farina (#52)
#54Craig Ringer
craig@2ndquadrant.com
In reply to: KaiGai Kohei (#53)
#55Tom Lane
tgl@sss.pgh.pa.us
In reply to: Craig Ringer (#54)
#56Craig Ringer
craig@2ndquadrant.com
In reply to: Tom Lane (#55)
#57KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: Tom Lane (#55)
#58Tom Lane
tgl@sss.pgh.pa.us
In reply to: KaiGai Kohei (#53)
#59Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#58)
#60Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#58)
#61Thom Brown
thom@linux.com
In reply to: Tom Lane (#58)
#62Thom Brown
thom@linux.com
In reply to: Thom Brown (#61)
#63Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#60)
#64Tom Lane
tgl@sss.pgh.pa.us
In reply to: Thom Brown (#61)
#65Daniel Farina
daniel@heroku.com
In reply to: Tom Lane (#59)
#66Tom Lane
tgl@sss.pgh.pa.us
In reply to: Daniel Farina (#65)
#67Josh Berkus
josh@agliodbs.com
In reply to: Tom Lane (#66)
#68Tom Lane
tgl@sss.pgh.pa.us
In reply to: Josh Berkus (#67)
#69Josh Berkus
josh@agliodbs.com
In reply to: Tom Lane (#68)
#70Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#64)
#71Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#64)
#72Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#71)
#73Thom Brown
thom@linux.com
In reply to: Bruce Momjian (#71)
#74Tom Lane
tgl@sss.pgh.pa.us
In reply to: Josh Berkus (#69)
#75Daniel Farina
daniel@heroku.com
In reply to: Tom Lane (#74)
#76Tom Lane
tgl@sss.pgh.pa.us
In reply to: Daniel Farina (#75)
#77Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Tom Lane (#70)
#78Daniel Farina
daniel@heroku.com
In reply to: Tom Lane (#76)
#79Tom Lane
tgl@sss.pgh.pa.us
In reply to: Daniel Farina (#78)
#80Tom Lane
tgl@sss.pgh.pa.us
In reply to: Laurenz Albe (#77)
#81Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Tom Lane (#80)
#82Tom Lane
tgl@sss.pgh.pa.us
In reply to: Laurenz Albe (#81)
#83Thom Brown
thom@linux.com
In reply to: Tom Lane (#82)
#84Tom Lane
tgl@sss.pgh.pa.us
In reply to: Thom Brown (#83)
#85Daniel Farina
daniel@heroku.com
In reply to: Tom Lane (#79)
#86Tom Lane
tgl@sss.pgh.pa.us
In reply to: Daniel Farina (#85)
#87Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#72)
#88Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#87)
#89Daniel Farina
daniel@heroku.com
In reply to: Tom Lane (#86)
#90Tom Lane
tgl@sss.pgh.pa.us
In reply to: Daniel Farina (#89)
#91Daniel Farina
daniel@heroku.com
In reply to: Tom Lane (#90)
#92Tom Lane
tgl@sss.pgh.pa.us
In reply to: Daniel Farina (#91)
#93Daniel Farina
daniel@heroku.com
In reply to: Tom Lane (#92)
#94Daniel Farina
daniel@heroku.com
In reply to: Daniel Farina (#93)
#95Daniel Farina
daniel@heroku.com
In reply to: Daniel Farina (#94)
#96Daniel Farina
daniel@heroku.com
In reply to: Daniel Farina (#95)
#97Tom Lane
tgl@sss.pgh.pa.us
In reply to: Daniel Farina (#96)
#98Daniel Farina
daniel@heroku.com
In reply to: Tom Lane (#97)
#99Tom Lane
tgl@sss.pgh.pa.us
In reply to: Daniel Farina (#98)
#100Daniel Farina
daniel@heroku.com
In reply to: Tom Lane (#99)
#101Tom Lane
tgl@sss.pgh.pa.us
In reply to: Daniel Farina (#100)
#102Daniel Farina
daniel@heroku.com
In reply to: Tom Lane (#101)