Re: CustomScan in a larger structure (RE: CustomScan support on readfuncs.c)

Started by KaiGai Koheiover 10 years ago21 messageshackers

kaigai@ak.jp.nec.com

over 10 years ago

Sorry for my late response. I've been unavailable to have enough
time to touch code for the last 1.5 month.

The attached patch is a revised one to handle private data of
foregn/custom scan node more gracefully.

The overall consensus upthread were:
- A new ExtensibleNodeMethods structure defines a unique name
and a set of callbacks to handle node copy, serialization,
deserialization and equality checks.
- (Foreign|Custom)(Path|Scan|ScanState) are first host of the
ExtensibleNodeMethods, to allow extension to define larger
structure to store its private fields.
- ExtensibleNodeMethods does not support variable length
structure (like a structure with an array on the tail, use
separately allocated array).
- ExtensibleNodeMethods shall be registered on _PG_init() of
extensions.

The 'pgsql-v9.6-custom-private.v3.patch' is the main part of
this feature. As I pointed out before, it uses dynhash instead
of the self invented hash table.

Interfaces are defined as follows (not changed from v2):

typedef struct ExtensibleNodeMethods
{
const char *extnodename;
Size node_size;
void (*nodeCopy)(Node *newnode, const Node *oldnode);
bool (*nodeEqual)(const Node *a, const Node *b);
void (*nodeOut)(struct StringInfoData *str, const Node *node);
void (*nodeRead)(Node *node);
} ExtensibleNodeMethods;

extern void
RegisterExtensibleNodeMethods(const ExtensibleNodeMethods *methods);

extern const ExtensibleNodeMethods *
GetExtensibleNodeMethods(const char *extnodename, bool missing_ok);

Also, 'extensible-node-example-on-pgstrom.patch' is a working
example on its "GpuScan" node.
The code below uses all of copy, serialization and deserialization.

gscan = (GpuScan *)stringToNode(nodeToString(copyObject(cscan)));
elog(INFO, "GpuScan: %s", nodeToString(gscan));

Then, I could confirm private fields are reproduced correctly.

In addition to this, I'd like to suggest two small improvement.

On nodeOut callback, extension will need _outToken() and _outBitmap(),
however, these two functions are static. Entrypoint for extensions
are needed. (Of course, extension can copy & paste these small functions...)

ExtensibleNodeMethods may be registered with a unique pair of its
name and node-tag which is associated with. The current code requires
the name is unique to others, however, it may make a bit inconvenience.
In case of CustomScan, extension need to define three nodes: CustomPath,
CustomScan and CustomScanState, thus, ExtensibleNodeMethods which is
associated with these node must have individually unique name, like
"GpuScanPath", "GpuScan" and "GpuScanState".
If extnodename would be unique within a particular node type, we can
apply same name for all of the three above.

How about your thought?

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

Show quoted text

-----Original Message-----
From: Kaigai Kouhei(海外浩平)
Sent: Wednesday, December 02, 2015 5:52 PM
To: 'Robert Haas'
Cc: Andres Freund; Amit Kapila; pgsql-hackers
Subject: Re: CustomScan in a larger structure (RE: [HACKERS] CustomScan support
on readfuncs.c)

On Thu, Nov 26, 2015 at 5:27 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

I'm now implementing. The above design perfectly works on ForeignScan.
On the other hands, I'd like to have deeper consideration for CustomScan.

My recent patch adds LibraryName and SymbolName on CustomScanMethods
to lookup the method table even if library is not loaded yet.
However, this ExtensibleNodeMethods relies custom scan provider shall
be loaded, by parallel infrastructure, prior to the deserialization.
It means extension has a chance to register itself as well.

My idea is, redefine CustomScanMethod as follows:

typedef struct ExtensibleNodeMethods
{
const char *extnodename;
Size node_size;
Node *(*nodeCopy)(const Node *from);
bool (*nodeEqual)(const Node *a, const Node *b);
void (*nodeOut)(struct StringInfoData *str, const Node *node);
void (*nodeRead)(Node *node);
} ExtensibleNodeMethods;

typedef struct CustomScanMethods
{
union {
const char *CustomName;
ExtensibleNodeMethods xnode;
};
/* Create execution state (CustomScanState) from a CustomScan plan node

*/

Node *(*CreateCustomScanState) (struct CustomScan *cscan);
} CustomScanMethods;

It allows to pull CustomScanMethods using GetExtensibleNodeMethods
by CustomName; that is alias of extnodename in ExtensibleNodeMethods.
Thus, we don't need to care about LibraryName and SymbolName.

This kind of trick is not needed for ForeignScan because all the pointer
stuff are reproducible by catalog, however, method table of CustomScan
is not.

How about your opinion?

Anonymous unions are not C89, so this is a no-go.

I think we should just drop CustomName and replace it with
ExtensibleNodeMethods. That will break things for third-party code
that is using the existing CustomScan stuff, but there's a good chance
that nobody other than you has written any such code. And even if
someone has, updating it for this change will not be very difficult.

Thanks, I have same opinion.
At this moment, CustomName is not utilized so much except for EXPLAIN
and debug output, so it is not a hard stuff to replace this field by
extnodename, even if custom scan provider does not have own structure
thus no callbacks of node operations are defined.

The attached patch adds 'extnodename' fields on ForeignPath and
ForeignScan, also embeds ExtensibleNodeMethods in CustomPathMethods,
CustomScanMethods and CustomExecMethods.

Its handlers in copyfuncs.c were enhanced to utilize the callback
to allocate a larger structure and copy private fields.
Handlers in outfuncs.c and readfuncs.c have to be symmetric. The
core backend writes out 'extnodename' prior to any private fields,
then we can identify the callback to process rest of tokens for
private fields.

RegisterExtensibleNodeMethods() tracks a pair of 'extnodename'
and ExtensibleNodeMethods itself. It saves the pointer of the
method table, but not duplicate, because _readCustomScan()
cast the method pulled by 'extnodename' thus registered table
has to be a static variable on extension; that means extension
never update ExtensibleNodeMethods once registered.

The one other patch is my test using PG-Strom, however, I didn't
have a test on FDW extensions yet.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

Robert Haas

robertmhaas@gmail.com

over 10 years ago

In reply to: KaiGai Kohei (#1)

On Mon, Jan 25, 2016 at 8:06 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

Sorry for my late response. I've been unavailable to have enough
time to touch code for the last 1.5 month.

The attached patch is a revised one to handle private data of
foregn/custom scan node more gracefully.

The overall consensus upthread were:
- A new ExtensibleNodeMethods structure defines a unique name
and a set of callbacks to handle node copy, serialization,
deserialization and equality checks.
- (Foreign|Custom)(Path|Scan|ScanState) are first host of the
ExtensibleNodeMethods, to allow extension to define larger
structure to store its private fields.
- ExtensibleNodeMethods does not support variable length
structure (like a structure with an array on the tail, use
separately allocated array).
- ExtensibleNodeMethods shall be registered on _PG_init() of
extensions.

The 'pgsql-v9.6-custom-private.v3.patch' is the main part of
this feature. As I pointed out before, it uses dynhash instead
of the self invented hash table.

On a first read-through, I see nothing in this patch to which I would
want to object except for the fact that the comments and documentation
need some work from a native speaker of English. It looks like what
we discussed, and I think it's an improvement over what we have now.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

KaiGai Kohei

kaigai@ak.jp.nec.com

over 10 years ago

In reply to: Robert Haas (#2)

On Mon, Jan 25, 2016 at 8:06 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

Sorry for my late response. I've been unavailable to have enough
time to touch code for the last 1.5 month.

The attached patch is a revised one to handle private data of
foregn/custom scan node more gracefully.

The overall consensus upthread were:
- A new ExtensibleNodeMethods structure defines a unique name
and a set of callbacks to handle node copy, serialization,
deserialization and equality checks.
- (Foreign|Custom)(Path|Scan|ScanState) are first host of the
ExtensibleNodeMethods, to allow extension to define larger
structure to store its private fields.
- ExtensibleNodeMethods does not support variable length
structure (like a structure with an array on the tail, use
separately allocated array).
- ExtensibleNodeMethods shall be registered on _PG_init() of
extensions.

The 'pgsql-v9.6-custom-private.v3.patch' is the main part of
this feature. As I pointed out before, it uses dynhash instead
of the self invented hash table.

On a first read-through, I see nothing in this patch to which I would
want to object except for the fact that the comments and documentation
need some work from a native speaker of English. It looks like what
we discussed, and I think it's an improvement over what we have now.

Thanks,

Do you think we shall allow to register same extensible node name for
different node types? Like, "GpuJoin" for any of CustomPath, CustomScan
and CustomScanState. Or, do we avoid this using different name for each?

--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Import Notes

Resolved by subject fallback

Robert Haas

robertmhaas@gmail.com

over 10 years ago

In reply to: KaiGai Kohei (#3)

On Thu, Jan 28, 2016 at 10:18 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

Do you think we shall allow to register same extensible node name for
different node types? Like, "GpuJoin" for any of CustomPath, CustomScan
and CustomScanState. Or, do we avoid this using different name for each?

I'd say a different name for each. That's our current convention, and
I don't see much reason to change it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

KaiGai Kohei

kaigai@ak.jp.nec.com

over 10 years ago

In reply to: Robert Haas (#4)

On Thu, Jan 28, 2016 at 10:18 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

Do you think we shall allow to register same extensible node name for
different node types? Like, "GpuJoin" for any of CustomPath, CustomScan
and CustomScanState. Or, do we avoid this using different name for each?

I'd say a different name for each. That's our current convention, and
I don't see much reason to change it.

OK, it is not a serious problem, at least, for my use cases.
A convention like "GpuJoinPath", "GpuJoin" and "GpuJoinState" are sufficient.
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Import Notes

Resolved by subject fallback

Robert Haas

robertmhaas@gmail.com

over 10 years ago

In reply to: Robert Haas (#2)

On Wed, Jan 27, 2016 at 9:36 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Mon, Jan 25, 2016 at 8:06 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

Sorry for my late response. I've been unavailable to have enough
time to touch code for the last 1.5 month.

The attached patch is a revised one to handle private data of
foregn/custom scan node more gracefully.

The overall consensus upthread were:
- A new ExtensibleNodeMethods structure defines a unique name
and a set of callbacks to handle node copy, serialization,
deserialization and equality checks.
- (Foreign|Custom)(Path|Scan|ScanState) are first host of the
ExtensibleNodeMethods, to allow extension to define larger
structure to store its private fields.
- ExtensibleNodeMethods does not support variable length
structure (like a structure with an array on the tail, use
separately allocated array).
- ExtensibleNodeMethods shall be registered on _PG_init() of
extensions.

The 'pgsql-v9.6-custom-private.v3.patch' is the main part of
this feature. As I pointed out before, it uses dynhash instead
of the self invented hash table.

On a first read-through, I see nothing in this patch to which I would
want to object except for the fact that the comments and documentation
need some work from a native speaker of English. It looks like what
we discussed, and I think it's an improvement over what we have now.

Well, looking at this a bit more, it seems like the documentation
you've written here is really misplaced. The patch is introducing a
new facility that applies to both CustomScan and ForeignScan, but the
documentation is only to do with CustomScan. I think we need a whole
new chapter on extensible nodes, or something. I'm actually not
really keen on the fact that we keep adding SGML documentation for
this stuff; it seems like it belongs in a README in the source tree.
We don't explain nodes in general, but now we're going to have to try
to explain extensible nodes. How's that going to work?

I think you should avoid the call to GetExtensibleNodeMethods() in the
case where extnodename is NULL. On the other hand, I think that if
extnodename is non-NULL, all four methods should be required, so that
you don't have to check if (methods && methods->nodeRead) but just if
(extnodename) { methods = GetExtensibleNodeMethods(extnodename);
methods->nodeRead( ... ); }. That seems like it would be a bit
tidier.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

KaiGai Kohei

kaigai@ak.jp.nec.com

over 10 years ago

In reply to: Robert Haas (#6)

On Wed, Jan 27, 2016 at 9:36 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Mon, Jan 25, 2016 at 8:06 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

Sorry for my late response. I've been unavailable to have enough
time to touch code for the last 1.5 month.

The attached patch is a revised one to handle private data of
foregn/custom scan node more gracefully.

The overall consensus upthread were:
- A new ExtensibleNodeMethods structure defines a unique name
and a set of callbacks to handle node copy, serialization,
deserialization and equality checks.
- (Foreign|Custom)(Path|Scan|ScanState) are first host of the
ExtensibleNodeMethods, to allow extension to define larger
structure to store its private fields.
- ExtensibleNodeMethods does not support variable length
structure (like a structure with an array on the tail, use
separately allocated array).
- ExtensibleNodeMethods shall be registered on _PG_init() of
extensions.

The 'pgsql-v9.6-custom-private.v3.patch' is the main part of
this feature. As I pointed out before, it uses dynhash instead
of the self invented hash table.

On a first read-through, I see nothing in this patch to which I would
want to object except for the fact that the comments and documentation
need some work from a native speaker of English. It looks like what
we discussed, and I think it's an improvement over what we have now.

Well, looking at this a bit more, it seems like the documentation
you've written here is really misplaced. The patch is introducing a
new facility that applies to both CustomScan and ForeignScan, but the
documentation is only to do with CustomScan. I think we need a whole
new chapter on extensible nodes, or something. I'm actually not
really keen on the fact that we keep adding SGML documentation for
this stuff; it seems like it belongs in a README in the source tree.
We don't explain nodes in general, but now we're going to have to try
to explain extensible nodes. How's that going to work?

The detail of these callbacks are not for end-users, administrators and
so on except for core/extension developers. So, I loves idea not to have
such a detailed description in SGML.
How about an idea to have more detailed source code comments close to
the definition of ExtensibleNodeMethods?
I haven't seen the src/backend/nodes/README yet, and it has only 6 updates
history from Postgres95 era. I guess people may forget to update README
file if description is separately located from the implementation.

I think you should avoid the call to GetExtensibleNodeMethods() in the
case where extnodename is NULL. On the other hand, I think that if
extnodename is non-NULL, all four methods should be required, so that
you don't have to check if (methods && methods->nodeRead) but just if
(extnodename) { methods = GetExtensibleNodeMethods(extnodename);
methods->nodeRead( ... ); }. That seems like it would be a bit
tidier.

OK, I'll fix up. No need to have 'missing_ok' argument here.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Import Notes

Resolved by subject fallback

Robert Haas

robertmhaas@gmail.com

over 10 years ago

In reply to: KaiGai Kohei (#7)

On Wed, Feb 3, 2016 at 8:00 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

Well, looking at this a bit more, it seems like the documentation
you've written here is really misplaced. The patch is introducing a
new facility that applies to both CustomScan and ForeignScan, but the
documentation is only to do with CustomScan. I think we need a whole
new chapter on extensible nodes, or something. I'm actually not
really keen on the fact that we keep adding SGML documentation for
this stuff; it seems like it belongs in a README in the source tree.
We don't explain nodes in general, but now we're going to have to try
to explain extensible nodes. How's that going to work?

The detail of these callbacks are not for end-users, administrators and
so on except for core/extension developers. So, I loves idea not to have
such a detailed description in SGML.
How about an idea to have more detailed source code comments close to
the definition of ExtensibleNodeMethods?
I haven't seen the src/backend/nodes/README yet, and it has only 6 updates
history from Postgres95 era. I guess people may forget to update README
file if description is separately located from the implementation.

Hmm, that might work, although that file is so old that it may be
difficult to add to. Another idea is: maybe we could have a header
file for the extensible node stuff and just give it a really long
header comment.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

KaiGai Kohei

kaigai@ak.jp.nec.com

over 10 years ago

In reply to: Robert Haas (#8)

-----Original Message-----
From: pgsql-hackers-owner@postgresql.org
[mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Robert Haas
Sent: Thursday, February 04, 2016 11:39 AM
To: Kaigai Kouhei(海外浩平)
Cc: Andres Freund; Amit Kapila; pgsql-hackers
Subject: Re: CustomScan in a larger structure (RE: [HACKERS] CustomScan support
on readfuncs.c)

On Wed, Feb 3, 2016 at 8:00 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

Well, looking at this a bit more, it seems like the documentation
you've written here is really misplaced. The patch is introducing a
new facility that applies to both CustomScan and ForeignScan, but the
documentation is only to do with CustomScan. I think we need a whole
new chapter on extensible nodes, or something. I'm actually not
really keen on the fact that we keep adding SGML documentation for
this stuff; it seems like it belongs in a README in the source tree.
We don't explain nodes in general, but now we're going to have to try
to explain extensible nodes. How's that going to work?

The detail of these callbacks are not for end-users, administrators and
so on except for core/extension developers. So, I loves idea not to have
such a detailed description in SGML.
How about an idea to have more detailed source code comments close to
the definition of ExtensibleNodeMethods?
I haven't seen the src/backend/nodes/README yet, and it has only 6 updates
history from Postgres95 era. I guess people may forget to update README
file if description is separately located from the implementation.

Hmm, that might work, although that file is so old that it may be
difficult to add to. Another idea is: maybe we could have a header
file for the extensible node stuff and just give it a really long
header comment.

At this moment, I tried to write up description at nodes/nodes.h.
The amount of description is about 100lines. It is on a borderline
whether we split off this chunk into another header file, in my sense.

On the other hands, I noticed that we cannot omit checks for individual
callbacks on CustomXXXX node type, ExtensibleNodeMethods is embedded in
the CustomXXXXMethods structure, thus we may have CustomXXXX node with
no extensible feature.
This manner is beneficial because extension does not need to register
the library and symbol name for serialization. So, CustomScan related
code still checks existence of individual callbacks.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

Show quoted text

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10

Robert Haas

robertmhaas@gmail.com

over 10 years ago

In reply to: KaiGai Kohei (#9)

On Wed, Feb 3, 2016 at 11:57 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

At this moment, I tried to write up description at nodes/nodes.h.
The amount of description is about 100lines. It is on a borderline
whether we split off this chunk into another header file, in my sense.

On the other hands, I noticed that we cannot omit checks for individual
callbacks on CustomXXXX node type, ExtensibleNodeMethods is embedded in
the CustomXXXXMethods structure, thus we may have CustomXXXX node with
no extensible feature.
This manner is beneficial because extension does not need to register
the library and symbol name for serialization. So, CustomScan related
code still checks existence of individual callbacks.

I was looking over this patch yesterday, and something was bugging me
about it, but I couldn't quite put my finger on what it was. But now
I think I know.

I think of an extensible node type facility as something that ought to
be defined to allow a user to create new types of nodes. But this is
not that. What this does is allows you to have a CustomScan or
ForeignScan node - that is, the existing node type - which is actually
larger than a normal node of that type and has some extra data that is
part of it. I'm having a really hard time being comfortable with that
concept. Somehow, it seems like the node type should determine the
size of the node. I can stretch my brain to the point of being able
to say, well, maybe if the node tag is T_ExtensibleNode, then you can
look at char *extnodename to figure out what type of node it is
really, and then from there get the size. But what this does is:
every time you see a T_CustomScan or T_ForeignScan node, it might not
really be that kind of node but something else else, and tomorrow
there might be another half-dozen node types with a similar property.
And every one of those node types will have char *extnodename in a
different place in the structure, so a hypothetical piece of code that
wanted to find the extension methods for a node, or the size of a
node, would need a switch that knows about all of those node types.
It feels very awkward.

So I have a slightly different proposal. Let's forget about letting
T_CustomScan or T_ForeignScan or any other built-in node type vary in
size. Instead, let's add T_ExtensibleNode which acts as a completely
user-defined node. It has read/out/copy/equalfuncs support along the
lines of what this patch implements, and that's it. It's not a scan
or a path or anything like that: it's just an opaque data container
that the system can input, output, copy, and test for equality, and
nothing else. Isn't that useless? Not at all. If you're creating an
FDW or custom scan provider and want to store some extra data, you can
create a new type of extensible node and stick it in the fdw_private
or custom_private field! The data won't be part of the CustomScan or
ForeignScan structure itself, as in this patch, but who cares? The
only objection I can see is that you still need several pointer
deferences to get to the data since fdw_private is a List, but if
that's actually a real performance problem we could probably fix it by
just changing fdw_private to a Node instead. You'd still need one
pointer dereference, but that doesn't seem too bad.

Thoughts?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#11

KaiGai Kohei

kaigai@ak.jp.nec.com

over 10 years ago

In reply to: Robert Haas (#10)

On Wed, Feb 3, 2016 at 11:57 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

At this moment, I tried to write up description at nodes/nodes.h.
The amount of description is about 100lines. It is on a borderline
whether we split off this chunk into another header file, in my sense.

On the other hands, I noticed that we cannot omit checks for individual
callbacks on CustomXXXX node type, ExtensibleNodeMethods is embedded in
the CustomXXXXMethods structure, thus we may have CustomXXXX node with
no extensible feature.
This manner is beneficial because extension does not need to register
the library and symbol name for serialization. So, CustomScan related
code still checks existence of individual callbacks.

I was looking over this patch yesterday, and something was bugging me
about it, but I couldn't quite put my finger on what it was. But now
I think I know.

I think of an extensible node type facility as something that ought to
be defined to allow a user to create new types of nodes. But this is
not that. What this does is allows you to have a CustomScan or
ForeignScan node - that is, the existing node type - which is actually
larger than a normal node of that type and has some extra data that is
part of it. I'm having a really hard time being comfortable with that
concept. Somehow, it seems like the node type should determine the
size of the node. I can stretch my brain to the point of being able
to say, well, maybe if the node tag is T_ExtensibleNode, then you can
look at char *extnodename to figure out what type of node it is
really, and then from there get the size. But what this does is:
every time you see a T_CustomScan or T_ForeignScan node, it might not
really be that kind of node but something else else, and tomorrow
there might be another half-dozen node types with a similar property.
And every one of those node types will have char *extnodename in a
different place in the structure, so a hypothetical piece of code that
wanted to find the extension methods for a node, or the size of a
node, would need a switch that knows about all of those node types.
It feels very awkward.

So I have a slightly different proposal. Let's forget about letting
T_CustomScan or T_ForeignScan or any other built-in node type vary in
size. Instead, let's add T_ExtensibleNode which acts as a completely
user-defined node. It has read/out/copy/equalfuncs support along the
lines of what this patch implements, and that's it. It's not a scan
or a path or anything like that: it's just an opaque data container
that the system can input, output, copy, and test for equality, and
nothing else. Isn't that useless? Not at all. If you're creating an
FDW or custom scan provider and want to store some extra data, you can
create a new type of extensible node and stick it in the fdw_private
or custom_private field! The data won't be part of the CustomScan or
ForeignScan structure itself, as in this patch, but who cares? The
only objection I can see is that you still need several pointer
deferences to get to the data since fdw_private is a List, but if
that's actually a real performance problem we could probably fix it by
just changing fdw_private to a Node instead. You'd still need one
pointer dereference, but that doesn't seem too bad.

Thoughts?

The new callbacks of T_ExtensibleNode will replace the necessity to
form and deform process of private values, like as:
https://github.com/pg-strom/devel/blob/master/src/gpujoin.c#L114

It transforms a bunch of internal data of CustomScan (similar to the
extended fields in T_ExtensibleNode) to/from the node functions
understandable forms for copy, input and output support.
I think it implies you proposition is workable.

I'd like to follow this proposition basically.
On the other hands, I have two points I want to pay attention.

1. At this moment, it is allowed to define a larger structure that
embeds CustomPath and CustomScanState by extension. How do we treat
this coding manner in this case? Especially, CustomScanState has no
private pointer dereference because it assumes an extension define
a larger structure. Of course, we don't need to care about node
operations on Path and PlanState nodes, but right now.

2. I intended to replace LibraryName and SymbolName fields from the
CustomScanMethods structure by integration of extensible node type.
We had to give a pair of these identifiers because custom scan provider
has no registration points at this moment. A little concern is extension
has to assume a particular filename of itself.
But, probably, it shall be a separated discussion. My preference is
preliminary registration of custom scan provider by its name, as well
as extensible node.

Towards the last question; whether *_private shall be void * or List *,
I want to keep fdw_private and custom_private as List * pointer, because
a new node type definition is a bit overdone job if this FDW or CSP will
take only a few private fields with primitive data types.
It is a preferable features when extension defines ten or more private
fields.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Import Notes

Resolved by subject fallback

#12

Robert Haas

robertmhaas@gmail.com

over 10 years ago

In reply to: KaiGai Kohei (#11)

On Sun, Feb 7, 2016 at 7:28 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

The new callbacks of T_ExtensibleNode will replace the necessity to
form and deform process of private values, like as:
https://github.com/pg-strom/devel/blob/master/src/gpujoin.c#L114

Yeah.

It transforms a bunch of internal data of CustomScan (similar to the
extended fields in T_ExtensibleNode) to/from the node functions
understandable forms for copy, input and output support.
I think it implies you proposition is workable.

I'd like to follow this proposition basically.
On the other hands, I have two points I want to pay attention.

1. At this moment, it is allowed to define a larger structure that
embeds CustomPath and CustomScanState by extension. How do we treat
this coding manner in this case? Especially, CustomScanState has no
private pointer dereference because it assumes an extension define
a larger structure. Of course, we don't need to care about node
operations on Path and PlanState nodes, but right now.

I see no real advantage in letting a CustomPath be larger. If
custom_private can include extension-defined node types, that seems
good enough. On the other hand, if CustomScanState can be larger,
that seems fine. We don't really need any special support for that,
do we?

2. I intended to replace LibraryName and SymbolName fields from the
CustomScanMethods structure by integration of extensible node type.
We had to give a pair of these identifiers because custom scan provider
has no registration points at this moment. A little concern is extension
has to assume a particular filename of itself.
But, probably, it shall be a separated discussion. My preference is
preliminary registration of custom scan provider by its name, as well
as extensible node.

Seems like we could just leave the CustomScan stuff alone and worry
about this as a separate facility.

Towards the last question; whether *_private shall be void * or List *,
I want to keep fdw_private and custom_private as List * pointer, because
a new node type definition is a bit overdone job if this FDW or CSP will
take only a few private fields with primitive data types.
It is a preferable features when extension defines ten or more private
fields.

Well, I suggested Node *, not void *. A Node can be a List, but not
every Node is a List.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#13

KaiGai Kohei

kaigai@ak.jp.nec.com

over 10 years ago

In reply to: Robert Haas (#12)

-----Original Message-----
From: Robert Haas [mailto:robertmhaas@gmail.com]
Sent: Monday, February 08, 2016 11:59 PM
To: Kaigai Kouhei(海外浩平)
Cc: Andres Freund; Amit Kapila; pgsql-hackers
Subject: ##freemail## Re: CustomScan in a larger structure (RE: [HACKERS]
CustomScan support on readfuncs.c)

On Sun, Feb 7, 2016 at 7:28 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

The new callbacks of T_ExtensibleNode will replace the necessity to
form and deform process of private values, like as:
https://github.com/pg-strom/devel/blob/master/src/gpujoin.c#L114

Yeah.

It transforms a bunch of internal data of CustomScan (similar to the
extended fields in T_ExtensibleNode) to/from the node functions
understandable forms for copy, input and output support.
I think it implies you proposition is workable.

I'd like to follow this proposition basically.
On the other hands, I have two points I want to pay attention.

1. At this moment, it is allowed to define a larger structure that
embeds CustomPath and CustomScanState by extension. How do we treat
this coding manner in this case? Especially, CustomScanState has no
private pointer dereference because it assumes an extension define
a larger structure. Of course, we don't need to care about node
operations on Path and PlanState nodes, but right now.

I see no real advantage in letting a CustomPath be larger. If
custom_private can include extension-defined node types, that seems
good enough. On the other hand, if CustomScanState can be larger,
that seems fine. We don't really need any special support for that,
do we?

Yes. Right now, we have no code path that handles PlanState or its
inheritance using node operations. So, it is not a real problem.

2. I intended to replace LibraryName and SymbolName fields from the
CustomScanMethods structure by integration of extensible node type.
We had to give a pair of these identifiers because custom scan provider
has no registration points at this moment. A little concern is extension
has to assume a particular filename of itself.
But, probably, it shall be a separated discussion. My preference is
preliminary registration of custom scan provider by its name, as well
as extensible node.

Seems like we could just leave the CustomScan stuff alone and worry
about this as a separate facility.

Towards the last question; whether *_private shall be void * or List *,
I want to keep fdw_private and custom_private as List * pointer, because
a new node type definition is a bit overdone job if this FDW or CSP will
take only a few private fields with primitive data types.
It is a preferable features when extension defines ten or more private
fields.

Well, I suggested Node *, not void *. A Node can be a List, but not
every Node is a List.

It is pretty good!

The attached patch (primary one) implements the above idea.

Now ExtensibleNode works as a basis structure of data container,
regardless of CustomScan and ForeignScan.
Also, fdw_private and custom_private are de-defined to Node * type
from List * type. It affected to a few FDW APIs.

The secondary patch is a demonstration of new ExtensibleNode using
postgres_fdw extension. Its private data are expected to be packed
in a list with a particular order. Self defined structure allows to
keep these variables without ugly pack/unpacking.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

Import Notes

Resolved by subject fallback

#14

Robert Haas

robertmhaas@gmail.com

over 10 years ago

In reply to: KaiGai Kohei (#13)

On Wed, Feb 10, 2016 at 1:25 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

It is pretty good!

The attached patch (primary one) implements the above idea.

Now ExtensibleNode works as a basis structure of data container,
regardless of CustomScan and ForeignScan.
Also, fdw_private and custom_private are de-defined to Node * type
from List * type. It affected to a few FDW APIs.

I extracted the subset of this that just creates the extensible node
framework and did some cleanup - the result is attached. I will
commit this if nobody objects.

I think the part about whacking around the FDW API is a little more
potentially objectionable to others, so I want to hold off doing that
unless a few more people chime in with +1. Perhaps we could start a
new thread to talk about that specific idea. This is useful even
without that, though.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#15

KaiGai Kohei

kaigai@ak.jp.nec.com

over 10 years ago

In reply to: Robert Haas (#14)

-----Original Message-----
From: pgsql-hackers-owner@postgresql.org
[mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Robert Haas
Sent: Thursday, February 11, 2016 1:26 PM
To: Kaigai Kouhei(海外浩平)
Cc: Andres Freund; Amit Kapila; pgsql-hackers
Subject: Re: CustomScan in a larger structure (RE: [HACKERS] CustomScan support
on readfuncs.c)

On Wed, Feb 10, 2016 at 1:25 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

It is pretty good!

The attached patch (primary one) implements the above idea.

Now ExtensibleNode works as a basis structure of data container,
regardless of CustomScan and ForeignScan.
Also, fdw_private and custom_private are de-defined to Node * type
from List * type. It affected to a few FDW APIs.

I extracted the subset of this that just creates the extensible node
framework and did some cleanup - the result is attached. I will
commit this if nobody objects.

I have no objection of course.

I think the part about whacking around the FDW API is a little more
potentially objectionable to others, so I want to hold off doing that
unless a few more people chime in with +1. Perhaps we could start a
new thread to talk about that specific idea. This is useful even
without that, though.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#16

Robert Haas

robertmhaas@gmail.com

over 10 years ago

In reply to: KaiGai Kohei (#15)

On Thu, Feb 11, 2016 at 7:06 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

-----Original Message-----
From: pgsql-hackers-owner@postgresql.org
[mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Robert Haas
Sent: Thursday, February 11, 2016 1:26 PM
To: Kaigai Kouhei(海外浩平)
Cc: Andres Freund; Amit Kapila; pgsql-hackers
Subject: Re: CustomScan in a larger structure (RE: [HACKERS] CustomScan support
on readfuncs.c)

On Wed, Feb 10, 2016 at 1:25 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

It is pretty good!

The attached patch (primary one) implements the above idea.

Now ExtensibleNode works as a basis structure of data container,
regardless of CustomScan and ForeignScan.
Also, fdw_private and custom_private are de-defined to Node * type
from List * type. It affected to a few FDW APIs.

I extracted the subset of this that just creates the extensible node
framework and did some cleanup - the result is attached. I will
commit this if nobody objects.

I have no objection of course.

Done.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#17

Andres Freund

andres@anarazel.de

over 10 years ago

In reply to: Robert Haas (#14)

Hi,

On 2016-02-10 23:26:20 -0500, Robert Haas wrote:

I think the part about whacking around the FDW API is a little more
potentially objectionable to others, so I want to hold off doing that
unless a few more people chime in with +1. Perhaps we could start a
new thread to talk about that specific idea. This is useful even
without that, though.

FWIW, I can delete a couple hundred lines of code from citusdb thanks to
this...

A quick questions about what you committed:

@@ -527,10 +532,17 @@ extern PGDLLIMPORT Node *newNodeMacroHolder;
*/
extern char *nodeToString(const void *obj);

+struct Bitmapset;		/* not to include bitmapset.h here */
+struct StringInfoData;	/* not to include stringinfo.h here */
+extern void outToken(struct StringInfoData *str, const char *s);
+extern void outBitmapset(struct StringInfoData *str,
+						 const struct Bitmapset *bms);
+
/*
* nodes/{readfuncs.c,read.c}
*/
extern void *stringToNode(char *str);
+extern struct Bitmapset *readBitmapset(void);

why exactly did you expose read/writeBitmapset(), and nothing else?
Afaics a lot of the other read routines are also pretty necessary from
the outside?

Greetings,

Andres Freund

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#18

Robert Haas

robertmhaas@gmail.com

over 10 years ago

In reply to: Andres Freund (#17)

On Fri, Feb 12, 2016 at 9:56 AM, Andres Freund <andres@anarazel.de> wrote:

On 2016-02-10 23:26:20 -0500, Robert Haas wrote:

I think the part about whacking around the FDW API is a little more
potentially objectionable to others, so I want to hold off doing that
unless a few more people chime in with +1. Perhaps we could start a
new thread to talk about that specific idea. This is useful even
without that, though.

FWIW, I can delete a couple hundred lines of code from citusdb thanks to
this...

A quick questions about what you committed:
@@ -527,10 +532,17 @@ extern PGDLLIMPORT Node *newNodeMacroHolder;
*/
extern char *nodeToString(const void *obj);
+struct Bitmapset;            /* not to include bitmapset.h here */
+struct StringInfoData;       /* not to include stringinfo.h here */
+extern void outToken(struct StringInfoData *str, const char *s);
+extern void outBitmapset(struct StringInfoData *str,
+                                              const struct Bitmapset *bms);
+
/*
* nodes/{readfuncs.c,read.c}
*/
extern void *stringToNode(char *str);
+extern struct Bitmapset *readBitmapset(void);
why exactly did you expose read/writeBitmapset(), and nothing else?
Afaics a lot of the other read routines are also pretty necessary from
the outside?

Because that's what KaiGai submitted and I had no reason to
second-guess it. I'm happy to expose other stuff along similar lines
if somebody writes a patch for it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#19

KaiGai Kohei

kaigai@ak.jp.nec.com

over 10 years ago

In reply to: Robert Haas (#18)

2016-02-13 0:11 GMT+09:00 Robert Haas <robertmhaas@gmail.com>:

On Fri, Feb 12, 2016 at 9:56 AM, Andres Freund <andres@anarazel.de> wrote:
On 2016-02-10 23:26:20 -0500, Robert Haas wrote:

I think the part about whacking around the FDW API is a little more
potentially objectionable to others, so I want to hold off doing that
unless a few more people chime in with +1. Perhaps we could start a
new thread to talk about that specific idea. This is useful even
without that, though.

FWIW, I can delete a couple hundred lines of code from citusdb thanks to
this...

A quick questions about what you committed:
@@ -527,10 +532,17 @@ extern PGDLLIMPORT Node *newNodeMacroHolder;
*/
extern char *nodeToString(const void *obj);
+struct Bitmapset;            /* not to include bitmapset.h here */
+struct StringInfoData;       /* not to include stringinfo.h here */
+extern void outToken(struct StringInfoData *str, const char *s);
+extern void outBitmapset(struct StringInfoData *str,
+                                              const struct Bitmapset *bms);
+
/*
* nodes/{readfuncs.c,read.c}
*/
extern void *stringToNode(char *str);
+extern struct Bitmapset *readBitmapset(void);
why exactly did you expose read/writeBitmapset(), and nothing else?
Afaics a lot of the other read routines are also pretty necessary from
the outside?
Because that's what KaiGai submitted and I had no reason to
second-guess it. I'm happy to expose other stuff along similar lines
if somebody writes a patch for it.

Although we have various service macro in outfuncs.c and readfuncs.c,
it is not a big burden for extension authors to write equivalent code,
except for string and bitmap. On the other hands, string needs _outToken,
bitmap needs _outBitmap. It is not good idea to suggest extensions to
have copy & paste code with no clear reason.

Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#20

Andres Freund

andres@anarazel.de

about 10 years ago

In reply to: Andres Freund (#17)

On 2016-02-12 15:56:45 +0100, Andres Freund wrote:

Hi,

On 2016-02-10 23:26:20 -0500, Robert Haas wrote:

I think the part about whacking around the FDW API is a little more
potentially objectionable to others, so I want to hold off doing that
unless a few more people chime in with +1. Perhaps we could start a
new thread to talk about that specific idea. This is useful even
without that, though.

FWIW, I can delete a couple hundred lines of code from citusdb thanks to
this...

And I'm now working on doing that.

why exactly did you expose read/writeBitmapset(), and nothing else?
Afaics a lot of the other read routines are also pretty necessary from
the outside?

I'd like to also expose at least outDatum()/readDatum() - they're not
entirely trivial, so it'd be sad to copy them.

What I'm wondering about right now is how an extensible node should
implement the equivalent of
#define WRITE_NODE_FIELD(fldname) \
(appendStringInfo(str, " :" CppAsString(fldname) " "), \
_outNode(str, node->fldname))

given that _outNode isn't public, that seems to imply having to do
something like

#define WRITE_NODE_FIELD(fldname) \
(appendStringInfo(str, " :" CppAsString(fldname) " "), \
appendStringInfo(str, nodeToString(node->fldname)))

i.e. essentially doubling memory overhead. Istm we should make
outNode() externally visible?

Greetings,

Andres Freund

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#21

Robert Haas

robertmhaas@gmail.com

about 10 years ago

In reply to: Andres Freund (#20)

Re: CustomScan in a larger structure (RE: CustomScan support on readfuncs.c)

Attachments:

Attachments:

Attachments:

Attachments: