IMPORT FOREIGN SCHEMA statement

Started by Ronan Dunklauabout 12 years ago51 messageshackers

ronan.dunklau@dalibo.com

about 12 years ago

Hello,

Since my last proposal didn't get any strong rebuttal, please find attached a
more complete version of the IMPORT FOREIGN SCHEMA statement.

I tried to follow the SQL-MED specification as closely as possible.

This adds discoverability to foreign servers. The structure of the statement
as I understand it is simple enough:

IMPORT FOREIGN SCHEMA remote_schema FROM SERVER some_server [ (LIMIT TO |
EXCEPT) table_list ] INTO local_schema.

The import_foreign_schema patch adds the infrastructure, and a new FDW
routine:

typedef List *(*ImportForeignSchema_function) (ForeignServer *server,
ImportForeignSchemaStmt * parsetree);

This routine must return a list of CreateForeignTableStmt mirroring whatever
tables were found on the remote side, which will then be executed.

The import_foreign_schema_postgres_fdw patch proposes an implementation of
this API for postgres_fdw. It will import a foreign schema using the right
types as well as nullable information.

Regarding documentation, I don't really know where it should have been put. If
I missed something, let me know and I'll try to correct it.

--
Ronan Dunklau
http://dalibo.com - http://dalibo.org

David Fetter

david@fetter.org

about 12 years ago

In reply to: Ronan Dunklau (#1)

Re: IMPORT FOREIGN SCHEMA statement

On Fri, May 23, 2014 at 10:08:06PM +0200, Ronan Dunklau wrote:

Hello,

Since my last proposal didn't get any strong rebuttal, please find attached a
more complete version of the IMPORT FOREIGN SCHEMA statement.

Thanks!

Please to send future patches to this thread so people can track them
in their mail.

I tried to follow the SQL-MED specification as closely as possible.

This adds discoverability to foreign servers. The structure of the statement
as I understand it is simple enough:

IMPORT FOREIGN SCHEMA remote_schema FROM SERVER some_server [ (LIMIT TO |
EXCEPT) table_list ] INTO local_schema.

The import_foreign_schema patch adds the infrastructure, and a new FDW
routine:

typedef List *(*ImportForeignSchema_function) (ForeignServer *server,
ImportForeignSchemaStmt * parsetree);

This routine must return a list of CreateForeignTableStmt mirroring whatever
tables were found on the remote side, which will then be executed.

The import_foreign_schema_postgres_fdw patch proposes an implementation of
this API for postgres_fdw. It will import a foreign schema using the right
types as well as nullable information.

In the case of PostgreSQL, "the right types" are obvious until there's
a user-defined one. What do you plan to do in that case?

Regarding documentation, I don't really know where it should have been put. If
I missed something, let me know and I'll try to correct it.

It's not exactly something you missed, but I need to bring it up
anyway before we go too far. The standard missed two crucial concepts
when this part of it was written:

1. No single per-database-type universal type mapping can be correct.

People will have differing goals for type mapping, and writing a whole
new FDW for each of those goals is, to put it mildly, wasteful. I
will illustrate with a concrete and common example.

MySQL's datetime type encourages usages which PostgreSQL's
corresponding type, timestamptz, simply disallows, namely '0000-00-00
00:00:00' as its idea of UNKNOWN or NULL.

One way PostgreSQL's mapping could work is to map it to TEXT, which
would preserve the strings exactly and be in some sense an identity
map. It would also make the type somewhat useless in its original
intended form.

Another one would map the type is to a composite data type
mysql_datetime(tstz timestamptz, is_wacky boolean) which would
capture, for example, ('2014-04-01 00:00:00+00', false) for the UTC
start of April Fools' Day this year, and (NULL, true) for '0000-00-00
00:00:00'.

There are doubtless others, and there is no principled way to assign
any one of them as universally correct.

This brings me to the next crucial concept the standard missed:

2. The correct mapping may not be the identity, and furthermore, the
inbound and outbound mappings might in general not be mere inversions
of each other.

MySQL (no aspersions intended) again provides a concrete example with
its unsigned integer types. Yes, it's possible to create a domain
over INT8 which simulates UINT4, a domain over NUMERIC which simulates
UINT8, etc., but again this process's correctness depends on
circumstances.

To address these problems, I propose the following:

- We make type mappings settable at the level of:
- FDW
- Instance (a.k.a. cluster)
- Database
- Schema
- Table
- Column

using the existing ALTER command and some way of spelling out how
a remote type maps to a local type.

This would consist of:
- The remote type
- The local type to which it maps
- The inbound transformation (default identity)
- The outbound transformation (default identity)

At any given level, the remote type would need to be unique. To
communicate this to the system, we either invent new syntax, with
all the hazards attendant thereto, or we could use JSON or similar
serialization.

ALTER FOREIGN TABLE foo
ADD TYPE MAPPING
FROM "datetime"
TO TEXT
WITH (
INBOUND TRANSFORMATION IDENTITY,
OUTBOUND TRANSFORMATION IDENTITY
) /* Ugh!!! */

vs.

ALTER FOREIGN TABLE foo ADD (mapping '{
"datetime": "text",
"inbound": "IDENTITY",
outbound: "IDENTITY"
}')

Each FDW would have some set of default mappings and some way to
override them as above.

What say?

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Ronan Dunklau

ronan.dunklau@dalibo.com

about 12 years ago

In reply to: David Fetter (#2)

Re: IMPORT FOREIGN SCHEMA statement

Le dimanche 25 mai 2014 12:41:18 David Fetter a écrit :

On Fri, May 23, 2014 at 10:08:06PM +0200, Ronan Dunklau wrote:

Hello,

Since my last proposal didn't get any strong rebuttal, please find
attached a more complete version of the IMPORT FOREIGN SCHEMA statement.

Thanks!

Please to send future patches to this thread so people can track them
in their mail.

I'll do.

I didn't for the previous one because it was a few months ago, and no patch
had been added to the commit fest.

I tried to follow the SQL-MED specification as closely as possible.

This adds discoverability to foreign servers. The structure of the
statement as I understand it is simple enough:

IMPORT FOREIGN SCHEMA remote_schema FROM SERVER some_server [ (LIMIT TO |
EXCEPT) table_list ] INTO local_schema.

The import_foreign_schema patch adds the infrastructure, and a new FDW
routine:

typedef List *(*ImportForeignSchema_function) (ForeignServer *server,
ImportForeignSchemaStmt * parsetree);

This routine must return a list of CreateForeignTableStmt mirroring
whatever tables were found on the remote side, which will then be
executed.

The import_foreign_schema_postgres_fdw patch proposes an implementation of
this API for postgres_fdw. It will import a foreign schema using the right
types as well as nullable information.

In the case of PostgreSQL, "the right types" are obvious until there's
a user-defined one. What do you plan to do in that case ?

The current implementation fetches the types as regtype, and when receiving a
custom type, two things can happen:

- the type is defined locally: everything will work as expected
- the type is not defined locally: the conversion function will fail, and
raise an error of the form: ERROR: type "schema.typname" does not exist

Should I add that to the regression test suite ?

Regarding documentation, I don't really know where it should have been
put. If I missed something, let me know and I'll try to correct it.

It's not exactly something you missed, but I need to bring it up
anyway before we go too far. The standard missed two crucial concepts
when this part of it was written:

1. No single per-database-type universal type mapping can be correct.

People will have differing goals for type mapping, and writing a whole
new FDW for each of those goals is, to put it mildly, wasteful. I
will illustrate with a concrete and common example.

MySQL's datetime type encourages usages which PostgreSQL's
corresponding type, timestamptz, simply disallows, namely '0000-00-00
00:00:00' as its idea of UNKNOWN or NULL.

One way PostgreSQL's mapping could work is to map it to TEXT, which
would preserve the strings exactly and be in some sense an identity
map. It would also make the type somewhat useless in its original
intended form.

Another one would map the type is to a composite data type
mysql_datetime(tstz timestamptz, is_wacky boolean) which would
capture, for example, ('2014-04-01 00:00:00+00', false) for the UTC
start of April Fools' Day this year, and (NULL, true) for '0000-00-00
00:00:00'.

There are doubtless others, and there is no principled way to assign
any one of them as universally correct.

This brings me to the next crucial concept the standard missed:

2. The correct mapping may not be the identity, and furthermore, the
inbound and outbound mappings might in general not be mere inversions
of each other.

MySQL (no aspersions intended) again provides a concrete example with
its unsigned integer types. Yes, it's possible to create a domain
over INT8 which simulates UINT4, a domain over NUMERIC which simulates
UINT8, etc., but again this process's correctness depends on
circumstances.

To address these problems, I propose the following:

- We make type mappings settable at the level of:
- FDW
- Instance (a.k.a. cluster)
- Database
- Schema
- Table
- Column

using the existing ALTER command and some way of spelling out how
a remote type maps to a local type.

This would consist of:
- The remote type
- The local type to which it maps
- The inbound transformation (default identity)
- The outbound transformation (default identity)

At any given level, the remote type would need to be unique. To
communicate this to the system, we either invent new syntax, with
all the hazards attendant thereto, or we could use JSON or similar
serialization.

ALTER FOREIGN TABLE foo
ADD TYPE MAPPING
FROM "datetime"
TO TEXT
WITH (
INBOUND TRANSFORMATION IDENTITY,
OUTBOUND TRANSFORMATION IDENTITY
) /* Ugh!!! */

vs.

ALTER FOREIGN TABLE foo ADD (mapping '{
"datetime": "text",
"inbound": "IDENTITY",
outbound: "IDENTITY"
}')

Each FDW would have some set of default mappings and some way to
override them as above.

I understand your points, but I'm not really comfortable with the concept,
unless there is something that I missed.

We can already support this use case through specific-fdw options. Should I
add that to postgres_fdw ?

Additionally, I don't really see how that would be useful in a general case.
With an "in-core" defined meaning of type transformation, any FDW that doesn't
fit exactly into the model would have a hard time. For example, what happens
if an FDW is only ever capable of returning text ? Or if a mapping can only be
set at the server or FDW model because it depends on some connection parameter
? The bulk of the code for managing type mappings would be FDW-specific
anyway. What you're proposing looks like a "universal option", with a specific
syntax, that should therefore be supported by all fdws, with well-defined
semantics.

Moreover, extending the spec seems a bit dangerous to me, since if the spec
decides to address this specific point in the future, there is a risk that our
behavior will not be compatible.

Thank you for your feedback,

--
Ronan Dunklau
http://dalibo.com - http://dalibo.org

David Fetter

david@fetter.org

about 12 years ago

In reply to: Ronan Dunklau (#3)

Re: IMPORT FOREIGN SCHEMA statement

On Sun, May 25, 2014 at 11:23:41PM +0200, Ronan Dunklau wrote:

Le dimanche 25 mai 2014 12:41:18 David Fetter a ï¿½crit :

On Fri, May 23, 2014 at 10:08:06PM +0200, Ronan Dunklau wrote:

Hello,

Since my last proposal didn't get any strong rebuttal, please find
attached a more complete version of the IMPORT FOREIGN SCHEMA statement.

Thanks!

Please to send future patches to this thread so people can track them
in their mail.

I'll do.

I didn't for the previous one because it was a few months ago, and no patch
had been added to the commit fest.

Thanks for adding this one :)

The import_foreign_schema_postgres_fdw patch proposes an implementation of
this API for postgres_fdw. It will import a foreign schema using the right
types as well as nullable information.

In the case of PostgreSQL, "the right types" are obvious until there's
a user-defined one. What do you plan to do in that case ?

The current implementation fetches the types as regtype, and when receiving a
custom type, two things can happen:

- the type is defined locally: everything will work as expected
- the type is not defined locally: the conversion function will fail, and
raise an error of the form: ERROR: type "schema.typname" does not exist

Should I add that to the regression test suite ?

Yes.

In the "easy" case of PostgreSQL, you might also be able to establish
whether the UDT in the "already defined locally" case above is
identical to the one defined remotely, but I don't think it's possible
even in principle for non-PostgreSQL remote systems, possibly not even
for ones with non-identical architecture, PostgreSQL major version,
etc.

Regarding documentation, I don't really know where it should have been
put. If I missed something, let me know and I'll try to correct it.

It's not exactly something you missed, but I need to bring it up
anyway before we go too far. The standard missed two crucial concepts
when this part of it was written:

1. No single per-database-type universal type mapping can be correct.

[snip]

To address these problems, I propose the following:

- We make type mappings settable at the level of:
- FDW
- Instance (a.k.a. cluster)
- Database
- Schema
- Table
- Column

using the existing ALTER command and some way of spelling out how

Oops. Forgot to include CREATE in the above.

a remote type maps to a local type.

This would consist of:
- The remote type
- The local type to which it maps
- The inbound transformation (default identity)
- The outbound transformation (default identity)

At any given level, the remote type would need to be unique. To
communicate this to the system, we either invent new syntax, with
all the hazards attendant thereto, or we could use JSON or similar
serialization.

ALTER FOREIGN TABLE foo
ADD TYPE MAPPING
FROM "datetime"
TO TEXT
WITH (
INBOUND TRANSFORMATION IDENTITY,
OUTBOUND TRANSFORMATION IDENTITY
) /* Ugh!!! */

"Ugh!!!" means I don't think we should do it this way.

vs.

ALTER FOREIGN TABLE foo ADD (mapping '{
"datetime": "text",
"inbound": "IDENTITY",
outbound: "IDENTITY"
}')

Each FDW would have some set of default mappings and some way to
override them as above.

I understand your points, but I'm not really comfortable with the
concept, unless there is something that I missed.

My poor communication ability might have a lot to do with it. I
assure you my explanation would have been even worse if I had tried it
in French, though. :P

We can already support this use case through specific-fdw options. Should I
add that to postgres_fdw ?

I believe the capability belongs in our FDW API with the decision of
whether to implement it up to FDW authors. They know (or should know)
how to throw ERRCODE_FEATURE_NOT_SUPPORTED.

Additionally, I don't really see how that would be useful in a general case.
With an "in-core" defined meaning of type transformation, any FDW that doesn't
fit exactly into the model would have a hard time. For example, what happens
if an FDW is only ever capable of returning text ?

That's actually the case where it's most important to have the feature
all the way down to the column level.

Or if a mapping can only be set at the server or FDW model because
it depends on some connection parameter ?

ERRCODE_FEATURE_NOT_SUPPORTED.

The bulk of the code for managing type mappings would be
FDW-specific anyway.

The part that actually does the transformations would necessarily be
part of each FDW. I omitted opining on whether such transformations
should be assumed to be local or remote because I can imagine cases
where only local (or only remote) could be correct.

What you're proposing looks like a "universal option", with a
specific syntax, that should therefore be supported by all fdws,
with well-defined semantics.

At the DDL level, yes.

Moreover, extending the spec seems a bit dangerous to me, since if the spec
decides to address this specific point in the future, there is a risk that our
behavior will not be compatible.

That's why I suggested doing it via CREATE/ALTER with JSONB or similar
containing the details rather than inventing new SQL grammar, an
option I included only to dismiss it.

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Laurenz Albe

laurenz.albe@cybertec.at

about 12 years ago

In reply to: Ronan Dunklau (#1)

Re: IMPORT FOREIGN SCHEMA statement

Ronan Dunklau wrote:

Since my last proposal didn't get any strong rebuttal, please find attached a
more complete version of the IMPORT FOREIGN SCHEMA statement.

I tried to follow the SQL-MED specification as closely as possible.

This adds discoverability to foreign servers. The structure of the statement
as I understand it is simple enough:

IMPORT FOREIGN SCHEMA remote_schema FROM SERVER some_server [ (LIMIT TO |
EXCEPT) table_list ] INTO local_schema.

The import_foreign_schema patch adds the infrastructure, and a new FDW
routine:

typedef List *(*ImportForeignSchema_function) (ForeignServer *server,
ImportForeignSchemaStmt * parsetree);

This routine must return a list of CreateForeignTableStmt mirroring whatever
tables were found on the remote side, which will then be executed.

In addition to data type mapping questions (which David already raised)
I have one problem when I think of the Oracle FDW:

Oracle follows the SQL standard in folding table names to upper case.
So this would normally result in a lot of PostgreSQL foreign tables
with upper case names, which would be odd and unpleasant.

I cannot see a way out of that, but I thought I should mention it.

Yours,
Laurenz Albe

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Tom Lane

tgl@sss.pgh.pa.us

about 12 years ago

In reply to: Laurenz Albe (#5)

Re: IMPORT FOREIGN SCHEMA statement

Albe Laurenz <laurenz.albe@wien.gv.at> writes:

In addition to data type mapping questions (which David already raised)
I have one problem when I think of the Oracle FDW:

Oracle follows the SQL standard in folding table names to upper case.
So this would normally result in a lot of PostgreSQL foreign tables
with upper case names, which would be odd and unpleasant.

I cannot see a way out of that, but I thought I should mention it.

It seems like it would often be desirable for the Oracle FDW to smash
all-upper-case names to all-lower-case while importing, so that no quoting
is needed on either side. I doubt though that this is so desirable that
it should happen unconditionally.

Between this and the type-mapping questions, it seems likely that
we're going to need a way for IMPORT FOREIGN SCHEMA to accept
user-supplied control options, which would in general be specific
to the FDW being used. (Another thing the SQL committee failed to
think about.)

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Ronan Dunklau

ronan.dunklau@dalibo.com

about 12 years ago

In reply to: Tom Lane (#6)

Re: IMPORT FOREIGN SCHEMA statement

Additionally, I don't really see how that would be useful in a general

case.

With an "in-core" defined meaning of type transformation, any FDW that

doesn't

fit exactly into the model would have a hard time. For example, what

happens

if an FDW is only ever capable of returning text ?

That's actually the case where it's most important to have the feature
all the way down to the column level.

I'm not sure configuration about specific columns from specific tables would
be that useful: if you already know the structure of the table, you can either
create it yourself, or run an alter column statement just after creating it.

Alternativeley, with the arbitrary options clause proposed by Tom and detailed
below, one could use the LIMIT TO / EXCEPT clauses to fine-tune what options
should apply.

That's why I suggested doing it via CREATE/ALTER with JSONB or similar
containing the details rather than inventing new SQL grammar, an
option I included only to dismiss it.

Hum, adding a simple TYPE MAPPING is already inventing new SQL grammar, but
its less invasive.

Between this and the type-mapping questions, it seems likely that
we're going to need a way for IMPORT FOREIGN SCHEMA to accept
user-supplied control options, which would in general be specific
to the FDW being used. (Another thing the SQL committee failed to
think about.)

So, without extending the syntax too much, we could add options:

IMPORT FOREIGN SCHEMA remote_schema FROM SERVER server_name INTO local_schema
[ { LIMIT TO | EXCEPT } (table_name [, ...])]
[ OPTIONS (option_list) ]

This option list could then contain fdw-specific options, including type
mapping.

Or we could add a specific clause:

IMPORT FOREIGN SCHEMA remote_schema FROM SERVER server_name INTO local_schema
[ { LIMIT TO | EXCEPT } (table_name [, ...])]
[ OPTIONS (option_list) ]
[ TYPE MAPPING some mapping_definition ]

With mapping_definition being either a tuple, or well-defined jsonb format
common accross FDWs.

A third solution, which I don't like but doesn't modify the SQL grammar, would
be to encode the options in the remote_schema name.

Would one of those solutions be acceptable ?

--
Ronan Dunklau
http://dalibo.com - http://dalibo.org

Stephen Frost

sfrost@snowman.net

about 12 years ago

In reply to: David Fetter (#2)

Re: IMPORT FOREIGN SCHEMA statement

* David Fetter (david@fetter.org) wrote:

- We make type mappings settable at the level of:
- FDW
- Instance (a.k.a. cluster)
- Database
- Schema
- Table
- Column

While I like the general idea, you seem to be making this appear much
more complex than it actually is. For example, I see no value in a
table-level "uint4 -> int8" definition. If you want something which is
not the default, just set it on the individual columns of the foreign
table, exactly how we handle column name differences.

There might be some value in being able to redefine what the default is
at the FOREIGN SERVER level, as perhaps MySQL DB X and MySQL DB Y could
have different default mappings for some reason, but adding in the rest
of those levels doesn't add value imv.

This would consist of:
- The remote type
- The local type to which it maps
- The inbound transformation (default identity)
- The outbound transformation (default identity)

The remote type and the local type are known already to the FDW, no?
The FDW will need to know how to do whatever conversions we're talking
about also, right? (If you want to do it in core PG instead of the FDW
then I'm thinking you should probably use a view over top of the
foreign table..). What's nice is that this all falls under what an FDW
can do *already*, if it wants (and, actually, it could do it on the
table-level technically too, if the FDW supports that, but schema?
database? these things don't make sense...).

For the IMPORT bit, it should just be doing whatever the defaults are
unless you've set some different defaults for a given foreign server
(also something which could be set using our existing system...).

ALTER FOREIGN TABLE foo ADD (mapping '{
"datetime": "text",
"inbound": "IDENTITY",
outbound: "IDENTITY"
}')

I'm very much against having two different command languages where one
is used "when convenient". If we wanted to do this, we should build a
full-spec mapping from whatever JSON language you come up with for our
utility commands to what we actually implement in the grammar.

Thanks,

Stephen

Stephen Frost

sfrost@snowman.net

about 12 years ago

In reply to: David Fetter (#4)

Re: IMPORT FOREIGN SCHEMA statement

* David Fetter (david@fetter.org) wrote:

In the "easy" case of PostgreSQL, you might also be able to establish
whether the UDT in the "already defined locally" case above is
identical to the one defined remotely, but I don't think it's possible
even in principle for non-PostgreSQL remote systems, possibly not even
for ones with non-identical architecture, PostgreSQL major version,
etc.

For my 2c, it'd be very handy if IMPORT had an option or similar to also
copy over all (in the schema) or at least any depended-upon UDTs. It'd
really be nice if we had an ability to check the remote UDT vs. the
local one at some point also, since otherwise you're going to get bitten
at run-time when querying the foreign table.

Thanks,

Stephen

#10

Stephen Frost

sfrost@snowman.net

about 12 years ago

In reply to: Tom Lane (#6)

Re: IMPORT FOREIGN SCHEMA statement

* Tom Lane (tgl@sss.pgh.pa.us) wrote:

Albe Laurenz <laurenz.albe@wien.gv.at> writes:

Oracle follows the SQL standard in folding table names to upper case.
So this would normally result in a lot of PostgreSQL foreign tables
with upper case names, which would be odd and unpleasant.

I cannot see a way out of that, but I thought I should mention it.

It seems like it would often be desirable for the Oracle FDW to smash
all-upper-case names to all-lower-case while importing, so that no quoting
is needed on either side. I doubt though that this is so desirable that
it should happen unconditionally.

The oracle FDW would simply need a foreign-server level option which
says "smash everything to lowercase on import".

Between this and the type-mapping questions, it seems likely that
we're going to need a way for IMPORT FOREIGN SCHEMA to accept
user-supplied control options, which would in general be specific
to the FDW being used. (Another thing the SQL committee failed to
think about.)

I can see value in having specific mappings defined at the foreign
server level for what IMPORT does by default, but as IMPORT is intended
to be a bulk process, I don't see the value in trying to teach it how to
do a specific mapping for a specific table or column inside the schema.
You run the IMPORT for the entire schema and then go fix up any special
cases or things you wanted differently.

That said, I'm not against having a way to pass to IMPORT a similar
key/value pair set which we already support for the FDW options on
tables, columns, etc, by way of safe-guarding any potential use case for
such.

Thanks,

Stephen

#11

Stephen Frost

sfrost@snowman.net

about 12 years ago

In reply to: Ronan Dunklau (#7)

Re: IMPORT FOREIGN SCHEMA statement

* Ronan Dunklau (ronan.dunklau@dalibo.com) wrote:

So, without extending the syntax too much, we could add options:

IMPORT FOREIGN SCHEMA remote_schema FROM SERVER server_name INTO local_schema
[ { LIMIT TO | EXCEPT } (table_name [, ...])]
[ OPTIONS (option_list) ]

This option list could then contain fdw-specific options, including type
mapping.

Yup, that looks reasonable to me.

Or we could add a specific clause:

IMPORT FOREIGN SCHEMA remote_schema FROM SERVER server_name INTO local_schema
[ { LIMIT TO | EXCEPT } (table_name [, ...])]
[ OPTIONS (option_list) ]
[ TYPE MAPPING some mapping_definition ]

-1 on this one.

With mapping_definition being either a tuple, or well-defined jsonb format
common accross FDWs.

A third solution, which I don't like but doesn't modify the SQL grammar, would
be to encode the options in the remote_schema name.

Yeah, don't like that one either.

Thanks,

Stephen

#12

Petr Jelinek

petr@2ndquadrant.com

about 12 years ago

In reply to: Ronan Dunklau (#7)

Re: IMPORT FOREIGN SCHEMA statement

On 27/05/14 13:49, Ronan Dunklau wrote:

Between this and the type-mapping questions, it seems likely that
we're going to need a way for IMPORT FOREIGN SCHEMA to accept
user-supplied control options, which would in general be specific
to the FDW being used. (Another thing the SQL committee failed to
think about.)

So, without extending the syntax too much, we could add options:

IMPORT FOREIGN SCHEMA remote_schema FROM SERVER server_name INTO local_schema
[ { LIMIT TO | EXCEPT } (table_name [, ...])]
[ OPTIONS (option_list) ]

This option list could then contain fdw-specific options, including type
mapping.

This one looks like good option to me.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#13

Alvaro Herrera

alvherre@2ndquadrant.com

about 12 years ago

In reply to: Stephen Frost (#10)

Re: IMPORT FOREIGN SCHEMA statement

Stephen Frost wrote:

* Tom Lane (tgl@sss.pgh.pa.us) wrote:

Albe Laurenz <laurenz.albe@wien.gv.at> writes:

Oracle follows the SQL standard in folding table names to upper case.
So this would normally result in a lot of PostgreSQL foreign tables
with upper case names, which would be odd and unpleasant.

I cannot see a way out of that, but I thought I should mention it.

It seems like it would often be desirable for the Oracle FDW to smash
all-upper-case names to all-lower-case while importing, so that no quoting
is needed on either side. I doubt though that this is so desirable that
it should happen unconditionally.

The oracle FDW would simply need a foreign-server level option which
says "smash everything to lowercase on import".

That's not the same thing though -- consider what happens to the quoting
needs for names with mixed case. If you change mixed case to
all-lowercase, references to such objects using quotes in the
application code would fail because the name is now all-lowercase in
Postgres. A tri-valued enum could do the trick:
lowercase_names={wholly_uppercase_only, all, none}
with the first one being the most sensible and default.

Between this and the type-mapping questions, it seems likely that
we're going to need a way for IMPORT FOREIGN SCHEMA to accept
user-supplied control options, which would in general be specific
to the FDW being used. (Another thing the SQL committee failed to
think about.)

I can see value in having specific mappings defined at the foreign
server level for what IMPORT does by default, but as IMPORT is intended
to be a bulk process, I don't see the value in trying to teach it how to
do a specific mapping for a specific table or column inside the schema.
You run the IMPORT for the entire schema and then go fix up any special
cases or things you wanted differently.

Yes, it seems better to offer the ability to create transactional
scripts around IMPORT (so it must be able to run in a transaction block
-- IMPORT first, then do a bunch of ALTERs), than complicating IMPORT
itself infinitely to support hundreds of possible options.

--
ï¿½lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#14

Stephen Frost

sfrost@snowman.net

about 12 years ago

In reply to: Alvaro Herrera (#13)

Re: IMPORT FOREIGN SCHEMA statement

* Alvaro Herrera (alvherre@2ndquadrant.com) wrote:

Stephen Frost wrote:

It seems like it would often be desirable for the Oracle FDW to smash
all-upper-case names to all-lower-case while importing, so that no quoting
is needed on either side. I doubt though that this is so desirable that
it should happen unconditionally.

The oracle FDW would simply need a foreign-server level option which
says "smash everything to lowercase on import".

That's not the same thing though -- consider what happens to the quoting
needs for names with mixed case. If you change mixed case to
all-lowercase, references to such objects using quotes in the
application code would fail because the name is now all-lowercase in
Postgres. A tri-valued enum could do the trick:
lowercase_names={wholly_uppercase_only, all, none}
with the first one being the most sensible and default.

Sure, I was being a bit over-simplistic. As was mentioned up-thread,
the option would rather be "flip all-uppercase to lowercase and all-
lowercase to uppercase, quote any which are mixed", or something along
those lines. What I was trying to get at is that it's up to the FDW
what options it wants to support in this regard and we already have a
way for the admin to pass in useful information to the FDW by way of the
FOREIGN SERVER FDW options.

This, plus the generic ability to pass an OPTIONS clause to the IMPORT
(allowing you to have different defaults for different IMPORT
statements) and having it be transactional, as you mention, appears to
be covering all the relevant bases.

Thanks,

stephen

#15

Tom Lane

tgl@sss.pgh.pa.us

about 12 years ago

In reply to: Stephen Frost (#14)

Re: IMPORT FOREIGN SCHEMA statement

Stephen Frost <sfrost@snowman.net> writes:

Sure, I was being a bit over-simplistic. As was mentioned up-thread,
the option would rather be "flip all-uppercase to lowercase and all-
lowercase to uppercase, quote any which are mixed", or something along
those lines. What I was trying to get at is that it's up to the FDW
what options it wants to support in this regard and we already have a
way for the admin to pass in useful information to the FDW by way of the
FOREIGN SERVER FDW options.

This, plus the generic ability to pass an OPTIONS clause to the IMPORT
(allowing you to have different defaults for different IMPORT
statements) and having it be transactional, as you mention, appears to
be covering all the relevant bases.

Yeah, as far as the example of coping with differing case goes, I agree
that we'd want IMPORT to just follow whatever the FDW's default or
configured behavior is, since obviously the FDW will have to know how to
reverse the conversion when sending queries later on. So there's no
apparent need for an IMPORT-level option *for that example*. I'm not
sure if we need OPTIONS for IMPORT for any other uses.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#16

Stephen Frost

sfrost@snowman.net

about 12 years ago

In reply to: Tom Lane (#15)

Re: IMPORT FOREIGN SCHEMA statement

* Tom Lane (tgl@sss.pgh.pa.us) wrote:

Stephen Frost <sfrost@snowman.net> writes:

This, plus the generic ability to pass an OPTIONS clause to the IMPORT
(allowing you to have different defaults for different IMPORT
statements) and having it be transactional, as you mention, appears to
be covering all the relevant bases.

Yeah, as far as the example of coping with differing case goes, I agree
that we'd want IMPORT to just follow whatever the FDW's default or
configured behavior is, since obviously the FDW will have to know how to
reverse the conversion when sending queries later on. So there's no
apparent need for an IMPORT-level option *for that example*. I'm not
sure if we need OPTIONS for IMPORT for any other uses.

I'm waffling a bit on this particular point. IMPORT is for bulk
operations, of course, but perhaps on one remote server you want to
import everything from the dimension tables using the default mapping
but everything from the fact or perhaps aggregate tables in some schema
as text. You could do that with something like- import #1, change the
server-level options, import #2, or you could do just one big import and
then go back and fix everything, but...

I'd rather introduce this options clause for IMPORT *now*, which means
we don't need to care about if this specific example is really relevant
or not, than not have it in 9.5 and have FDW authors unhappy because
it's missing (whether they need it for this use case or some other).

Thanks,

Stephen

#17

Ronan Dunklau

ronan.dunklau@dalibo.com

about 12 years ago

In reply to: Stephen Frost (#9)

Re: IMPORT FOREIGN SCHEMA statement

Le mardi 27 mai 2014 09:52:27 Stephen Frost a écrit :

* David Fetter (david@fetter.org) wrote:

In the "easy" case of PostgreSQL, you might also be able to establish
whether the UDT in the "already defined locally" case above is
identical to the one defined remotely, but I don't think it's possible
even in principle for non-PostgreSQL remote systems, possibly not even
for ones with non-identical architecture, PostgreSQL major version,
etc.

For my 2c, it'd be very handy if IMPORT had an option or similar to also
copy over all (in the schema) or at least any depended-upon UDTs. It'd
really be nice if we had an ability to check the remote UDT vs. the
local one at some point also, since otherwise you're going to get bitten
at run-time when querying the foreign table.

The specification only talks about importing tables, not types, views or (why
not ?) foreign tables.

If we want to extend the spec further, we could accept more than
CreateForeignTableStmt as return values from the API:
- CreateDomainStmt
- CompositeTypeStmt
- AlterTableCmd

The core code would be responsible for executing them, and making sure the
destination schema is correctly positioned on all of those.

The postgres_fdw behaviour could then be controlled with options such as
include_types (tri-valued enum accepting "none", "all", "as_needed"),
relation_kinds (default to 'r', can support multiple kinds with a string 'rfv'
for tables, foreign tables and views).

I think we're drifting further away from the standard with that kind of stuff,
but I'd be happy to implement it if that's the path we choose.

--
Ronan Dunklau
http://dalibo.com - http://dalibo.org

#18

David Fetter

david@fetter.org

about 12 years ago

In reply to: Stephen Frost (#8)

Re: IMPORT FOREIGN SCHEMA statement

On Tue, May 27, 2014 at 09:41:06AM -0400, Stephen Frost wrote:

* David Fetter (david@fetter.org) wrote:

- We make type mappings settable at the level of:
- FDW
- Instance (a.k.a. cluster)
- Database
- Schema
- Table
- Column

While I like the general idea, you seem to be making this appear much
more complex than it actually is. For example, I see no value in a
table-level "uint4 -> int8" definition.

You chose a particularly silly example, so I have to assume it's a
straw man. My point was that since we don't know what things are
relevant to preserve and transform here in the design stage, we need
to leave them settable by people who have needs we don't know about,
i.e. the users of the feature.

If you want something which is
not the default, just set it on the individual columns of the foreign
table, exactly how we handle column name differences.

There might be some value in being able to redefine what the default is
at the FOREIGN SERVER level, as perhaps MySQL DB X and MySQL DB Y could
have different default mappings for some reason, but adding in the rest
of those levels doesn't add value imv.

This would consist of:
- The remote type
- The local type to which it maps
- The inbound transformation (default identity)
- The outbound transformation (default identity)

The remote type and the local type are known already to the FDW, no?

The above aren't separable items. They all have to be there, even if
for user convenience we have two default transformations which are the
identity.

The FDW will need to know how to do whatever conversions we're talking
about also, right?

No. The writer of the FDW is not the same person as the user of the
FDW, and the former is not omniscient. My idea here is to allow FDW
users to supply transformation code at these spots.

(If you want to do it in core PG instead of the FDW
then I'm thinking you should probably use a view over top of the
foreign table..).

We can't know in advance what to preserve and what to jettison. We
can't even know which server is responsible for doing the
transformation.

What's nice is that this all falls under what an FDW
can do *already*, if it wants (and, actually, it could do it on the
table-level technically too, if the FDW supports that, but schema?
database? these things don't make sense...).

Not to you yet, but I've seen deployed applications with exactly these
requirements.

For the IMPORT bit, it should just be doing whatever the defaults are
unless you've set some different defaults for a given foreign server
(also something which could be set using our existing system...).

ALTER FOREIGN TABLE foo ADD (mapping '{
"datetime": "text",
"inbound": "IDENTITY",
outbound: "IDENTITY"
}')

I'm very much against having two different command languages where one
is used "when convenient".

I did not suggest that we use two different ways to specify this. I
did sketch out one path--adding a bunch of SQL grammar--which I made
it explicitly clear we shouldn't tread. Apparently, I wasn't quite
explicit enough.

If we wanted to do this, we should build a full-spec mapping from
whatever JSON language you come up with for our utility commands to
what we actually implement in the grammar.

Excellent plan.

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#19

Stephen Frost

sfrost@snowman.net

about 12 years ago

In reply to: David Fetter (#18)

Re: IMPORT FOREIGN SCHEMA statement

* David Fetter (david@fetter.org) wrote:

On Tue, May 27, 2014 at 09:41:06AM -0400, Stephen Frost wrote:

* David Fetter (david@fetter.org) wrote:

- We make type mappings settable at the level of:
- FDW
- Instance (a.k.a. cluster)
- Database
- Schema
- Table
- Column

While I like the general idea, you seem to be making this appear much
more complex than it actually is. For example, I see no value in a
table-level "uint4 -> int8" definition.

You chose a particularly silly example, so I have to assume it's a
straw man.

... I used your example from 20140525194118.GB32355@fetter.org and your
suggestion that we support that on a per-table basis. If that
combination is silly then let's not support it.

My point was that since we don't know what things are
relevant to preserve and transform here in the design stage, we need
to leave them settable by people who have needs we don't know about,
i.e. the users of the feature.

This is not relevant as far as I can tell. Users can already make
changes to the definitions, or create them however they want the first
time by using CREATE FOREIGN TABLE. The point of IMPORT is that it
should be for bulk operations- if you have a lot of very specific
transformations, then use CREATE instead. I don't see value in
rebuilding the flexibility of what a script of CREATEs gives you into
a single IMPORT command.

The FDW will need to know how to do whatever conversions we're talking
about also, right?

No. The writer of the FDW is not the same person as the user of the
FDW, and the former is not omniscient. My idea here is to allow FDW
users to supply transformation code at these spots.

Then we're not talking about something which should be IMPORT's job, or
at least it goes far beyond it and that's what we're discussing here.
This sounds a bit like you're looking to rebuild what ETLs provide in a
streaming fashion- and I'm all for that as an interesting project that
isn't about adding IMPORT. I still think you could use views for this,
or ETL, if you really want to implement it using core instead of the
FDW.

(If you want to do it in core PG instead of the FDW
then I'm thinking you should probably use a view over top of the
foreign table..).

We can't know in advance what to preserve and what to jettison. We
can't even know which server is responsible for doing the
transformation.

Either side could handle the transformation using views, or there could
be transformations on both sides.

What's nice is that this all falls under what an FDW
can do *already*, if it wants (and, actually, it could do it on the
table-level technically too, if the FDW supports that, but schema?
database? these things don't make sense...).

Not to you yet, but I've seen deployed applications with exactly these
requirements.

I don't view IMPORT as being part of a deployed application. It's a way
of simplifying the process of setting up FDWs, not an ETL mechanism.

For the IMPORT bit, it should just be doing whatever the defaults are
unless you've set some different defaults for a given foreign server
(also something which could be set using our existing system...).

ALTER FOREIGN TABLE foo ADD (mapping '{
"datetime": "text",
"inbound": "IDENTITY",
outbound: "IDENTITY"
}')

I'm very much against having two different command languages where one
is used "when convenient".

I did not suggest that we use two different ways to specify this. I
did sketch out one path--adding a bunch of SQL grammar--which I made
it explicitly clear we shouldn't tread. Apparently, I wasn't quite
explicit enough.

What you missed here is that I'd find it objectionable to have some
portion of the system has a JSON-based grammar while the rest is an SQL
grammar. I'm not entirely thrilled with the jsquery approach for that
reason, but at least that's a very contained case which is about a logic
expression (and we have logic expressions already in SQL). That's not
what you're describing here.

If we wanted to do this, we should build a full-spec mapping from
whatever JSON language you come up with for our utility commands to
what we actually implement in the grammar.

Excellent plan.

Go for it. From here, I don't see anything stopping you from making a
wrapper around PG which converts your JSON syntax into SQL (indeed,
people have already done this..). I wouldn't get your hopes up about
having it ever part of core PG though, and it certainly won't be until
it's as complete and capable as the existing SQL grammar.

Thanks,

Stephen

#20

Michael Paquier

michael@paquier.xyz

almost 12 years ago

In reply to: Ronan Dunklau (#3)

Re: IMPORT FOREIGN SCHEMA statement

On Mon, May 26, 2014 at 6:23 AM, Ronan Dunklau <ronan.dunklau@dalibo.com> wrote:

Le dimanche 25 mai 2014 12:41:18 David Fetter a écrit :

On Fri, May 23, 2014 at 10:08:06PM +0200, Ronan Dunklau wrote:

Hello,

Since my last proposal didn't get any strong rebuttal, please find
attached a more complete version of the IMPORT FOREIGN SCHEMA statement.

Thanks!

Please to send future patches to this thread so people can track them
in their mail.

I'll do.

I didn't for the previous one because it was a few months ago, and no patch
had been added to the commit fest.

I tried to follow the SQL-MED specification as closely as possible.

This adds discoverability to foreign servers. The structure of the
statement as I understand it is simple enough:

IMPORT FOREIGN SCHEMA remote_schema FROM SERVER some_server [ (LIMIT TO |
EXCEPT) table_list ] INTO local_schema.

The import_foreign_schema patch adds the infrastructure, and a new FDW
routine:

typedef List *(*ImportForeignSchema_function) (ForeignServer *server,
ImportForeignSchemaStmt * parsetree);

This routine must return a list of CreateForeignTableStmt mirroring
whatever tables were found on the remote side, which will then be
executed.

The import_foreign_schema_postgres_fdw patch proposes an implementation of
this API for postgres_fdw. It will import a foreign schema using the right
types as well as nullable information.

In the case of PostgreSQL, "the right types" are obvious until there's
a user-defined one. What do you plan to do in that case ?

The current implementation fetches the types as regtype, and when receiving a
custom type, two things can happen:

- the type is defined locally: everything will work as expected
- the type is not defined locally: the conversion function will fail, and
raise an error of the form: ERROR: type "schema.typname" does not exist

Just wondering: what about the case where the same data type is
defined on both local and remote, but with *different* definitions? Is
it the responsibility of the fdw to check for type incompatibilities?
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#21

Atri Sharma

atri.jiit@gmail.com

almost 12 years ago

In reply to: Michael Paquier (#20)

#22

Ronan Dunklau

ronan.dunklau@dalibo.com

almost 12 years ago

In reply to: Atri Sharma (#21)

#23

Michael Paquier

michael@paquier.xyz

almost 12 years ago

In reply to: Ronan Dunklau (#1)

#24

Stephen Frost

sfrost@snowman.net

almost 12 years ago

In reply to: Ronan Dunklau (#22)

#25

Ronan Dunklau

ronan.dunklau@dalibo.com

almost 12 years ago

In reply to: Michael Paquier (#23)

#26

Michael Paquier

michael@paquier.xyz

almost 12 years ago

In reply to: Ronan Dunklau (#25)

#27

Michael Paquier

michael@paquier.xyz

almost 12 years ago

In reply to: Michael Paquier (#26)

#28

Michael Paquier

michael@paquier.xyz

almost 12 years ago

In reply to: Michael Paquier (#26)

#29

Michael Paquier

michael@paquier.xyz

almost 12 years ago

In reply to: Michael Paquier (#28)

#30

Ronan Dunklau

ronan.dunklau@dalibo.com

almost 12 years ago

In reply to: Michael Paquier (#29)

#31

Michael Paquier

michael@paquier.xyz

almost 12 years ago

In reply to: Ronan Dunklau (#30)

#32

Michael Paquier

michael@paquier.xyz

almost 12 years ago

In reply to: Michael Paquier (#31)

#33

Laurenz Albe

laurenz.albe@cybertec.at

almost 12 years ago

In reply to: Michael Paquier (#32)

#34

Ronan Dunklau

ronan.dunklau@dalibo.com

almost 12 years ago

In reply to: Laurenz Albe (#33)

#35

Michael Paquier

michael@paquier.xyz

almost 12 years ago

In reply to: Ronan Dunklau (#34)

#36

Ronan Dunklau

ronan.dunklau@dalibo.com

almost 12 years ago

In reply to: Michael Paquier (#35)

#37

Michael Paquier

michael@paquier.xyz

almost 12 years ago

In reply to: Ronan Dunklau (#36)

#38

Robert Haas

robertmhaas@gmail.com

almost 12 years ago

In reply to: Tom Lane (#6)

#39

Ronan Dunklau

ronan.dunklau@dalibo.com

almost 12 years ago

In reply to: Robert Haas (#38)

#40

Tom Lane

tgl@sss.pgh.pa.us

almost 12 years ago

In reply to: Michael Paquier (#37)

#41

Alvaro Herrera

alvherre@2ndquadrant.com

almost 12 years ago

In reply to: Tom Lane (#40)

#42

Michael Paquier

michael@paquier.xyz

almost 12 years ago

In reply to: Tom Lane (#40)

#43

Tom Lane

tgl@sss.pgh.pa.us

almost 12 years ago

In reply to: Tom Lane (#40)

#44

Tom Lane

tgl@sss.pgh.pa.us

almost 12 years ago

In reply to: Tom Lane (#40)

#45

Michael Paquier

michael@paquier.xyz

almost 12 years ago

In reply to: Tom Lane (#44)

#46

Tom Lane

tgl@sss.pgh.pa.us

almost 12 years ago

In reply to: Michael Paquier (#45)

#47

Ronan Dunklau

ronan.dunklau@dalibo.com

almost 12 years ago

In reply to: Tom Lane (#46)

#48

Tom Lane

tgl@sss.pgh.pa.us

almost 12 years ago

In reply to: Tom Lane (#46)

#49

Tom Lane

tgl@sss.pgh.pa.us

almost 12 years ago

In reply to: Ronan Dunklau (#47)

#50

Tom Lane

tgl@sss.pgh.pa.us

almost 12 years ago

In reply to: Tom Lane (#48)

#51

Michael Paquier

michael@paquier.xyz

almost 12 years ago

In reply to: Tom Lane (#50)

IMPORT FOREIGN SCHEMA statement

Attachments: