Handling changes to default type transformations in PLs

Started by Jim Nasby, about 10 years ago, 4 messages, hackers

Jim.Nasby@BlueTreble.com

about 10 years ago

Some of our PLs have the unfortunate problem of making only a weak effort
at converting types between the PL and Postgres. For example,
plpythonu will correctly transform a complex type to a dict and an array
to a list, but it punts back to text for an array contained inside a
complex type. I know plTCL has similar issues; presumably other PLs are
affected as well.

While it's a SMOC to fix this, what's not simple is the backwards
compatibility: users that are currently using these types are expecting
to be handed strings created by the type's output function, so we can't
just drop these changes in without breaking user code.
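
A minimal sketch of the compatibility problem, in plain Python outside of any PL. The row contents and the helper parse_int_array are hypothetical and only stand in for user code that has grown up around the current string behavior:

```python
# Hypothetical illustration only: plain Python, not running inside PL/Python.

def parse_int_array(text):
    """Parse a one-dimensional integer array literal such as '{1,2,3}'.

    Handles only the simplest case (no NULLs, quoting, or nesting).
    """
    inner = text.strip()[1:-1]
    return [int(item) for item in inner.split(",")] if inner else []

# Array field inside a composite arrives as the output-function string
# (the current behavior described above).
row_today = {"id": 1, "scores": "{10,20,30}"}

# The same field arrives as a list (what a fuller conversion would hand over).
row_converted = {"id": 1, "scores": [10, 20, 30]}

# Existing user code that parses the string itself keeps working today ...
assert parse_int_array(row_today["scores"]) == [10, 20, 30]

# ... but breaks when handed a list, because a list has no .strip() method.
try:
    parse_int_array(row_converted["scores"])
    broke = False
except AttributeError:
    broke = True
```

Here broke ends up True, which is the kind of silent contract change that makes a drop-in fix unsafe.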

It might be possible to work around this with TRANSFORMs, but that's
just ugly: first you'd have to write a bunch of non-trivial C code, then
you'd need to forever remember to specify TRANSFORM FOR TYPE blah.

Some ways to handle this:

1) Use a PL-specific GUC for each case where we need backwards
compatibility. For plpython we'd need 2. plTCL would need 1 or 2.

2) Use a single all-or-nothing GUC. Downside is that if we later decide
to expand automatic conversion again we'd need yet another GUC.

3) Add the concept of PL API versions. This would allow users to specify
what range of API versions they support. I think this would have been
helpful with the plpython elog() patch.

4) Create a mechanism for specifying default TRANSFORMs for a PL, and
essentially "solve" these issues by supplying a built-in transform.

I think default transforms (4) are worth doing no matter what. Having to
manually remember to add potentially multiple TRANSFORMs is a PITA. But
I'm not sure TRANSFORMs would actually fix all issues. For example, you
can't specify a transform for an array type, so this probably wouldn't
work for one of the plpython problems.

3 is interesting, but maybe it would be bad to tie multiple unrelated
API changes together.

So I'm leaning towards 1. It means potentially adding a fair number of
new GUCs, but these would all be custom GUCs, so maybe it's not that
bad. The other downside to GUCs is I think it'd be nice to be able to
set this at a schema level, which you can't currently do with GUCs.

Thoughts?
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Tom Lane

tgl@sss.pgh.pa.us

about 10 years ago

In reply to: Jim Nasby (#1)

Re: Handling changes to default type transformations in PLs

I think harsh experience has taught us to distrust GUCs that change
code semantics. So I'm not very attracted by option #1, much less
option #2. I'm not sure about option #4 --- it smells like it would
have the same problems as a GUC, namely that it would be
action-at-a-distance on the semantics of a PL function's arguments,
with insufficient ability to control the scope of the effects.

So that leaves #3, which doesn't seem all that unreasonable from here.
We don't have a problem with bundling a bunch of unrelated changes
into any one major PG revision. The scripting languages we're talking
about calling do similar things. So why not for the semantics of the
glue layer?

It seems like you really need to be able to specify this at the
per-function level, which makes me think that specifying
"LANGUAGE plpython_2" or "LANGUAGE plperl_3" might be the right
kind of API.
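
A rough sketch of how per-function versioning through the language name could behave, written as plain Python. Everything here is an assumption made for illustration: the name "plpython_1", the converter functions, and call_function are hypothetical, and none of this is an existing PostgreSQL interface:

```python
# Hypothetical sketch: these names do not exist in PostgreSQL.

def convert_v1(composite):
    """Older glue semantics: array fields inside a composite stay as text."""
    return dict(composite)

def convert_v2(composite):
    """Newer glue semantics: array-looking text fields become lists of strings."""
    out = {}
    for name, value in composite.items():
        if isinstance(value, str) and value.startswith("{") and value.endswith("}"):
            inner = value[1:-1]
            out[name] = inner.split(",") if inner else []
        else:
            out[name] = value
    return out

# The language name a function was created with selects the glue-layer
# semantics, so the choice is fixed per function rather than per session.
CONVERTERS = {"plpython_1": convert_v1, "plpython_2": convert_v2}

def call_function(language, body, composite):
    """Convert the argument per the function's language, then run its body."""
    convert = CONVERTERS[language]
    return body(convert(composite))

raw = {"id": 1, "tags": "{a,b}"}
old_result = call_function("plpython_1", lambda row: row["tags"], raw)
new_result = call_function("plpython_2", lambda row: row["tags"], raw)
```

The point of the sketch is only that two functions can coexist in one database and see different argument conversions without any session-level setting.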

regards, tom lane


Pavel Stehule

pavel.stehule@gmail.com

about 10 years ago

In reply to: Tom Lane (#2)

Re: Handling changes to default type transformations in PLs

>> 3) Add the concept of PL API versions. This would allow users to specify

> So that leaves #3, which doesn't seem all that unreasonable from here.
> We don't have a problem with bundling a bunch of unrelated changes
> into any one major PG revision. The scripting languages we're talking
> about calling do similar things. So why not for the semantics of the
> glue layer?
>
> It seems like you really need to be able to specify this at the
> per-function level, which makes me think that specifying
> "LANGUAGE plpython_2" or "LANGUAGE plperl_3" might be the right
> kind of API.

I am not a big fan of this proposal. Users usually want to choose only
some preferred features, and this design may offer too little granularity.

Objections:

* The keyword usually used for this is REVISION, so the syntax could be
LANGUAGE plpython REVISION 3. It is more readable. You need to specify the
preferred revision for any language. The revision is persistent. The
behavior is the same as in Tom's proposal, but I hope this is more readable
and understandable.

regards

Pavel


Pavel Stehule

pavel.stehule@gmail.com

about 10 years ago

In reply to: Jim Nasby (#1)

Re: Handling changes to default type transformations in PLs

> 4) Create a mechanism for specifying default TRANSFORMs for a PL, and
> essentially "solve" these issues by supplying a built-in transform.
>
> I think default transforms (4) are worth doing no matter what. Having to
> manually remember to add potentially multiple TRANSFORMs is a PITA. But I'm
> not sure TRANSFORMS would actually fix all issues. For example, you can't
> specify a transform for an array type, so this probably wouldn't work for
> one of the plpython problems.

Yesterday I wrote some documentation about TRANSFORMs, and this is not
possible. The transformation cannot be transparent to a PL function, because
you have to know what your parameters are inside the function, and whether
these parameters are original or transformed. This is not an issue for
"smart" object types, but in general it cannot be ensured for basic types. So
a default transformation is not a good idea. You can have a transform for the
C language, and it is really different from one for Python.

Maybe the concept of TRANSFORMs is too general, and some specific PL
extensions could be better. That needs some concept of persistent options
that can be used in these extensions.

CREATE OR REPLACE FUNCTION .... LANGUAGE plpython WITH OPTIONS
('transform_dictionary', ...)

Execution should stop if any option is not processed.
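
A small sketch of that "stop on any unprocessed option" rule, in plain Python. The option name transform_dictionary comes from the example syntax above; the KNOWN_OPTIONS set, the exception class, and apply_options are hypothetical names invented for the illustration:

```python
# Hypothetical sketch of strict option handling for a PL handler.

class UnprocessedOptionError(Exception):
    """Raised when a function declares an option the PL does not understand."""

# Options this imaginary PL version knows how to process.
KNOWN_OPTIONS = {"transform_dictionary"}

def apply_options(options):
    """Return the set of enabled options, refusing any unknown option.

    Failing loudly means a function written for a newer PL cannot run
    with silently different semantics on an older one.
    """
    unknown = sorted(set(options) - KNOWN_OPTIONS)
    if unknown:
        raise UnprocessedOptionError(
            "unprocessed option(s): " + ", ".join(unknown)
        )
    return set(options)

enabled = apply_options(["transform_dictionary"])

try:
    apply_options(["transform_dictionary", "transform_arrays"])
    stopped = False
except UnprocessedOptionError:
    stopped = True
```

In this sketch the second call is rejected as a whole, so no partially applied set of options ever reaches the function body.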

With these extensions you can do anything, and it can work (I hope). But
the complexity of our PLs will be significantly higher. And then Tom's
proposal may be better. It can help with faster adoption of new features,
and it is a relatively simple solution.

Regards

Pavel