Identifying no-op length coercions
I'd like to revive the discussion that arose during the last CommitFest over
the proper design for identifying no-op length coercions like varchar(4) ->
varchar(8). Here is the root of the original debate:
http://archives.postgresql.org/message-id/20110109220353.GD5777@tornado.leadboat.com
There were two proposals on the table:
1. Attach a "f(from_typmod, to_typmod, is_explicit) RETURNS boolean" function
to the pg_cast; call it in find_coercion_pathway()
2. Attach a "f(FuncExpr) RETURNS Expr" (actually internal/internal) function
to the pg_proc; call it in simplify_function()
I tried and failed to write a summary of the respective arguments that could
legitimately substitute for (re-)reading the original thread, so I haven't
included one. I myself find the advantages of #2 mildly more compelling.
As previously noted, if we enrich the typmod system, only #1 will require
interface changes. One of the suggestions that came up before was designing
the typmod enrichment before designing this. Unless someone has short-term
plans to prepare that design, I'd like to avoid serializing on it -- it may
never happen. One thought: if #1 proves preferable independent of this
concern, should we consider implementing it without documenting it or exposing
it at the SQL level? External type developers could still directly update
pg_cast, so long as they'll be happy to keep both pieces if we someday change
the required function signature.
Can we reach a design that nobody dislikes excessively?
Thanks,
nm
On Mon, May 23, 2011 at 12:42 PM, Noah Misch <noah@leadboat.com> wrote:
I'd like to revive the discussion that arose during the last CommitFest over
the proper design for identifying no-op length coercions like varchar(4) ->
varchar(8). Here is the root of the original debate:http://archives.postgresql.org/message-id/20110109220353.GD5777@tornado.leadboat.com
There were two proposals on the table:
1. Attach a "f(from_typmod, to_typmod, is_explicit) RETURNS boolean" function
to the pg_cast; call it in find_coercion_pathway()
2. Attach a "f(FuncExpr) RETURNS Expr" (actually internal/internal) function
to the pg_proc; call it in simplify_function()I tried and failed to write a summary of the respective arguments that could
legitimately substitute for (re-)reading the original thread, so I haven't
included one. I myself find the advantages of #2 mildly more compelling.
The main reason I preferred #1 is that it would only get invoked in
the case of casts, whereas #2 would get invoked for all function
calls. For us to pay that overhead, there has to be some use case,
and I didn't find the examples that were offered very compelling. How
many CPU cycles are we willing to spend trying to simplify x + 0 to
just x, or x * 0 to just 0? I might be missing something here, but
those strikes me as textbook examples of cases where it's not worth
slowing down the whole system to fix a query that the user could have
easily written in some less-pathological way to begin with.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
On Mon, May 23, 2011 at 12:42 PM, Noah Misch <noah@leadboat.com> wrote:
There were two proposals on the table:
1. Attach a "f(from_typmod, to_typmod, is_explicit) RETURNS boolean" function
� to the pg_cast; call it in find_coercion_pathway()
2. Attach a "f(FuncExpr) RETURNS Expr" (actually internal/internal) function
� to the pg_proc; call it in simplify_function()I tried and failed to write a summary of the respective arguments that could
legitimately substitute for (re-)reading the original thread, so I haven't
included one. �I myself find the advantages of #2 mildly more compelling.
The main reason I preferred #1 is that it would only get invoked in
the case of casts, whereas #2 would get invoked for all function
calls. For us to pay that overhead, there has to be some use case,
and I didn't find the examples that were offered very compelling.
Well, as I pointed out in
http://archives.postgresql.org/pgsql-hackers/2011-01/msg02570.php
a hook function attached to pg_proc entries would cost nothing
measurable when not used. You could possibly make the same claim
for attaching the hook to pg_cast entries, if you cause the optimization
to occur during initial cast lookup rather than expression
simplification. But I remain of the opinion that that's the wrong place
to put it.
How
many CPU cycles are we willing to spend trying to simplify x + 0 to
just x, or x * 0 to just 0?
I'm not sure that's worthwhile either, but it was an example of a
possible future use-case rather than something that anybody was
proposing doing right now. Even though I tend to agree that it wouldn't
be worth looking for such cases with simple numeric datatypes, it's not
that hard to believe that someone might want the ability for complicated
datatypes with expensive operations.
In the short term, the only actual cost we'd incur is that we'd be
bloating pg_proc instead of pg_cast with an extra column, and there's
more rows in pg_proc. But pg_proc rows are already pretty darn wide.
regards, tom lane
On Mon, May 23, 2011 at 1:21 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
On Mon, May 23, 2011 at 12:42 PM, Noah Misch <noah@leadboat.com> wrote:
There were two proposals on the table:
1. Attach a "f(from_typmod, to_typmod, is_explicit) RETURNS boolean" function
to the pg_cast; call it in find_coercion_pathway()
2. Attach a "f(FuncExpr) RETURNS Expr" (actually internal/internal) function
to the pg_proc; call it in simplify_function()I tried and failed to write a summary of the respective arguments that could
legitimately substitute for (re-)reading the original thread, so I haven't
included one. I myself find the advantages of #2 mildly more compelling.The main reason I preferred #1 is that it would only get invoked in
the case of casts, whereas #2 would get invoked for all function
calls. For us to pay that overhead, there has to be some use case,
and I didn't find the examples that were offered very compelling.Well, as I pointed out in
http://archives.postgresql.org/pgsql-hackers/2011-01/msg02570.php
a hook function attached to pg_proc entries would cost nothing
measurable when not used. You could possibly make the same claim
for attaching the hook to pg_cast entries, if you cause the optimization
to occur during initial cast lookup rather than expression
simplification. But I remain of the opinion that that's the wrong place
to put it.
So you said here:
http://archives.postgresql.org/pgsql-hackers/2011-01/msg02575.php
http://archives.postgresql.org/pgsql-hackers/2011-01/msg02585.php
The trouble is, I still can't see why type OIDs and typemods should be
handled differently. Taking your example again:
CREATE TABLE base (f1 varchar(4));
CREATE VIEW vv AS SELECT f1::varchar(8) FROM base;
ALTER TABLE base ALTER COLUMN f1 TYPE varchar(16);
Your claim on the thread is that we want to someday allow this case.
But what if the last statement were instead:
ALTER TABLE base ALTER COLUMN f1 TYPE integer;
Should it also be our goal to handle that case? If not, why are they different?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
On Mon, May 23, 2011 at 1:21 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
... But I remain of the opinion that that's the wrong place
to put it.
So you said here:
http://archives.postgresql.org/pgsql-hackers/2011-01/msg02575.php
http://archives.postgresql.org/pgsql-hackers/2011-01/msg02585.php
The trouble is, I still can't see why type OIDs and typemods should be
handled differently. Taking your example again:
CREATE TABLE base (f1 varchar(4));
CREATE VIEW vv AS SELECT f1::varchar(8) FROM base;
ALTER TABLE base ALTER COLUMN f1 TYPE varchar(16);
Your claim on the thread is that we want to someday allow this case.
But what if the last statement were instead:
ALTER TABLE base ALTER COLUMN f1 TYPE integer;
Should it also be our goal to handle that case?
Maybe. But casts would be the least of our concerns if we were trying
to change the column type. Changing typmod doesn't affect the set of
operations that could be applied to a column, whereas changing type
surely does. In any case, the fact that the current implementation can't
readily support that is a poor excuse for building entirely new features
that also can't support it, when said features could easily be designed
without the restriction.
But more generally, I don't believe that you've made any positive case
whatever for doing it from pg_cast. It's not faster, it's not more
flexible, so why should we choose that approach?
regards, tom lane
On Mon, May 23, 2011 at 1:53 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Maybe. But casts would be the least of our concerns if we were trying
to change the column type. Changing typmod doesn't affect the set of
operations that could be applied to a column, whereas changing type
surely does.
OK, this is the crucial point I was missing. Sorry for being a bit
fuzzy-headed about this.
My mental model of our type system, or of what a type system ought to
do, just doesn't match the type system we've got.
So let's do it the way you proposed.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Mon, May 23, 2011 at 02:11:39PM -0400, Robert Haas wrote:
On Mon, May 23, 2011 at 1:53 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Maybe. ?But casts would be the least of our concerns if we were trying
to change the column type. ?Changing typmod doesn't affect the set of
operations that could be applied to a column, whereas changing type
surely does.OK, this is the crucial point I was missing. Sorry for being a bit
fuzzy-headed about this.My mental model of our type system, or of what a type system ought to
do, just doesn't match the type system we've got.So let's do it the way you proposed.
Good deal. Given that conclusion, the other policy decision I anticipate
affecting this particular patch is the choice of syntax. Presumably, it will be
a new common_func_opt_item. When I last looked at the keywords list and tried
to come up with something, these were the best I could do:
CREATE FUNCTION ... PARSER MAPPING helperfunc(args)
CREATE FUNCTION ... PLANS CONVERSION helperfunc(args)
Both feel forced, to put it generously. Any better ideas? Worth adding a
keyword to get something decent?
Thanks,
nm
On Mon, May 23, 2011 at 2:46 PM, Noah Misch <noah@leadboat.com> wrote:
On Mon, May 23, 2011 at 02:11:39PM -0400, Robert Haas wrote:
On Mon, May 23, 2011 at 1:53 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Maybe. ?But casts would be the least of our concerns if we were trying
to change the column type. ?Changing typmod doesn't affect the set of
operations that could be applied to a column, whereas changing type
surely does.OK, this is the crucial point I was missing. Sorry for being a bit
fuzzy-headed about this.My mental model of our type system, or of what a type system ought to
do, just doesn't match the type system we've got.So let's do it the way you proposed.
Good deal. Given that conclusion, the other policy decision I anticipate
affecting this particular patch is the choice of syntax. Presumably, it will be
a new common_func_opt_item. When I last looked at the keywords list and tried
to come up with something, these were the best I could do:CREATE FUNCTION ... PARSER MAPPING helperfunc(args)
CREATE FUNCTION ... PLANS CONVERSION helperfunc(args)Both feel forced, to put it generously. Any better ideas? Worth adding a
keyword to get something decent?
Do you have something specific in mind?
Just to throw out another few possibilities, how about INLINE FUNCTION
or ANALYZE FUNCTION?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Noah Misch <noah@leadboat.com> writes:
Good deal. Given that conclusion, the other policy decision I anticipate
affecting this particular patch is the choice of syntax. Presumably, it will be
a new common_func_opt_item. When I last looked at the keywords list and tried
to come up with something, these were the best I could do:
CREATE FUNCTION ... PARSER MAPPING helperfunc(args)
CREATE FUNCTION ... PLANS CONVERSION helperfunc(args)
We could go with your previous idea of not bothering to expose this in
the SQL syntax. Given that the helper function is going to have a
signature along the lines of "(internal, internal) -> internal", it's
going to be difficult for anyone to use it for non-builtin functions
anyhow.
But if you really don't like that, what about
TRANSFORM helperfunctionname
Although TRANSFORM isn't currently a keyword for us, it is a
non-reserved keyword in SQL:2008, and it seems possible that we might
someday think about implementing the associated features.
regards, tom lane
On Mon, May 23, 2011 at 12:01 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Given that the helper function is going to have a
signature along the lines of "(internal, internal) -> internal", it's
going to be difficult for anyone to use it for non-builtin functions
anyhow.
I hate to go around in circles on this but I didn't see the original discussion.
This was the thing that concerned me. If anyone wants to add this
feature for a new data type they're going to have to understand and
tie their code to all this internal parser node stuff. That means
their code will be much more closely tied to a specific version, will
have to be written in C, and will require much more in-depth
understanding of Postgres internal data structures.
By comparison the boolean cast predicate could be written in any
language and only required the data type implementor to understand
their data type. It seems much more likely to actually get used and be
used correctly.
--
greg
Greg Stark <gsstark@mit.edu> writes:
This was the thing that concerned me. If anyone wants to add this
feature for a new data type they're going to have to understand and
tie their code to all this internal parser node stuff. That means
their code will be much more closely tied to a specific version, will
have to be written in C, and will require much more in-depth
understanding of Postgres internal data structures.
By comparison the boolean cast predicate could be written in any
language and only required the data type implementor to understand
their data type. It seems much more likely to actually get used and be
used correctly.
I don't think I believe that interesting length-conversion functions are
going to be written in anything but C anyway, so I don't find that that
argument holds much water. Generally, the low-level functions for a new
datatype have to be in C.
As for the other point, if all you want to do is examine the
expression's typmod, the API for that is pretty stable (exprTypmod()).
regards, tom lane
On Mon, May 23, 2011 at 02:53:01PM -0400, Robert Haas wrote:
On Mon, May 23, 2011 at 2:46 PM, Noah Misch <noah@leadboat.com> wrote:
On Mon, May 23, 2011 at 02:11:39PM -0400, Robert Haas wrote:
On Mon, May 23, 2011 at 1:53 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Maybe. ?But casts would be the least of our concerns if we were trying
to change the column type. ?Changing typmod doesn't affect the set of
operations that could be applied to a column, whereas changing type
surely does.OK, this is the crucial point I was missing. ?Sorry for being a bit
fuzzy-headed about this.My mental model of our type system, or of what a type system ought to
do, just doesn't match the type system we've got.So let's do it the way you proposed.
Good deal. ?Given that conclusion, the other policy decision I anticipate
affecting this particular patch is the choice of syntax. ?Presumably, it will be
a new common_func_opt_item. ?When I last looked at the keywords list and tried
to come up with something, these were the best I could do:?CREATE FUNCTION ... PARSER MAPPING helperfunc(args)
?CREATE FUNCTION ... PLANS CONVERSION helperfunc(args)Both feel forced, to put it generously. ?Any better ideas? ?Worth adding a
keyword to get something decent?Do you have something specific in mind?
I had not. Having thought about it some, maybe "PLAN TRANSFORM FUNCTION". Do
we have a name for the pass over the tree rooted at eval_const_expressions()?
Just to throw out another few possibilities, how about INLINE FUNCTION
or ANALYZE FUNCTION?
INLINE FUNCTION evokes, for me, having a different version of a function that
gets substituted when it's inlined. ANALYZE FUNCTION seems reasonable, though.
I don't think any name we'd pick will make a significant number of readers
understand it without reading the full documentation.
nm
On Mon, May 23, 2011 at 03:01:40PM -0400, Tom Lane wrote:
Noah Misch <noah@leadboat.com> writes:
Good deal. Given that conclusion, the other policy decision I anticipate
affecting this particular patch is the choice of syntax. Presumably, it will be
a new common_func_opt_item. When I last looked at the keywords list and tried
to come up with something, these were the best I could do:CREATE FUNCTION ... PARSER MAPPING helperfunc(args)
CREATE FUNCTION ... PLANS CONVERSION helperfunc(args)We could go with your previous idea of not bothering to expose this in
the SQL syntax. Given that the helper function is going to have a
signature along the lines of "(internal, internal) -> internal", it's
going to be difficult for anyone to use it for non-builtin functions
anyhow.But if you really don't like that, what about
That would be just fine with me. We can always expose it later.
TRANSFORM helperfunctionname
Although TRANSFORM isn't currently a keyword for us, it is a
non-reserved keyword in SQL:2008, and it seems possible that we might
someday think about implementing the associated features.
I was thinking of that word too, along the lines of "PLAN TRANSFORM FUNCTION
helperfunctionname". Then again, that wrongly sounds somewhat like it's
transforming planner nodes. Perhaps plain TRANSFORM or TRANSFORM FUNCTION would
be the way to go.
nm
On May 24, 2011, at 12:15 AM, Noah Misch wrote:
On Mon, May 23, 2011 at 03:01:40PM -0400, Tom Lane wrote:
Noah Misch <noah@leadboat.com> writes:
Good deal. Given that conclusion, the other policy decision I anticipate
affecting this particular patch is the choice of syntax. Presumably, it will be
a new common_func_opt_item. When I last looked at the keywords list and tried
to come up with something, these were the best I could do:CREATE FUNCTION ... PARSER MAPPING helperfunc(args)
CREATE FUNCTION ... PLANS CONVERSION helperfunc(args)We could go with your previous idea of not bothering to expose this in
the SQL syntax. Given that the helper function is going to have a
signature along the lines of "(internal, internal) -> internal", it's
going to be difficult for anyone to use it for non-builtin functions
anyhow.But if you really don't like that, what about
That would be just fine with me. We can always expose it later.
TRANSFORM helperfunctionname
Although TRANSFORM isn't currently a keyword for us, it is a
non-reserved keyword in SQL:2008, and it seems possible that we might
someday think about implementing the associated features.I was thinking of that word too, along the lines of "PLAN TRANSFORM FUNCTION
helperfunctionname". Then again, that wrongly sounds somewhat like it's
transforming planner nodes. Perhaps plain TRANSFORM or TRANSFORM FUNCTION would
be the way to go.
Looks like this thread has silently died out. Is there an agreement on the
syntax and implementation part? We (CMD) have a customer, who is interested in
pushing this through, so, if we have a patch, I'd be happy to assist in
reviewing it.
--
Alexey Klyukin
The PostgreSQL Company - Command Prompt, Inc.
Hi Alexey,
On Thu, Jun 02, 2011 at 05:08:51PM +0300, Alexey Klyukin wrote:
Looks like this thread has silently died out. Is there an agreement on the
syntax and implementation part? We (CMD) have a customer, who is interested in
pushing this through, so, if we have a patch, I'd be happy to assist in
reviewing it.
I think we have a consensus on the implementation. We didn't totally lock down
the syntax. Tom and I seem happy to have no SQL exposure at all, so that's what
I'm planning to submit. However, we were pretty close to a syntax consensus in
the event that it becomes desirable to do otherwise.
Is your interest in cheap varchar(N)->varchar(N+M) conversions specifically, or
in some broader application of this facility?
Thanks for volunteering to review; that will be a big help. Actually, I could
especially use some feedback now on a related design and implementation:
http://archives.postgresql.org/message-id/20110524104029.GB18831@tornado.gateway.2wire.net
Note that the third and fifth sentences of that description are incorrect. The
rest stands without them. Even just some feedback on the mundane issue noted in
the last paragraph would help.
Thanks,
nm
Hi,
On Jun 2, 2011, at 10:22 PM, Noah Misch wrote:
Hi Alexey,
...
Is your interest in cheap varchar(N)->varchar(N+M) conversions specifically, or
in some broader application of this facility?
Exactly varchar conversions.
Thanks for volunteering to review; that will be a big help. Actually, I could
especially use some feedback now on a related design and implementation:
http://archives.postgresql.org/message-id/20110524104029.GB18831@tornado.gateway.2wire.net
Note that the third and fifth sentences of that description are incorrect. The
rest stands without them. Even just some feedback on the mundane issue noted in
the last paragraph would help.
Ok, I'll review it.
Thank you,
Alexey.
--
Command Prompt, Inc. http://www.CommandPrompt.com
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Jun 3, 2011, at 10:11 AM, Alexey Klyukin wrote:
Is your interest in cheap varchar(N)->varchar(N+M) conversions specifically, or
in some broader application of this facility?Exactly varchar conversions.
Why limit it to varchar? Shouldn't we be able to do this for any varlena? The only challenge I see is numeric; we'd need to ensure that both size and precision are not decreasing.
--
Jim C. Nasby, Database Architect jim@nasby.net
512.569.9461 (cell) http://jim.nasby.net
Jim Nasby <jim@nasby.net> wrote:
The only challenge I see is numeric; we'd need to ensure that both
size and precision are not decreasing.
To be picky, wouldn't that need to be "neither (precision - scale)
nor scale is decreasing"?
-Kevin
On Fri, Jun 3, 2011 at 11:43 AM, Jim Nasby <jim@nasby.net> wrote:
On Jun 3, 2011, at 10:11 AM, Alexey Klyukin wrote:
Is your interest in cheap varchar(N)->varchar(N+M) conversions specifically, or
in some broader application of this facility?Exactly varchar conversions.
Why limit it to varchar? Shouldn't we be able to do this for any varlena? The only challenge I see is numeric; we'd need to ensure that both size and precision are not decreasing.
More than that: you should also be able to make it work for things
like xml -> text.
Indeed, I believe Noah has plans to do just that.
Which, quite frankly, will be awesome.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Fri, Jun 03, 2011 at 10:43:17AM -0500, Jim Nasby wrote:
On Jun 3, 2011, at 10:11 AM, Alexey Klyukin wrote:
Is your interest in cheap varchar(N)->varchar(N+M) conversions specifically, or
in some broader application of this facility?Exactly varchar conversions.
Why limit it to varchar? Shouldn't we be able to do this for any varlena? The only challenge I see is numeric; we'd need to ensure that both size and precision are not decreasing.
I've implemented support for varchar, varbit, numeric, time, timetz, timestamp,
timestamptz, and interval. However, I'll probably submit only varchar in the
initial infrastructure patch and the rest in followup patches in a later CF.
For numeric, we store the display scale in every datum, so any change to it
rewrites the table. You'll be able to cheaply change numeric(7,2) to
numeric(9,2) but not to numeric(9,3).