Identifying no-op length coercions

Started by Noah Mischover 14 years ago34 messages
#1Noah Misch
noah@leadboat.com

I'd like to revive the discussion that arose during the last CommitFest over
the proper design for identifying no-op length coercions like varchar(4) ->
varchar(8). Here is the root of the original debate:

http://archives.postgresql.org/message-id/20110109220353.GD5777@tornado.leadboat.com

There were two proposals on the table:

1. Attach a "f(from_typmod, to_typmod, is_explicit) RETURNS boolean" function
to the pg_cast; call it in find_coercion_pathway()
2. Attach a "f(FuncExpr) RETURNS Expr" (actually internal/internal) function
to the pg_proc; call it in simplify_function()

I tried and failed to write a summary of the respective arguments that could
legitimately substitute for (re-)reading the original thread, so I haven't
included one. I myself find the advantages of #2 mildly more compelling.

As previously noted, if we enrich the typmod system, only #1 will require
interface changes. One of the suggestions that came up before was designing
the typmod enrichment before designing this. Unless someone has short-term
plans to prepare that design, I'd like to avoid serializing on it -- it may
never happen. One thought: if #1 proves preferable independent of this
concern, should we consider implementing it without documenting it or exposing
it at the SQL level? External type developers could still directly update
pg_cast, so long as they'll be happy to keep both pieces if we someday change
the required function signature.

Can we reach a design that nobody dislikes excessively?

Thanks,
nm

#2Robert Haas
robertmhaas@gmail.com
In reply to: Noah Misch (#1)
Re: Identifying no-op length coercions

On Mon, May 23, 2011 at 12:42 PM, Noah Misch <noah@leadboat.com> wrote:

I'd like to revive the discussion that arose during the last CommitFest over
the proper design for identifying no-op length coercions like varchar(4) ->
varchar(8).  Here is the root of the original debate:

 http://archives.postgresql.org/message-id/20110109220353.GD5777@tornado.leadboat.com

There were two proposals on the table:

1. Attach a "f(from_typmod, to_typmod, is_explicit) RETURNS boolean" function
  to the pg_cast; call it in find_coercion_pathway()
2. Attach a "f(FuncExpr) RETURNS Expr" (actually internal/internal) function
  to the pg_proc; call it in simplify_function()

I tried and failed to write a summary of the respective arguments that could
legitimately substitute for (re-)reading the original thread, so I haven't
included one.  I myself find the advantages of #2 mildly more compelling.

The main reason I preferred #1 is that it would only get invoked in
the case of casts, whereas #2 would get invoked for all function
calls. For us to pay that overhead, there has to be some use case,
and I didn't find the examples that were offered very compelling. How
many CPU cycles are we willing to spend trying to simplify x + 0 to
just x, or x * 0 to just 0? I might be missing something here, but
those strikes me as textbook examples of cases where it's not worth
slowing down the whole system to fix a query that the user could have
easily written in some less-pathological way to begin with.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#2)
Re: Identifying no-op length coercions

Robert Haas <robertmhaas@gmail.com> writes:

On Mon, May 23, 2011 at 12:42 PM, Noah Misch <noah@leadboat.com> wrote:

There were two proposals on the table:

1. Attach a "f(from_typmod, to_typmod, is_explicit) RETURNS boolean" function
� to the pg_cast; call it in find_coercion_pathway()
2. Attach a "f(FuncExpr) RETURNS Expr" (actually internal/internal) function
� to the pg_proc; call it in simplify_function()

I tried and failed to write a summary of the respective arguments that could
legitimately substitute for (re-)reading the original thread, so I haven't
included one. �I myself find the advantages of #2 mildly more compelling.

The main reason I preferred #1 is that it would only get invoked in
the case of casts, whereas #2 would get invoked for all function
calls. For us to pay that overhead, there has to be some use case,
and I didn't find the examples that were offered very compelling.

Well, as I pointed out in
http://archives.postgresql.org/pgsql-hackers/2011-01/msg02570.php
a hook function attached to pg_proc entries would cost nothing
measurable when not used. You could possibly make the same claim
for attaching the hook to pg_cast entries, if you cause the optimization
to occur during initial cast lookup rather than expression
simplification. But I remain of the opinion that that's the wrong place
to put it.

How
many CPU cycles are we willing to spend trying to simplify x + 0 to
just x, or x * 0 to just 0?

I'm not sure that's worthwhile either, but it was an example of a
possible future use-case rather than something that anybody was
proposing doing right now. Even though I tend to agree that it wouldn't
be worth looking for such cases with simple numeric datatypes, it's not
that hard to believe that someone might want the ability for complicated
datatypes with expensive operations.

In the short term, the only actual cost we'd incur is that we'd be
bloating pg_proc instead of pg_cast with an extra column, and there's
more rows in pg_proc. But pg_proc rows are already pretty darn wide.

regards, tom lane

#4Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#3)
Re: Identifying no-op length coercions

On Mon, May 23, 2011 at 1:21 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Mon, May 23, 2011 at 12:42 PM, Noah Misch <noah@leadboat.com> wrote:

There were two proposals on the table:

1. Attach a "f(from_typmod, to_typmod, is_explicit) RETURNS boolean" function
  to the pg_cast; call it in find_coercion_pathway()
2. Attach a "f(FuncExpr) RETURNS Expr" (actually internal/internal) function
  to the pg_proc; call it in simplify_function()

I tried and failed to write a summary of the respective arguments that could
legitimately substitute for (re-)reading the original thread, so I haven't
included one.  I myself find the advantages of #2 mildly more compelling.

The main reason I preferred #1 is that it would only get invoked in
the case of casts, whereas #2 would get invoked for all function
calls.  For us to pay that overhead, there has to be some use case,
and I didn't find the examples that were offered very compelling.

Well, as I pointed out in
http://archives.postgresql.org/pgsql-hackers/2011-01/msg02570.php
a hook function attached to pg_proc entries would cost nothing
measurable when not used.  You could possibly make the same claim
for attaching the hook to pg_cast entries, if you cause the optimization
to occur during initial cast lookup rather than expression
simplification.  But I remain of the opinion that that's the wrong place
to put it.

So you said here:

http://archives.postgresql.org/pgsql-hackers/2011-01/msg02575.php
http://archives.postgresql.org/pgsql-hackers/2011-01/msg02585.php

The trouble is, I still can't see why type OIDs and typemods should be
handled differently. Taking your example again:

CREATE TABLE base (f1 varchar(4));
CREATE VIEW vv AS SELECT f1::varchar(8) FROM base;
ALTER TABLE base ALTER COLUMN f1 TYPE varchar(16);

Your claim on the thread is that we want to someday allow this case.
But what if the last statement were instead:

ALTER TABLE base ALTER COLUMN f1 TYPE integer;

Should it also be our goal to handle that case? If not, why are they different?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#4)
Re: Identifying no-op length coercions

Robert Haas <robertmhaas@gmail.com> writes:

On Mon, May 23, 2011 at 1:21 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

... But I remain of the opinion that that's the wrong place
to put it.

So you said here:

http://archives.postgresql.org/pgsql-hackers/2011-01/msg02575.php
http://archives.postgresql.org/pgsql-hackers/2011-01/msg02585.php

The trouble is, I still can't see why type OIDs and typemods should be
handled differently. Taking your example again:

CREATE TABLE base (f1 varchar(4));
CREATE VIEW vv AS SELECT f1::varchar(8) FROM base;
ALTER TABLE base ALTER COLUMN f1 TYPE varchar(16);

Your claim on the thread is that we want to someday allow this case.
But what if the last statement were instead:

ALTER TABLE base ALTER COLUMN f1 TYPE integer;

Should it also be our goal to handle that case?

Maybe. But casts would be the least of our concerns if we were trying
to change the column type. Changing typmod doesn't affect the set of
operations that could be applied to a column, whereas changing type
surely does. In any case, the fact that the current implementation can't
readily support that is a poor excuse for building entirely new features
that also can't support it, when said features could easily be designed
without the restriction.

But more generally, I don't believe that you've made any positive case
whatever for doing it from pg_cast. It's not faster, it's not more
flexible, so why should we choose that approach?

regards, tom lane

#6Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#5)
Re: Identifying no-op length coercions

On Mon, May 23, 2011 at 1:53 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Maybe.  But casts would be the least of our concerns if we were trying
to change the column type.  Changing typmod doesn't affect the set of
operations that could be applied to a column, whereas changing type
surely does.

OK, this is the crucial point I was missing. Sorry for being a bit
fuzzy-headed about this.

My mental model of our type system, or of what a type system ought to
do, just doesn't match the type system we've got.

So let's do it the way you proposed.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#7Noah Misch
noah@leadboat.com
In reply to: Robert Haas (#6)
Re: Identifying no-op length coercions

On Mon, May 23, 2011 at 02:11:39PM -0400, Robert Haas wrote:

On Mon, May 23, 2011 at 1:53 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Maybe. ?But casts would be the least of our concerns if we were trying
to change the column type. ?Changing typmod doesn't affect the set of
operations that could be applied to a column, whereas changing type
surely does.

OK, this is the crucial point I was missing. Sorry for being a bit
fuzzy-headed about this.

My mental model of our type system, or of what a type system ought to
do, just doesn't match the type system we've got.

So let's do it the way you proposed.

Good deal. Given that conclusion, the other policy decision I anticipate
affecting this particular patch is the choice of syntax. Presumably, it will be
a new common_func_opt_item. When I last looked at the keywords list and tried
to come up with something, these were the best I could do:

CREATE FUNCTION ... PARSER MAPPING helperfunc(args)
CREATE FUNCTION ... PLANS CONVERSION helperfunc(args)

Both feel forced, to put it generously. Any better ideas? Worth adding a
keyword to get something decent?

Thanks,
nm

#8Robert Haas
robertmhaas@gmail.com
In reply to: Noah Misch (#7)
Re: Identifying no-op length coercions

On Mon, May 23, 2011 at 2:46 PM, Noah Misch <noah@leadboat.com> wrote:

On Mon, May 23, 2011 at 02:11:39PM -0400, Robert Haas wrote:

On Mon, May 23, 2011 at 1:53 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Maybe. ?But casts would be the least of our concerns if we were trying
to change the column type. ?Changing typmod doesn't affect the set of
operations that could be applied to a column, whereas changing type
surely does.

OK, this is the crucial point I was missing.  Sorry for being a bit
fuzzy-headed about this.

My mental model of our type system, or of what a type system ought to
do, just doesn't match the type system we've got.

So let's do it the way you proposed.

Good deal.  Given that conclusion, the other policy decision I anticipate
affecting this particular patch is the choice of syntax.  Presumably, it will be
a new common_func_opt_item.  When I last looked at the keywords list and tried
to come up with something, these were the best I could do:

 CREATE FUNCTION ... PARSER MAPPING helperfunc(args)
 CREATE FUNCTION ... PLANS CONVERSION helperfunc(args)

Both feel forced, to put it generously.  Any better ideas?  Worth adding a
keyword to get something decent?

Do you have something specific in mind?

Just to throw out another few possibilities, how about INLINE FUNCTION
or ANALYZE FUNCTION?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Noah Misch (#7)
Re: Identifying no-op length coercions

Noah Misch <noah@leadboat.com> writes:

Good deal. Given that conclusion, the other policy decision I anticipate
affecting this particular patch is the choice of syntax. Presumably, it will be
a new common_func_opt_item. When I last looked at the keywords list and tried
to come up with something, these were the best I could do:

CREATE FUNCTION ... PARSER MAPPING helperfunc(args)
CREATE FUNCTION ... PLANS CONVERSION helperfunc(args)

We could go with your previous idea of not bothering to expose this in
the SQL syntax. Given that the helper function is going to have a
signature along the lines of "(internal, internal) -> internal", it's
going to be difficult for anyone to use it for non-builtin functions
anyhow.

But if you really don't like that, what about

TRANSFORM helperfunctionname

Although TRANSFORM isn't currently a keyword for us, it is a
non-reserved keyword in SQL:2008, and it seems possible that we might
someday think about implementing the associated features.

regards, tom lane

#10Greg Stark
gsstark@mit.edu
In reply to: Tom Lane (#9)
Re: Identifying no-op length coercions

On Mon, May 23, 2011 at 12:01 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

 Given that the helper function is going to have a
signature along the lines of "(internal, internal) -> internal", it's
going to be difficult for anyone to use it for non-builtin functions
anyhow.

I hate to go around in circles on this but I didn't see the original discussion.

This was the thing that concerned me. If anyone wants to add this
feature for a new data type they're going to have to understand and
tie their code to all this internal parser node stuff. That means
their code will be much more closely tied to a specific version, will
have to be written in C, and will require much more in-depth
understanding of Postgres internal data structures.

By comparison the boolean cast predicate could be written in any
language and only required the data type implementor to understand
their data type. It seems much more likely to actually get used and be
used correctly.

--
greg

#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Greg Stark (#10)
Re: Identifying no-op length coercions

Greg Stark <gsstark@mit.edu> writes:

This was the thing that concerned me. If anyone wants to add this
feature for a new data type they're going to have to understand and
tie their code to all this internal parser node stuff. That means
their code will be much more closely tied to a specific version, will
have to be written in C, and will require much more in-depth
understanding of Postgres internal data structures.

By comparison the boolean cast predicate could be written in any
language and only required the data type implementor to understand
their data type. It seems much more likely to actually get used and be
used correctly.

I don't think I believe that interesting length-conversion functions are
going to be written in anything but C anyway, so I don't find that that
argument holds much water. Generally, the low-level functions for a new
datatype have to be in C.

As for the other point, if all you want to do is examine the
expression's typmod, the API for that is pretty stable (exprTypmod()).

regards, tom lane

#12Noah Misch
noah@leadboat.com
In reply to: Robert Haas (#8)
Re: Identifying no-op length coercions

On Mon, May 23, 2011 at 02:53:01PM -0400, Robert Haas wrote:

On Mon, May 23, 2011 at 2:46 PM, Noah Misch <noah@leadboat.com> wrote:

On Mon, May 23, 2011 at 02:11:39PM -0400, Robert Haas wrote:

On Mon, May 23, 2011 at 1:53 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Maybe. ?But casts would be the least of our concerns if we were trying
to change the column type. ?Changing typmod doesn't affect the set of
operations that could be applied to a column, whereas changing type
surely does.

OK, this is the crucial point I was missing. ?Sorry for being a bit
fuzzy-headed about this.

My mental model of our type system, or of what a type system ought to
do, just doesn't match the type system we've got.

So let's do it the way you proposed.

Good deal. ?Given that conclusion, the other policy decision I anticipate
affecting this particular patch is the choice of syntax. ?Presumably, it will be
a new common_func_opt_item. ?When I last looked at the keywords list and tried
to come up with something, these were the best I could do:

?CREATE FUNCTION ... PARSER MAPPING helperfunc(args)
?CREATE FUNCTION ... PLANS CONVERSION helperfunc(args)

Both feel forced, to put it generously. ?Any better ideas? ?Worth adding a
keyword to get something decent?

Do you have something specific in mind?

I had not. Having thought about it some, maybe "PLAN TRANSFORM FUNCTION". Do
we have a name for the pass over the tree rooted at eval_const_expressions()?

Just to throw out another few possibilities, how about INLINE FUNCTION
or ANALYZE FUNCTION?

INLINE FUNCTION evokes, for me, having a different version of a function that
gets substituted when it's inlined. ANALYZE FUNCTION seems reasonable, though.
I don't think any name we'd pick will make a significant number of readers
understand it without reading the full documentation.

nm

#13Noah Misch
noah@leadboat.com
In reply to: Tom Lane (#9)
Re: Identifying no-op length coercions

On Mon, May 23, 2011 at 03:01:40PM -0400, Tom Lane wrote:

Noah Misch <noah@leadboat.com> writes:

Good deal. Given that conclusion, the other policy decision I anticipate
affecting this particular patch is the choice of syntax. Presumably, it will be
a new common_func_opt_item. When I last looked at the keywords list and tried
to come up with something, these were the best I could do:

CREATE FUNCTION ... PARSER MAPPING helperfunc(args)
CREATE FUNCTION ... PLANS CONVERSION helperfunc(args)

We could go with your previous idea of not bothering to expose this in
the SQL syntax. Given that the helper function is going to have a
signature along the lines of "(internal, internal) -> internal", it's
going to be difficult for anyone to use it for non-builtin functions
anyhow.

But if you really don't like that, what about

That would be just fine with me. We can always expose it later.

TRANSFORM helperfunctionname

Although TRANSFORM isn't currently a keyword for us, it is a
non-reserved keyword in SQL:2008, and it seems possible that we might
someday think about implementing the associated features.

I was thinking of that word too, along the lines of "PLAN TRANSFORM FUNCTION
helperfunctionname". Then again, that wrongly sounds somewhat like it's
transforming planner nodes. Perhaps plain TRANSFORM or TRANSFORM FUNCTION would
be the way to go.

nm

#14Alexey Klyukin
alexk@commandprompt.com
In reply to: Noah Misch (#13)
Re: Identifying no-op length coercions

On May 24, 2011, at 12:15 AM, Noah Misch wrote:

On Mon, May 23, 2011 at 03:01:40PM -0400, Tom Lane wrote:

Noah Misch <noah@leadboat.com> writes:

Good deal. Given that conclusion, the other policy decision I anticipate
affecting this particular patch is the choice of syntax. Presumably, it will be
a new common_func_opt_item. When I last looked at the keywords list and tried
to come up with something, these were the best I could do:

CREATE FUNCTION ... PARSER MAPPING helperfunc(args)
CREATE FUNCTION ... PLANS CONVERSION helperfunc(args)

We could go with your previous idea of not bothering to expose this in
the SQL syntax. Given that the helper function is going to have a
signature along the lines of "(internal, internal) -> internal", it's
going to be difficult for anyone to use it for non-builtin functions
anyhow.

But if you really don't like that, what about

That would be just fine with me. We can always expose it later.

TRANSFORM helperfunctionname

Although TRANSFORM isn't currently a keyword for us, it is a
non-reserved keyword in SQL:2008, and it seems possible that we might
someday think about implementing the associated features.

I was thinking of that word too, along the lines of "PLAN TRANSFORM FUNCTION
helperfunctionname". Then again, that wrongly sounds somewhat like it's
transforming planner nodes. Perhaps plain TRANSFORM or TRANSFORM FUNCTION would
be the way to go.

Looks like this thread has silently died out. Is there an agreement on the
syntax and implementation part? We (CMD) have a customer, who is interested in
pushing this through, so, if we have a patch, I'd be happy to assist in
reviewing it.

--
Alexey Klyukin
The PostgreSQL Company - Command Prompt, Inc.

#15Noah Misch
noah@leadboat.com
In reply to: Alexey Klyukin (#14)
Re: Identifying no-op length coercions

Hi Alexey,

On Thu, Jun 02, 2011 at 05:08:51PM +0300, Alexey Klyukin wrote:

Looks like this thread has silently died out. Is there an agreement on the
syntax and implementation part? We (CMD) have a customer, who is interested in
pushing this through, so, if we have a patch, I'd be happy to assist in
reviewing it.

I think we have a consensus on the implementation. We didn't totally lock down
the syntax. Tom and I seem happy to have no SQL exposure at all, so that's what
I'm planning to submit. However, we were pretty close to a syntax consensus in
the event that it becomes desirable to do otherwise.

Is your interest in cheap varchar(N)->varchar(N+M) conversions specifically, or
in some broader application of this facility?

Thanks for volunteering to review; that will be a big help. Actually, I could
especially use some feedback now on a related design and implementation:
http://archives.postgresql.org/message-id/20110524104029.GB18831@tornado.gateway.2wire.net
Note that the third and fifth sentences of that description are incorrect. The
rest stands without them. Even just some feedback on the mundane issue noted in
the last paragraph would help.

Thanks,
nm

#16Alexey Klyukin
alexk@commandprompt.com
In reply to: Noah Misch (#15)
Re: Identifying no-op length coercions

Hi,

On Jun 2, 2011, at 10:22 PM, Noah Misch wrote:

Hi Alexey,

...

Is your interest in cheap varchar(N)->varchar(N+M) conversions specifically, or
in some broader application of this facility?

Exactly varchar conversions.

Thanks for volunteering to review; that will be a big help. Actually, I could
especially use some feedback now on a related design and implementation:
http://archives.postgresql.org/message-id/20110524104029.GB18831@tornado.gateway.2wire.net
Note that the third and fifth sentences of that description are incorrect. The
rest stands without them. Even just some feedback on the mundane issue noted in
the last paragraph would help.

Ok, I'll review it.

Thank you,
Alexey.

--
Command Prompt, Inc. http://www.CommandPrompt.com
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#17Jim Nasby
jim@nasby.net
In reply to: Alexey Klyukin (#16)
Re: Identifying no-op length coercions

On Jun 3, 2011, at 10:11 AM, Alexey Klyukin wrote:

Is your interest in cheap varchar(N)->varchar(N+M) conversions specifically, or
in some broader application of this facility?

Exactly varchar conversions.

Why limit it to varchar? Shouldn't we be able to do this for any varlena? The only challenge I see is numeric; we'd need to ensure that both size and precision are not decreasing.
--
Jim C. Nasby, Database Architect jim@nasby.net
512.569.9461 (cell) http://jim.nasby.net

#18Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Jim Nasby (#17)
Re: Identifying no-op length coercions

Jim Nasby <jim@nasby.net> wrote:

The only challenge I see is numeric; we'd need to ensure that both
size and precision are not decreasing.

To be picky, wouldn't that need to be "neither (precision - scale)
nor scale is decreasing"?

-Kevin

#19Robert Haas
robertmhaas@gmail.com
In reply to: Jim Nasby (#17)
Re: Identifying no-op length coercions

On Fri, Jun 3, 2011 at 11:43 AM, Jim Nasby <jim@nasby.net> wrote:

On Jun 3, 2011, at 10:11 AM, Alexey Klyukin wrote:

Is your interest in cheap varchar(N)->varchar(N+M) conversions specifically, or
in some broader application of this facility?

Exactly varchar conversions.

Why limit it to varchar? Shouldn't we be able to do this for any varlena? The only challenge I see is numeric; we'd need to ensure that both size and precision are not decreasing.

More than that: you should also be able to make it work for things
like xml -> text.

Indeed, I believe Noah has plans to do just that.

Which, quite frankly, will be awesome.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#20Noah Misch
noah@leadboat.com
In reply to: Jim Nasby (#17)
Re: Identifying no-op length coercions

On Fri, Jun 03, 2011 at 10:43:17AM -0500, Jim Nasby wrote:

On Jun 3, 2011, at 10:11 AM, Alexey Klyukin wrote:

Is your interest in cheap varchar(N)->varchar(N+M) conversions specifically, or
in some broader application of this facility?

Exactly varchar conversions.

Why limit it to varchar? Shouldn't we be able to do this for any varlena? The only challenge I see is numeric; we'd need to ensure that both size and precision are not decreasing.

I've implemented support for varchar, varbit, numeric, time, timetz, timestamp,
timestamptz, and interval. However, I'll probably submit only varchar in the
initial infrastructure patch and the rest in followup patches in a later CF.

For numeric, we store the display scale in every datum, so any change to it
rewrites the table. You'll be able to cheaply change numeric(7,2) to
numeric(9,2) but not to numeric(9,3).

#21Noah Misch
noah@leadboat.com
In reply to: Noah Misch (#15)
1 attachment(s)
Re: Identifying no-op length coercions

On Thu, Jun 02, 2011 at 03:22:49PM -0400, Noah Misch wrote:

On Thu, Jun 02, 2011 at 05:08:51PM +0300, Alexey Klyukin wrote:

Looks like this thread has silently died out. Is there an agreement on the
syntax and implementation part? We (CMD) have a customer, who is interested in
pushing this through, so, if we have a patch, I'd be happy to assist in
reviewing it.

I think we have a consensus on the implementation. We didn't totally lock down
the syntax. Tom and I seem happy to have no SQL exposure at all, so that's what
I'm planning to submit. However, we were pretty close to a syntax consensus in
the event that it becomes desirable to do otherwise.

Here's the patch. It contains the core support for transform functions and
applies that support to the varchar() length coercion.

We originally discussed having the transform function take and return Expr
nodes. It turns out that simplify_function() does not have the Expr, probably
because the particular choice of node type among the many that can convey a
function call does not matter to it. The same should be true of a transform
function -- it should do the same thing whether the subject call arrives from
a FuncExpr or an OpExpr. Therefore, I changed the transform function
signature to "Expr *protransform(List *args)".

Previously, ALTER TABLE did not plan its expressions until ATRewriteTable.
Since we need to make a decision on whether to rewrite based on examination of
a fully simplified expression, I moved the planning earlier. This means
planning now happens before the catalog updates associated with the ALTER
TABLE. This is at least as safe as what we did before, because the heap
available to expressions during ATRewriteTable is the pre-rewrite heap. I
wouldn't be surprised if there was and is some way to get into trouble with a
USING expression that refers back to the current table indirectly, but I did
not explore that in any detail.

The large pg_proc.h diff mostly just adds protransform = 0 to every function.
Due to the resulting patch size, I've compressed it. There are new/otherwise
changed DATA lines for OIDs 669 and 3097 only.

Any transform function for a length coercion cast will need code to strip
subsidiary RelabelType nodes and apply a new one that changes only the typmod.
I've gone ahead and split that out into a function in parse_clause.c.

I verified the patch's behavior using pg_regress tests I developed previously.
The tests rely on already-rejected instrumentation, so I don't include them in
the patch. If anyone would like an updated copy of the tests patch to aid
your review, let me know.

This will obviously require a catversion bump.

I looked for a README file that might be suitable for a brief note on how to
use the feature. src/backend/optimizer/README seemed closest, but it's all
much higher-level. Perhaps, like most optimization microfeatures, this
requires no particular discussion outside the code implementing it. I would
include approximately this README text if anyone feels it's useful:

pg_proc.protransform functions
------------------------------

Some functions calls can be simplified at plan time based on properties
specific to the function. For example, "int4mul(n, 1)" simplifies to "n", and
"varchar(s::varchar(4), 8, true)" simplifies to "s::varchar(4)". To define
such function-specific optimizations, write a "transform function" and store
its OID in the pg_proc.protransform of the primary function. This transform
function has the signature "protransform(internal) RETURNS internal", mapping
internally as "Expr *protransform(List *args)". If the transform function's
study of the argument Nodes proves that a simplified Expr substitutes for
every call represented by such arguments, return that Expr. Otherwise, return
the NULL pointer. We make no guarantee that PostgreSQL will never call the
main function in cases that the transform function would simplify. Ensure
rigorous equivalence between the simplified expression and an actual call to
the main function.

Thanks,
nm

Attachments:

noop-length-coercion-v1.patch.gzapplication/x-gunzipDownload
#22Tom Lane
tgl@sss.pgh.pa.us
In reply to: Noah Misch (#21)
Re: Identifying no-op length coercions

Noah Misch <noah@leadboat.com> writes:

We originally discussed having the transform function take and return Expr
nodes. It turns out that simplify_function() does not have the Expr, probably
because the particular choice of node type among the many that can convey a
function call does not matter to it. The same should be true of a transform
function -- it should do the same thing whether the subject call arrives from
a FuncExpr or an OpExpr. Therefore, I changed the transform function
signature to "Expr *protransform(List *args)".

That seems like the wrong thing. For one thing, it makes it quite
impossible to share one transform support function among multiple
functions/operators, since it won't know which one the argument list
is for. More generally, I foresee the transform needing to know
most of the other information that might be in the FuncExpr or OpExpr.
It certainly would need access to the collation, and it would probably
like to be able to copy the parse location to whatever it outputs,
and so forth and so on. So I really think passing the node to the
function is the most future-proof way to do it. If that means you
need to rejigger things a bit in clauses.c, so be it.

The large pg_proc.h diff mostly just adds protransform = 0 to every function.
Due to the resulting patch size, I've compressed it. There are new/otherwise
changed DATA lines for OIDs 669 and 3097 only.

The chances that this part will still apply cleanly by the time anyone
gets around to committing it are nil. I'd suggest you break it into two
separate patches, one that modifies the existing lines to add the
protransform = 0 column and then one to apply the remaining deltas in
the file. I envision the eventual committer doing the first step
on-the-fly (perhaps with an emacs macro, at least that's the way I
usually do it) and then wanting a patchable diff for the rest. Or if
you wanted to be really helpful, you could provide a throwaway perl
script to do the modifications of the existing lines ...

I haven't read the actual patch, these are just some quick reactions
to your description.

regards, tom lane

#23Noah Misch
noah@leadboat.com
In reply to: Tom Lane (#22)
Re: Identifying no-op length coercions

On Sat, Jun 11, 2011 at 02:11:55PM -0400, Tom Lane wrote:

Noah Misch <noah@leadboat.com> writes:

We originally discussed having the transform function take and return Expr
nodes. It turns out that simplify_function() does not have the Expr, probably
because the particular choice of node type among the many that can convey a
function call does not matter to it. The same should be true of a transform
function -- it should do the same thing whether the subject call arrives from
a FuncExpr or an OpExpr. Therefore, I changed the transform function
signature to "Expr *protransform(List *args)".

That seems like the wrong thing. For one thing, it makes it quite
impossible to share one transform support function among multiple
functions/operators, since it won't know which one the argument list
is for. More generally, I foresee the transform needing to know
most of the other information that might be in the FuncExpr or OpExpr.
It certainly would need access to the collation, and it would probably
like to be able to copy the parse location to whatever it outputs,
and so forth and so on. So I really think passing the node to the
function is the most future-proof way to do it. If that means you
need to rejigger things a bit in clauses.c, so be it.

Good points. I'm thinking, then, add an Expr argument to simplify_function()
and have the CoerceViaIO branch of eval_const_expressions_mutator() pass NULL
for both its simplify_function() calls. If simplify_function() gets a NULL Expr
for a function that has a protransform, synthesize a FuncExpr based on its other
arguments before calling the transform function. Seem reasonable? Granted,
that would be dead code until someone applies a transform function to a type I/O
function, which could easily never happen. Perhaps just ignore the transform
function when we started with a CoerceViaIO node?

The large pg_proc.h diff mostly just adds protransform = 0 to every function.
Due to the resulting patch size, I've compressed it. There are new/otherwise
changed DATA lines for OIDs 669 and 3097 only.

The chances that this part will still apply cleanly by the time anyone
gets around to committing it are nil. I'd suggest you break it into two
separate patches, one that modifies the existing lines to add the
protransform = 0 column and then one to apply the remaining deltas in
the file. I envision the eventual committer doing the first step
on-the-fly (perhaps with an emacs macro, at least that's the way I
usually do it) and then wanting a patchable diff for the rest. Or if
you wanted to be really helpful, you could provide a throwaway perl
script to do the modifications of the existing lines ...

That would be better; I'll do it for the next version.

I haven't read the actual patch, these are just some quick reactions
to your description.

Thanks,
nm

#24Tom Lane
tgl@sss.pgh.pa.us
In reply to: Noah Misch (#23)
Re: Identifying no-op length coercions

Noah Misch <noah@leadboat.com> writes:

Good points. I'm thinking, then, add an Expr argument to simplify_function()
and have the CoerceViaIO branch of eval_const_expressions_mutator() pass NULL
for both its simplify_function() calls. If simplify_function() gets a NULL Expr
for a function that has a protransform, synthesize a FuncExpr based on its other
arguments before calling the transform function. Seem reasonable? Granted,
that would be dead code until someone applies a transform function to a type I/O
function, which could easily never happen. Perhaps just ignore the transform
function when we started with a CoerceViaIO node?

Until we actually have a use-case for simplifying I/O functions like this,
I can't see going out of our way for it ...

regards, tom lane

#25Noah Misch
noah@leadboat.com
In reply to: Tom Lane (#24)
1 attachment(s)
Re: Identifying no-op length coercions

On Sat, Jun 11, 2011 at 03:03:18PM -0400, Tom Lane wrote:

Noah Misch <noah@leadboat.com> writes:

Good points. I'm thinking, then, add an Expr argument to simplify_function()
and have the CoerceViaIO branch of eval_const_expressions_mutator() pass NULL
for both its simplify_function() calls. If simplify_function() gets a NULL Expr
for a function that has a protransform, synthesize a FuncExpr based on its other
arguments before calling the transform function. Seem reasonable? Granted,
that would be dead code until someone applies a transform function to a type I/O
function, which could easily never happen. Perhaps just ignore the transform
function when we started with a CoerceViaIO node?

Until we actually have a use-case for simplifying I/O functions like this,
I can't see going out of our way for it ...

Sounds good. Updated patch attached, incorporating your ideas. Before applying
it, run this command to update the uninvolved pg_proc.h DATA entries:
perl -pi -e 's/PGUID(\s+\d+){4}/$& 0/' src/include/catalog/pg_proc.h

Thanks,
nm

Attachments:

noop-length-coercion-v2.patchtext/plain; charset=us-asciiDownload
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 8504555..5d1a447 100644
*** a/doc/src/sgml/catalogs.sgml
--- b/doc/src/sgml/catalogs.sgml
***************
*** 4337,4342 ****
--- 4337,4349 ----
       </row>
  
       <row>
+       <entry><structfield>protransform</structfield></entry>
+       <entry><type>regproc</type></entry>
+       <entry><literal><link linkend="catalog-pg-proc"><structname>pg_proc</structname></link>.oid</literal></entry>
+       <entry>Calls to function can be simplified by this other function</entry>
+      </row>
+ 
+      <row>
        <entry><structfield>proisagg</structfield></entry>
        <entry><type>bool</type></entry>
        <entry></entry>
diff --git a/src/backend/catalog/index 6250b07..92be0a7 100644
*** a/src/backend/catalog/pg_proc.c
--- b/src/backend/catalog/pg_proc.c
***************
*** 304,309 **** ProcedureCreate(const char *procedureName,
--- 304,310 ----
  	values[Anum_pg_proc_procost - 1] = Float4GetDatum(procost);
  	values[Anum_pg_proc_prorows - 1] = Float4GetDatum(prorows);
  	values[Anum_pg_proc_provariadic - 1] = ObjectIdGetDatum(variadicType);
+ 	values[Anum_pg_proc_protransform - 1] = ObjectIdGetDatum(InvalidOid);
  	values[Anum_pg_proc_proisagg - 1] = BoolGetDatum(isAgg);
  	values[Anum_pg_proc_proiswindow - 1] = BoolGetDatum(isWindowFunc);
  	values[Anum_pg_proc_prosecdef - 1] = BoolGetDatum(security_definer);
diff --git a/src/backend/commands/taindex 2c9f855..563a1b2 100644
*** a/src/backend/commands/tablecmds.c
--- b/src/backend/commands/tablecmds.c
***************
*** 56,61 ****
--- 56,62 ----
  #include "nodes/nodeFuncs.h"
  #include "nodes/parsenodes.h"
  #include "optimizer/clauses.h"
+ #include "optimizer/planner.h"
  #include "parser/parse_clause.h"
  #include "parser/parse_coerce.h"
  #include "parser/parse_collate.h"
***************
*** 3495,3501 **** ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap, LOCKMODE lockmode)
  	{
  		NewColumnValue *ex = lfirst(l);
  
! 		ex->exprstate = ExecPrepareExpr((Expr *) ex->expr, estate);
  	}
  
  	notnull_attrs = NIL;
--- 3496,3503 ----
  	{
  		NewColumnValue *ex = lfirst(l);
  
! 		/* expr already planned */
! 		ex->exprstate = ExecInitExpr((Expr *) ex->expr, NULL);
  	}
  
  	notnull_attrs = NIL;
***************
*** 4398,4404 **** ATExecAddColumn(List **wqueue, AlteredTableInfo *tab, Relation rel,
  
  			newval = (NewColumnValue *) palloc0(sizeof(NewColumnValue));
  			newval->attnum = attribute.attnum;
! 			newval->expr = defval;
  
  			tab->newvals = lappend(tab->newvals, newval);
  			tab->rewrite = true;
--- 4400,4406 ----
  
  			newval = (NewColumnValue *) palloc0(sizeof(NewColumnValue));
  			newval->attnum = attribute.attnum;
! 			newval->expr = expression_planner(defval);
  
  			tab->newvals = lappend(tab->newvals, newval);
  			tab->rewrite = true;
***************
*** 6706,6711 **** ATPrepAlterColumnType(List **wqueue,
--- 6708,6716 ----
  		/* Fix collations after all else */
  		assign_expr_collations(pstate, transform);
  
+ 		/* Plan the expr now so we can accurately assess the need to rewrite. */
+ 		transform = (Node *) expression_planner((Expr *) transform);
+ 
  		/*
  		 * Add a work queue item to make ATRewriteTable update the column
  		 * contents.
diff --git a/src/backend/optimizer/utilindex 2914c39..e9fb750 100644
*** a/src/backend/optimizer/util/clauses.c
--- b/src/backend/optimizer/util/clauses.c
***************
*** 106,114 **** static List *simplify_and_arguments(List *args,
  					   eval_const_expressions_context *context,
  					   bool *haveNull, bool *forceFalse);
  static Node *simplify_boolean_equality(Oid opno, List *args);
! static Expr *simplify_function(Oid funcid,
! 				  Oid result_type, int32 result_typmod,
! 				  Oid result_collid, Oid input_collid, List **args,
  				  bool has_named_args,
  				  bool allow_inline,
  				  eval_const_expressions_context *context);
--- 106,114 ----
  					   eval_const_expressions_context *context,
  					   bool *haveNull, bool *forceFalse);
  static Node *simplify_boolean_equality(Oid opno, List *args);
! static Expr *simplify_function(Expr *oldexpr, Oid funcid,
! 				  Oid result_type, int32 result_typmod, Oid result_collid,
! 				  Oid input_collid, List **args,
  				  bool has_named_args,
  				  bool allow_inline,
  				  eval_const_expressions_context *context);
***************
*** 2223,2229 **** eval_const_expressions_mutator(Node *node,
  		 * FuncExpr, but not when the node is recognizably a length coercion;
  		 * we want to preserve the typmod in the eventual Const if so.
  		 */
! 		simple = simplify_function(expr->funcid,
  								   expr->funcresulttype, exprTypmod(node),
  								   expr->funccollid,
  								   expr->inputcollid,
--- 2223,2230 ----
  		 * FuncExpr, but not when the node is recognizably a length coercion;
  		 * we want to preserve the typmod in the eventual Const if so.
  		 */
! 		simple = simplify_function((Expr *) expr,
! 								   expr->funcid,
  								   expr->funcresulttype, exprTypmod(node),
  								   expr->funccollid,
  								   expr->inputcollid,
***************
*** 2275,2281 **** eval_const_expressions_mutator(Node *node,
  		 * Code for op/func reduction is pretty bulky, so split it out as a
  		 * separate function.
  		 */
! 		simple = simplify_function(expr->opfuncid,
  								   expr->opresulttype, -1,
  								   expr->opcollid,
  								   expr->inputcollid,
--- 2276,2283 ----
  		 * Code for op/func reduction is pretty bulky, so split it out as a
  		 * separate function.
  		 */
! 		simple = simplify_function((Expr *) expr,
! 								   expr->opfuncid,
  								   expr->opresulttype, -1,
  								   expr->opcollid,
  								   expr->inputcollid,
***************
*** 2372,2378 **** eval_const_expressions_mutator(Node *node,
  			 * Code for op/func reduction is pretty bulky, so split it out as
  			 * a separate function.
  			 */
! 			simple = simplify_function(expr->opfuncid,
  									   expr->opresulttype, -1,
  									   expr->opcollid,
  									   expr->inputcollid,
--- 2374,2381 ----
  			 * Code for op/func reduction is pretty bulky, so split it out as
  			 * a separate function.
  			 */
! 			simple = simplify_function((Expr *) expr,
! 									   expr->opfuncid,
  									   expr->opresulttype, -1,
  									   expr->opcollid,
  									   expr->inputcollid,
***************
*** 2561,2567 **** eval_const_expressions_mutator(Node *node,
  		getTypeOutputInfo(exprType((Node *) arg), &outfunc, &outtypisvarlena);
  		getTypeInputInfo(expr->resulttype, &infunc, &intypioparam);
  
! 		simple = simplify_function(outfunc,
  								   CSTRINGOID, -1,
  								   InvalidOid,
  								   InvalidOid,
--- 2564,2571 ----
  		getTypeOutputInfo(exprType((Node *) arg), &outfunc, &outtypisvarlena);
  		getTypeInputInfo(expr->resulttype, &infunc, &intypioparam);
  
! 		simple = simplify_function(NULL,
! 								   outfunc,
  								   CSTRINGOID, -1,
  								   InvalidOid,
  								   InvalidOid,
***************
*** 2581,2587 **** eval_const_expressions_mutator(Node *node,
  									  Int32GetDatum(-1),
  									  false, true));
  
! 			simple = simplify_function(infunc,
  									   expr->resulttype, -1,
  									   expr->resultcollid,
  									   InvalidOid,
--- 2585,2592 ----
  									  Int32GetDatum(-1),
  									  false, true));
  
! 			simple = simplify_function(NULL,
! 									   infunc,
  									   expr->resulttype, -1,
  									   expr->resultcollid,
  									   InvalidOid,
***************
*** 3417,3427 **** simplify_boolean_equality(Oid opno, List *args)
   * Subroutine for eval_const_expressions: try to simplify a function call
   * (which might originally have been an operator; we don't care)
   *
!  * Inputs are the function OID, actual result type OID (which is needed for
!  * polymorphic functions), result typmod, result collation,
!  * the input collation to use for the function,
!  * the pre-simplified argument list, and some flags;
!  * also the context data for eval_const_expressions.
   *
   * Returns a simplified expression if successful, or NULL if cannot
   * simplify the function call.
--- 3422,3436 ----
   * Subroutine for eval_const_expressions: try to simplify a function call
   * (which might originally have been an operator; we don't care)
   *
!  * Inputs are the original expression (can be NULL), function OID, actual
!  * result type OID (which is needed for polymorphic functions), result typmod,
!  * result collation, the input collation to use for the function, the
!  * pre-simplified argument list, and some flags; also the context data for
!  * eval_const_expressions.  In common cases, several of the arguments could be
!  * derived from the original expression.  Sending them separately avoids
!  * duplicating NodeTag-specific knowledge, and it's necessary for CoerceViaIO.
!  * A NULL original expression disables use of transform functions while
!  * retaining all other behaviors.
   *
   * Returns a simplified expression if successful, or NULL if cannot
   * simplify the function call.
***************
*** 3433,3446 **** simplify_boolean_equality(Oid opno, List *args)
   * pass-by-reference, and it may get modified even if simplification fails.
   */
  static Expr *
! simplify_function(Oid funcid, Oid result_type, int32 result_typmod,
! 				  Oid result_collid, Oid input_collid, List **args,
  				  bool has_named_args,
  				  bool allow_inline,
  				  eval_const_expressions_context *context)
  {
  	HeapTuple	func_tuple;
  	Expr	   *newexpr;
  
  	/*
  	 * We have two strategies for simplification: either execute the function
--- 3442,3457 ----
   * pass-by-reference, and it may get modified even if simplification fails.
   */
  static Expr *
! simplify_function(Expr *oldexpr, Oid funcid,
! 				  Oid result_type, int32 result_typmod, Oid result_collid,
! 				  Oid input_collid, List **args,
  				  bool has_named_args,
  				  bool allow_inline,
  				  eval_const_expressions_context *context)
  {
  	HeapTuple	func_tuple;
  	Expr	   *newexpr;
+ 	Oid			transform;
  
  	/*
  	 * We have two strategies for simplification: either execute the function
***************
*** 3468,3473 **** simplify_function(Oid funcid, Oid result_type, int32 result_typmod,
--- 3479,3489 ----
  								result_collid, input_collid, *args,
  								func_tuple, context);
  
+ 	transform = ((Form_pg_proc) GETSTRUCT(func_tuple))->protransform;
+ 	if (!newexpr && OidIsValid(transform) && oldexpr)
+ 		newexpr = (Expr *) DatumGetPointer(OidFunctionCall1(transform,
+ 															PointerGetDatum(oldexpr)));
+ 
  	if (!newexpr && allow_inline)
  		newexpr = inline_function(funcid, result_type, result_collid,
  								  input_collid, *args,
diff --git a/src/backend/parser/parse_clausindex c5fe6b6..a8549e0 100644
*** a/src/backend/parser/parse_clause.c
--- b/src/backend/parser/parse_clause.c
***************
*** 2278,2280 **** transformFrameOffset(ParseState *pstate, int frameOptions, Node *clause)
--- 2278,2302 ----
  
  	return node;
  }
+ 
+ /*
+  * relabel_to_typmod
+  *		Add a RelabelType node that changes just the typmod, and remove all
+  *		now-superfluous RelabelType nodes beneath it.
+  */
+ Node *
+ relabel_to_typmod(Node *expr, int32 typmod)
+ {
+ 	Oid			type = exprType(expr);
+ 	Oid			coll = exprCollation(expr);
+ 
+ 	/*
+ 	 * Strip any existing RelabelType, then add one. This is to preserve the
+ 	 * invariant of no redundant RelabelTypes.
+ 	 */
+ 	while (IsA(expr, RelabelType))
+ 		expr = (Node *) ((RelabelType *) expr)->arg;
+ 
+ 	return (Node *) makeRelabelType((Expr *) expr, type, typmod, coll,
+ 									COERCE_DONTCARE);
+ }
diff --git a/src/backend/utils/adt/varchindex 1c0ef92..efa483d 100644
*** a/src/backend/utils/adt/varchar.c
--- b/src/backend/utils/adt/varchar.c
***************
*** 18,23 ****
--- 18,25 ----
  #include "access/hash.h"
  #include "access/tuptoaster.h"
  #include "libpq/pqformat.h"
+ #include "nodes/nodeFuncs.h"
+ #include "parser/parse_clause.h"
  #include "utils/array.h"
  #include "utils/builtins.h"
  #include "mb/pg_wchar.h"
***************
*** 549,554 **** varcharsend(PG_FUNCTION_ARGS)
--- 551,588 ----
  
  
  /*
+  * Flatten calls to our length coercion function that leave the new maximum
+  * length >= the previous maximum length.  We ignore the isExplicit argument,
+  * which only affects truncation.
+  */
+ Datum
+ varchar_transform(PG_FUNCTION_ARGS)
+ {
+ 	FuncExpr   *expr = (FuncExpr *) PG_GETARG_POINTER(0);
+ 	Node	   *typmod;
+ 	Node	   *ret = NULL;
+ 
+ 	if (!IsA(expr, FuncExpr))
+ 		PG_RETURN_POINTER(ret);
+ 
+ 	Assert(list_length(expr->args) == 3);
+ 	typmod = lsecond(expr->args);
+ 
+ 	if (IsA(typmod, Const))
+ 	{
+ 		Node	   *source = linitial(expr->args);
+ 		int32		new_typmod = DatumGetInt32(((Const *) typmod)->constvalue);
+ 		int32		old_max = exprTypmod(source) - VARHDRSZ;
+ 		int32		new_max = new_typmod - VARHDRSZ;
+ 
+ 		if (new_max < 0 || (old_max >= 0 && old_max <= new_max))
+ 			ret = relabel_to_typmod(source, new_typmod);
+ 	}
+ 
+ 	PG_RETURN_POINTER(ret);
+ }
+ 
+ /*
   * Converts a VARCHAR type to the specified size.
   *
   * maxlen is the typmod, ie, declared length plus VARHDRSZ bytes.
diff --git a/src/include/catalog/pg_clindex ffcce3c..002ae6b 100644
*** a/src/include/catalog/pg_class.h
--- b/src/include/catalog/pg_class.h
***************
*** 134,140 **** DATA(insert OID = 1247 (  pg_type		PGNSP 71 0 PGUID 0 0 0 0 0 0 0 f f p r 29 0 t
  DESCR("");
  DATA(insert OID = 1249 (  pg_attribute	PGNSP 75 0 PGUID 0 0 0 0 0 0 0 f f p r 20 0 f f f f f 3 _null_ _null_ ));
  DESCR("");
! DATA(insert OID = 1255 (  pg_proc		PGNSP 81 0 PGUID 0 0 0 0 0 0 0 f f p r 25 0 t f f f f 3 _null_ _null_ ));
  DESCR("");
  DATA(insert OID = 1259 (  pg_class		PGNSP 83 0 PGUID 0 0 0 0 0 0 0 f f p r 26 0 t f f f f 3 _null_ _null_ ));
  DESCR("");
--- 134,140 ----
  DESCR("");
  DATA(insert OID = 1249 (  pg_attribute	PGNSP 75 0 PGUID 0 0 0 0 0 0 0 f f p r 20 0 f f f f f 3 _null_ _null_ ));
  DESCR("");
! DATA(insert OID = 1255 (  pg_proc		PGNSP 81 0 PGUID 0 0 0 0 0 0 0 f f p r 26 0 t f f f f 3 _null_ _null_ ));
  DESCR("");
  DATA(insert OID = 1259 (  pg_class		PGNSP 83 0 PGUID 0 0 0 0 0 0 0 f f p r 26 0 t f f f f 3 _null_ _null_ ));
  DESCR("");
diff --git a/src/include/catalog/pg_pindex a66ecf1..6980d3e 100644
*** a/src/include/catalog/pg_proc.h
--- b/src/include/catalog/pg_proc.h
***************
*** 42,47 **** CATALOG(pg_proc,1255) BKI_BOOTSTRAP BKI_ROWTYPE_OID(81) BKI_SCHEMA_MACRO
--- 42,48 ----
  	float4		procost;		/* estimated execution cost */
  	float4		prorows;		/* estimated # of rows out (if proretset) */
  	Oid			provariadic;	/* element type of variadic array, or 0 */
+ 	regproc		protransform;	/* transforms calls to it during planning */
  	bool		proisagg;		/* is it an aggregate? */
  	bool		proiswindow;	/* is it a window function? */
  	bool		prosecdef;		/* security definer */
***************
*** 76,82 **** typedef FormData_pg_proc *Form_pg_proc;
   *		compiler constants for pg_proc
   * ----------------
   */
! #define Natts_pg_proc					25
  #define Anum_pg_proc_proname			1
  #define Anum_pg_proc_pronamespace		2
  #define Anum_pg_proc_proowner			3
--- 77,83 ----
   *		compiler constants for pg_proc
   * ----------------
   */
! #define Natts_pg_proc					26
  #define Anum_pg_proc_proname			1
  #define Anum_pg_proc_pronamespace		2
  #define Anum_pg_proc_proowner			3
***************
*** 84,107 **** typedef FormData_pg_proc *Form_pg_proc;
  #define Anum_pg_proc_procost			5
  #define Anum_pg_proc_prorows			6
  #define Anum_pg_proc_provariadic		7
! #define Anum_pg_proc_proisagg			8
! #define Anum_pg_proc_proiswindow		9
! #define Anum_pg_proc_prosecdef			10
! #define Anum_pg_proc_proisstrict		11
! #define Anum_pg_proc_proretset			12
! #define Anum_pg_proc_provolatile		13
! #define Anum_pg_proc_pronargs			14
! #define Anum_pg_proc_pronargdefaults	15
! #define Anum_pg_proc_prorettype			16
! #define Anum_pg_proc_proargtypes		17
! #define Anum_pg_proc_proallargtypes		18
! #define Anum_pg_proc_proargmodes		19
! #define Anum_pg_proc_proargnames		20
! #define Anum_pg_proc_proargdefaults		21
! #define Anum_pg_proc_prosrc				22
! #define Anum_pg_proc_probin				23
! #define Anum_pg_proc_proconfig			24
! #define Anum_pg_proc_proacl				25
  
  /* ----------------
   *		initial contents of pg_proc
--- 85,109 ----
  #define Anum_pg_proc_procost			5
  #define Anum_pg_proc_prorows			6
  #define Anum_pg_proc_provariadic		7
! #define Anum_pg_proc_protransform		8
! #define Anum_pg_proc_proisagg			9
! #define Anum_pg_proc_proiswindow		10
! #define Anum_pg_proc_prosecdef			11
! #define Anum_pg_proc_proisstrict		12
! #define Anum_pg_proc_proretset			13
! #define Anum_pg_proc_provolatile		14
! #define Anum_pg_proc_pronargs			15
! #define Anum_pg_proc_pronargdefaults	16
! #define Anum_pg_proc_prorettype			17
! #define Anum_pg_proc_proargtypes		18
! #define Anum_pg_proc_proallargtypes		19
! #define Anum_pg_proc_proargmodes		20
! #define Anum_pg_proc_proargnames		21
! #define Anum_pg_proc_proargdefaults		22
! #define Anum_pg_proc_prosrc				23
! #define Anum_pg_proc_probin				24
! #define Anum_pg_proc_proconfig			25
! #define Anum_pg_proc_proacl				26
  
  /* ----------------
   *		initial contents of pg_proc
***************
*** 743,749 **** DATA(insert OID = 659 (  namene			   PGNSP PGUID 12 1 0 0 0 f f f t f i 2 0 16 "
  
  DATA(insert OID = 668 (  bpchar			   PGNSP PGUID 12 1 0 0 0 f f f t f i 3 0 1042 "1042 23 16" _null_ _null_ _null_ _null_ bpchar _null_ _null_ _null_ ));
  DESCR("adjust char() to typmod length");
! DATA(insert OID = 669 (  varchar		   PGNSP PGUID 12 1 0 0 0 f f f t f i 3 0 1043 "1043 23 16" _null_ _null_ _null_ _null_ varchar _null_ _null_ _null_ ));
  DESCR("adjust varchar() to typmod length");
  
  DATA(insert OID = 676 (  mktinterval	   PGNSP PGUID 12 1 0 0 0 f f f t f i 2 0 704 "702 702" _null_ _null_ _null_ _null_ mktinterval _null_ _null_ _null_ ));
--- 745,753 ----
  
  DATA(insert OID = 668 (  bpchar			   PGNSP PGUID 12 1 0 0 0 f f f t f i 3 0 1042 "1042 23 16" _null_ _null_ _null_ _null_ bpchar _null_ _null_ _null_ ));
  DESCR("adjust char() to typmod length");
! DATA(insert OID = 3097 ( varchar_transform PGNSP PGUID 12 1 0 0 0 f f f t f i 1 0 2281 "2281" _null_ _null_ _null_ _null_ varchar_transform _null_ _null_ _null_ ));
! DESCR("transform a varchar length coercion");
! DATA(insert OID = 669 (  varchar		   PGNSP PGUID 12 1 0 0 3097 f f f t f i 3 0 1043 "1043 23 16" _null_ _null_ _null_ _null_ varchar _null_ _null_ _null_ ));
  DESCR("adjust varchar() to typmod length");
  
  DATA(insert OID = 676 (  mktinterval	   PGNSP PGUID 12 1 0 0 0 f f f t f i 2 0 704 "702 702" _null_ _null_ _null_ _null_ mktinterval _null_ _null_ _null_ ));
diff --git a/src/include/parser/parsindex 09ef450..f5ce2ff 100644
*** a/src/include/parser/parse_clause.h
--- b/src/include/parser/parse_clause.h
***************
*** 44,47 **** extern List *transformDistinctOnClause(ParseState *pstate, List *distinctlist,
--- 44,49 ----
  extern Index assignSortGroupRef(TargetEntry *tle, List *tlist);
  extern bool targetIsInSortList(TargetEntry *tle, Oid sortop, List *sortList);
  
+ extern Node *relabel_to_typmod(Node *expr, int32 typmod);
+ 
  #endif   /* PARSE_CLAUSE_H */
diff --git a/src/include/utils/builtins.index 14215db..8a1c82e 100644
*** a/src/include/utils/builtins.h
--- b/src/include/utils/builtins.h
***************
*** 684,689 **** extern Datum varcharrecv(PG_FUNCTION_ARGS);
--- 684,690 ----
  extern Datum varcharsend(PG_FUNCTION_ARGS);
  extern Datum varchartypmodin(PG_FUNCTION_ARGS);
  extern Datum varchartypmodout(PG_FUNCTION_ARGS);
+ extern Datum varchar_transform(PG_FUNCTION_ARGS);
  extern Datum varchar(PG_FUNCTION_ARGS);
  
  /* varlena.c */
diff --git a/src/test/regress/expecindex 3326f93..1b5a878 100644
#26Robert Haas
robertmhaas@gmail.com
In reply to: Noah Misch (#25)
Re: Identifying no-op length coercions

On Sat, Jun 11, 2011 at 5:13 PM, Noah Misch <noah@leadboat.com> wrote:

Sounds good.  Updated patch attached, incorporating your ideas.  Before applying
it, run this command to update the uninvolved pg_proc.h DATA entries:
 perl -pi -e 's/PGUID(\s+\d+){4}/$& 0/' src/include/catalog/pg_proc.h

This doesn't quite apply any more. I think the pgindent run broke it slightly.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#27Noah Misch
noah@leadboat.com
In reply to: Robert Haas (#26)
Re: Identifying no-op length coercions

On Sat, Jun 18, 2011 at 10:57:13PM -0400, Robert Haas wrote:

On Sat, Jun 11, 2011 at 5:13 PM, Noah Misch <noah@leadboat.com> wrote:

Sounds good. ?Updated patch attached, incorporating your ideas. ?Before applying
it, run this command to update the uninvolved pg_proc.h DATA entries:
?perl -pi -e 's/PGUID(\s+\d+){4}/$& 0/' src/include/catalog/pg_proc.h

This doesn't quite apply any more. I think the pgindent run broke it slightly.

Hmm, I just get two one-line offsets when applying it to current master. Note
that you need to run the perl invocation before applying the patch. Could you
provide full output of your `patch' invocation, along with any reject files?

Thanks,
nm

#28Robert Haas
robertmhaas@gmail.com
In reply to: Noah Misch (#27)
Re: Identifying no-op length coercions

On Sat, Jun 18, 2011 at 11:06 PM, Noah Misch <noah@leadboat.com> wrote:

On Sat, Jun 18, 2011 at 10:57:13PM -0400, Robert Haas wrote:

On Sat, Jun 11, 2011 at 5:13 PM, Noah Misch <noah@leadboat.com> wrote:

Sounds good. ?Updated patch attached, incorporating your ideas. ?Before applying
it, run this command to update the uninvolved pg_proc.h DATA entries:
?perl -pi -e 's/PGUID(\s+\d+){4}/$& 0/' src/include/catalog/pg_proc.h

This doesn't quite apply any more.  I think the pgindent run broke it slightly.

Hmm, I just get two one-line offsets when applying it to current master.  Note
that you need to run the perl invocation before applying the patch.  Could you
provide full output of your `patch' invocation, along with any reject files?

Ah, crap. You're right. I didn't follow your directions for how to
apply the patch. Sorry.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#29Robert Haas
robertmhaas@gmail.com
In reply to: Robert Haas (#28)
Re: Identifying no-op length coercions

On Sat, Jun 18, 2011 at 11:12 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Sat, Jun 18, 2011 at 11:06 PM, Noah Misch <noah@leadboat.com> wrote:

On Sat, Jun 18, 2011 at 10:57:13PM -0400, Robert Haas wrote:

On Sat, Jun 11, 2011 at 5:13 PM, Noah Misch <noah@leadboat.com> wrote:

Sounds good. ?Updated patch attached, incorporating your ideas. ?Before applying
it, run this command to update the uninvolved pg_proc.h DATA entries:
?perl -pi -e 's/PGUID(\s+\d+){4}/$& 0/' src/include/catalog/pg_proc.h

This doesn't quite apply any more.  I think the pgindent run broke it slightly.

Hmm, I just get two one-line offsets when applying it to current master.  Note
that you need to run the perl invocation before applying the patch.  Could you
provide full output of your `patch' invocation, along with any reject files?

Ah, crap.  You're right.  I didn't follow your directions for how to
apply the patch.  Sorry.

I think you need to update the comment in simplify_function() to say
that we have three strategies, rather than two. I think it would also
be appropriate to add a longish comment just before the test that
calls protransform, explaining what the charter of that function is
and why the mechanism exists.

Documentation issues aside, I see very little not to like about this.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#30Noah Misch
noah@leadboat.com
In reply to: Robert Haas (#29)
1 attachment(s)
Re: Identifying no-op length coercions

On Sat, Jun 18, 2011 at 11:32:20PM -0400, Robert Haas wrote:

On Sat, Jun 18, 2011 at 11:12 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Sat, Jun 18, 2011 at 11:06 PM, Noah Misch <noah@leadboat.com> wrote:

On Sat, Jun 18, 2011 at 10:57:13PM -0400, Robert Haas wrote:

On Sat, Jun 11, 2011 at 5:13 PM, Noah Misch <noah@leadboat.com> wrote:

Sounds good. ?Updated patch attached, incorporating your ideas. ?Before applying
it, run this command to update the uninvolved pg_proc.h DATA entries:
?perl -pi -e 's/PGUID(\s+\d+){4}/$& 0/' src/include/catalog/pg_proc.h

This doesn't quite apply any more. ?I think the pgindent run broke it slightly.

Hmm, I just get two one-line offsets when applying it to current master. ?Note
that you need to run the perl invocation before applying the patch. ?Could you
provide full output of your `patch' invocation, along with any reject files?

Ah, crap. ?You're right. ?I didn't follow your directions for how to
apply the patch. ?Sorry.

I think you need to update the comment in simplify_function() to say
that we have three strategies, rather than two. I think it would also
be appropriate to add a longish comment just before the test that
calls protransform, explaining what the charter of that function is
and why the mechanism exists.

Good idea. See attached.

Documentation issues aside, I see very little not to like about this.

Great! Thanks for reviewing.

Attachments:

noop-length-coercion-v3.patchtext/plain; charset=us-asciiDownload
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 24d7d98..7a380ce 100644
*** a/doc/src/sgml/catalogs.sgml
--- b/doc/src/sgml/catalogs.sgml
***************
*** 4338,4343 ****
--- 4338,4350 ----
       </row>
  
       <row>
+       <entry><structfield>protransform</structfield></entry>
+       <entry><type>regproc</type></entry>
+       <entry><literal><link linkend="catalog-pg-proc"><structname>pg_proc</structname></link>.oid</literal></entry>
+       <entry>Calls to function can be simplified by this other function</entry>
+      </row>
+ 
+      <row>
        <entry><structfield>proisagg</structfield></entry>
        <entry><type>bool</type></entry>
        <entry></entry>
diff --git a/src/backend/catalog/index 6250b07..92be0a7 100644
*** a/src/backend/catalog/pg_proc.c
--- b/src/backend/catalog/pg_proc.c
***************
*** 304,309 **** ProcedureCreate(const char *procedureName,
--- 304,310 ----
  	values[Anum_pg_proc_procost - 1] = Float4GetDatum(procost);
  	values[Anum_pg_proc_prorows - 1] = Float4GetDatum(prorows);
  	values[Anum_pg_proc_provariadic - 1] = ObjectIdGetDatum(variadicType);
+ 	values[Anum_pg_proc_protransform - 1] = ObjectIdGetDatum(InvalidOid);
  	values[Anum_pg_proc_proisagg - 1] = BoolGetDatum(isAgg);
  	values[Anum_pg_proc_proiswindow - 1] = BoolGetDatum(isWindowFunc);
  	values[Anum_pg_proc_prosecdef - 1] = BoolGetDatum(security_definer);
diff --git a/src/backend/commands/taindex 912f45c..4dffd64 100644
*** a/src/backend/commands/tablecmds.c
--- b/src/backend/commands/tablecmds.c
***************
*** 56,61 ****
--- 56,62 ----
  #include "nodes/nodeFuncs.h"
  #include "nodes/parsenodes.h"
  #include "optimizer/clauses.h"
+ #include "optimizer/planner.h"
  #include "parser/parse_clause.h"
  #include "parser/parse_coerce.h"
  #include "parser/parse_collate.h"
***************
*** 3495,3501 **** ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap, LOCKMODE lockmode)
  	{
  		NewColumnValue *ex = lfirst(l);
  
! 		ex->exprstate = ExecPrepareExpr((Expr *) ex->expr, estate);
  	}
  
  	notnull_attrs = NIL;
--- 3496,3503 ----
  	{
  		NewColumnValue *ex = lfirst(l);
  
! 		/* expr already planned */
! 		ex->exprstate = ExecInitExpr((Expr *) ex->expr, NULL);
  	}
  
  	notnull_attrs = NIL;
***************
*** 4398,4404 **** ATExecAddColumn(List **wqueue, AlteredTableInfo *tab, Relation rel,
  
  			newval = (NewColumnValue *) palloc0(sizeof(NewColumnValue));
  			newval->attnum = attribute.attnum;
! 			newval->expr = defval;
  
  			tab->newvals = lappend(tab->newvals, newval);
  			tab->rewrite = true;
--- 4400,4406 ----
  
  			newval = (NewColumnValue *) palloc0(sizeof(NewColumnValue));
  			newval->attnum = attribute.attnum;
! 			newval->expr = expression_planner(defval);
  
  			tab->newvals = lappend(tab->newvals, newval);
  			tab->rewrite = true;
***************
*** 6707,6712 **** ATPrepAlterColumnType(List **wqueue,
--- 6709,6717 ----
  		/* Fix collations after all else */
  		assign_expr_collations(pstate, transform);
  
+ 		/* Plan the expr now so we can accurately assess the need to rewrite. */
+ 		transform = (Node *) expression_planner((Expr *) transform);
+ 
  		/*
  		 * Add a work queue item to make ATRewriteTable update the column
  		 * contents.
diff --git a/src/backend/optimizer/utilindex 2914c39..be0935d 100644
*** a/src/backend/optimizer/util/clauses.c
--- b/src/backend/optimizer/util/clauses.c
***************
*** 106,114 **** static List *simplify_and_arguments(List *args,
  					   eval_const_expressions_context *context,
  					   bool *haveNull, bool *forceFalse);
  static Node *simplify_boolean_equality(Oid opno, List *args);
! static Expr *simplify_function(Oid funcid,
! 				  Oid result_type, int32 result_typmod,
! 				  Oid result_collid, Oid input_collid, List **args,
  				  bool has_named_args,
  				  bool allow_inline,
  				  eval_const_expressions_context *context);
--- 106,114 ----
  					   eval_const_expressions_context *context,
  					   bool *haveNull, bool *forceFalse);
  static Node *simplify_boolean_equality(Oid opno, List *args);
! static Expr *simplify_function(Expr *oldexpr, Oid funcid,
! 				  Oid result_type, int32 result_typmod, Oid result_collid,
! 				  Oid input_collid, List **args,
  				  bool has_named_args,
  				  bool allow_inline,
  				  eval_const_expressions_context *context);
***************
*** 2223,2229 **** eval_const_expressions_mutator(Node *node,
  		 * FuncExpr, but not when the node is recognizably a length coercion;
  		 * we want to preserve the typmod in the eventual Const if so.
  		 */
! 		simple = simplify_function(expr->funcid,
  								   expr->funcresulttype, exprTypmod(node),
  								   expr->funccollid,
  								   expr->inputcollid,
--- 2223,2230 ----
  		 * FuncExpr, but not when the node is recognizably a length coercion;
  		 * we want to preserve the typmod in the eventual Const if so.
  		 */
! 		simple = simplify_function((Expr *) expr,
! 								   expr->funcid,
  								   expr->funcresulttype, exprTypmod(node),
  								   expr->funccollid,
  								   expr->inputcollid,
***************
*** 2275,2281 **** eval_const_expressions_mutator(Node *node,
  		 * Code for op/func reduction is pretty bulky, so split it out as a
  		 * separate function.
  		 */
! 		simple = simplify_function(expr->opfuncid,
  								   expr->opresulttype, -1,
  								   expr->opcollid,
  								   expr->inputcollid,
--- 2276,2283 ----
  		 * Code for op/func reduction is pretty bulky, so split it out as a
  		 * separate function.
  		 */
! 		simple = simplify_function((Expr *) expr,
! 								   expr->opfuncid,
  								   expr->opresulttype, -1,
  								   expr->opcollid,
  								   expr->inputcollid,
***************
*** 2372,2378 **** eval_const_expressions_mutator(Node *node,
  			 * Code for op/func reduction is pretty bulky, so split it out as
  			 * a separate function.
  			 */
! 			simple = simplify_function(expr->opfuncid,
  									   expr->opresulttype, -1,
  									   expr->opcollid,
  									   expr->inputcollid,
--- 2374,2381 ----
  			 * Code for op/func reduction is pretty bulky, so split it out as
  			 * a separate function.
  			 */
! 			simple = simplify_function((Expr *) expr,
! 									   expr->opfuncid,
  									   expr->opresulttype, -1,
  									   expr->opcollid,
  									   expr->inputcollid,
***************
*** 2561,2567 **** eval_const_expressions_mutator(Node *node,
  		getTypeOutputInfo(exprType((Node *) arg), &outfunc, &outtypisvarlena);
  		getTypeInputInfo(expr->resulttype, &infunc, &intypioparam);
  
! 		simple = simplify_function(outfunc,
  								   CSTRINGOID, -1,
  								   InvalidOid,
  								   InvalidOid,
--- 2564,2571 ----
  		getTypeOutputInfo(exprType((Node *) arg), &outfunc, &outtypisvarlena);
  		getTypeInputInfo(expr->resulttype, &infunc, &intypioparam);
  
! 		simple = simplify_function(NULL,
! 								   outfunc,
  								   CSTRINGOID, -1,
  								   InvalidOid,
  								   InvalidOid,
***************
*** 2581,2587 **** eval_const_expressions_mutator(Node *node,
  									  Int32GetDatum(-1),
  									  false, true));
  
! 			simple = simplify_function(infunc,
  									   expr->resulttype, -1,
  									   expr->resultcollid,
  									   InvalidOid,
--- 2585,2592 ----
  									  Int32GetDatum(-1),
  									  false, true));
  
! 			simple = simplify_function(NULL,
! 									   infunc,
  									   expr->resulttype, -1,
  									   expr->resultcollid,
  									   InvalidOid,
***************
*** 3417,3427 **** simplify_boolean_equality(Oid opno, List *args)
   * Subroutine for eval_const_expressions: try to simplify a function call
   * (which might originally have been an operator; we don't care)
   *
!  * Inputs are the function OID, actual result type OID (which is needed for
!  * polymorphic functions), result typmod, result collation,
!  * the input collation to use for the function,
!  * the pre-simplified argument list, and some flags;
!  * also the context data for eval_const_expressions.
   *
   * Returns a simplified expression if successful, or NULL if cannot
   * simplify the function call.
--- 3422,3436 ----
   * Subroutine for eval_const_expressions: try to simplify a function call
   * (which might originally have been an operator; we don't care)
   *
!  * Inputs are the original expression (can be NULL), function OID, actual
!  * result type OID (which is needed for polymorphic functions), result typmod,
!  * result collation, the input collation to use for the function, the
!  * pre-simplified argument list, and some flags; also the context data for
!  * eval_const_expressions.  In common cases, several of the arguments could be
!  * derived from the original expression.  Sending them separately avoids
!  * duplicating NodeTag-specific knowledge, and it's necessary for CoerceViaIO.
!  * A NULL original expression disables use of transform functions while
!  * retaining all other behaviors.
   *
   * Returns a simplified expression if successful, or NULL if cannot
   * simplify the function call.
***************
*** 3433,3454 **** simplify_boolean_equality(Oid opno, List *args)
   * pass-by-reference, and it may get modified even if simplification fails.
   */
  static Expr *
! simplify_function(Oid funcid, Oid result_type, int32 result_typmod,
! 				  Oid result_collid, Oid input_collid, List **args,
  				  bool has_named_args,
  				  bool allow_inline,
  				  eval_const_expressions_context *context)
  {
  	HeapTuple	func_tuple;
  	Expr	   *newexpr;
  
  	/*
! 	 * We have two strategies for simplification: either execute the function
! 	 * to deliver a constant result, or expand in-line the body of the
! 	 * function definition (which only works for simple SQL-language
! 	 * functions, but that is a common case).  In either case we need access
! 	 * to the function's pg_proc tuple, so fetch it just once to use in both
! 	 * attempts.
  	 */
  	func_tuple = SearchSysCache1(PROCOID, ObjectIdGetDatum(funcid));
  	if (!HeapTupleIsValid(func_tuple))
--- 3442,3465 ----
   * pass-by-reference, and it may get modified even if simplification fails.
   */
  static Expr *
! simplify_function(Expr *oldexpr, Oid funcid,
! 				  Oid result_type, int32 result_typmod, Oid result_collid,
! 				  Oid input_collid, List **args,
  				  bool has_named_args,
  				  bool allow_inline,
  				  eval_const_expressions_context *context)
  {
  	HeapTuple	func_tuple;
  	Expr	   *newexpr;
+ 	Oid			transform;
  
  	/*
! 	 * We have three strategies for simplification: execute the function to
! 	 * deliver a constant result, use a transform function to generate a
! 	 * substitute node tree, or expand in-line the body of the function
! 	 * definition (which only works for simple SQL-language functions, but
! 	 * that is a common case).  Each needs access to the function's pg_proc
! 	 * tuple, so fetch it just once.
  	 */
  	func_tuple = SearchSysCache1(PROCOID, ObjectIdGetDatum(funcid));
  	if (!HeapTupleIsValid(func_tuple))
***************
*** 3468,3473 **** simplify_function(Oid funcid, Oid result_type, int32 result_typmod,
--- 3479,3518 ----
  								result_collid, input_collid, *args,
  								func_tuple, context);
  
+ 	/*
+ 	 * Some functions calls can be simplified at plan time based on properties
+ 	 * specific to the function.  For example, "varchar(s::varchar(4), 8,
+ 	 * true)" simplifies to "s::varchar(4)", and "int4mul(n, 1)" could
+ 	 * simplify to "n".  To define such function-specific optimizations, write
+ 	 * a "transform function" and store its OID in the pg_proc.protransform of
+ 	 * the primary function.  Give each transform function the signature
+ 	 * "protransform(internal) RETURNS internal".  The argument, internally an
+ 	 * Expr *, is the node representing a call to the primary function.  If
+ 	 * the transform function's study of that node proves that a simplified
+ 	 * Expr substitutes for all possible concrete calls represented thereby,
+ 	 * return that simplified Expr.  Otherwise, return the NULL pointer.
+ 	 *
+ 	 * Currently, the specific Expr nodetag can be FuncExpr, OpExpr or
+ 	 * DistinctExpr.  This list may change in the future.  The function should
+ 	 * check the nodetag and return the NULL pointer for unexpected inputs.
+ 	 *
+ 	 * We make no guarantee that PostgreSQL will never call the primary
+ 	 * function in cases that the transform function would simplify.  Ensure
+ 	 * rigorous equivalence between the simplified expression and an actual
+ 	 * call to the primary function.
+ 	 *
+ 	 * Currently, this facility is undocumented and not exposed to users at
+ 	 * the SQL level.  Core length coercion casts use it to avoid calls
+ 	 * guaranteed to return their input unchanged.  This in turn allows ALTER
+ 	 * TABLE ALTER TYPE to avoid rewriting tables for some typmod changes.  In
+ 	 * the future, this facility may find other applications, like simplifying
+ 	 * x*0, x*1, and x+0.
+ 	 */
+ 	transform = ((Form_pg_proc) GETSTRUCT(func_tuple))->protransform;
+ 	if (!newexpr && OidIsValid(transform) && oldexpr)
+ 		newexpr = (Expr *) DatumGetPointer(OidFunctionCall1(transform,
+ 															PointerGetDatum(oldexpr)));
+ 
  	if (!newexpr && allow_inline)
  		newexpr = inline_function(funcid, result_type, result_collid,
  								  input_collid, *args,
diff --git a/src/backend/parser/parse_clausindex c5fe6b6..a8549e0 100644
*** a/src/backend/parser/parse_clause.c
--- b/src/backend/parser/parse_clause.c
***************
*** 2278,2280 **** transformFrameOffset(ParseState *pstate, int frameOptions, Node *clause)
--- 2278,2302 ----
  
  	return node;
  }
+ 
+ /*
+  * relabel_to_typmod
+  *		Add a RelabelType node that changes just the typmod, and remove all
+  *		now-superfluous RelabelType nodes beneath it.
+  */
+ Node *
+ relabel_to_typmod(Node *expr, int32 typmod)
+ {
+ 	Oid			type = exprType(expr);
+ 	Oid			coll = exprCollation(expr);
+ 
+ 	/*
+ 	 * Strip any existing RelabelType, then add one. This is to preserve the
+ 	 * invariant of no redundant RelabelTypes.
+ 	 */
+ 	while (IsA(expr, RelabelType))
+ 		expr = (Node *) ((RelabelType *) expr)->arg;
+ 
+ 	return (Node *) makeRelabelType((Expr *) expr, type, typmod, coll,
+ 									COERCE_DONTCARE);
+ }
diff --git a/src/backend/utils/adt/varchindex 1c0ef92..efa483d 100644
*** a/src/backend/utils/adt/varchar.c
--- b/src/backend/utils/adt/varchar.c
***************
*** 18,23 ****
--- 18,25 ----
  #include "access/hash.h"
  #include "access/tuptoaster.h"
  #include "libpq/pqformat.h"
+ #include "nodes/nodeFuncs.h"
+ #include "parser/parse_clause.h"
  #include "utils/array.h"
  #include "utils/builtins.h"
  #include "mb/pg_wchar.h"
***************
*** 549,554 **** varcharsend(PG_FUNCTION_ARGS)
--- 551,588 ----
  
  
  /*
+  * Flatten calls to our length coercion function that leave the new maximum
+  * length >= the previous maximum length.  We ignore the isExplicit argument,
+  * which only affects truncation.
+  */
+ Datum
+ varchar_transform(PG_FUNCTION_ARGS)
+ {
+ 	FuncExpr   *expr = (FuncExpr *) PG_GETARG_POINTER(0);
+ 	Node	   *typmod;
+ 	Node	   *ret = NULL;
+ 
+ 	if (!IsA(expr, FuncExpr))
+ 		PG_RETURN_POINTER(ret);
+ 
+ 	Assert(list_length(expr->args) == 3);
+ 	typmod = lsecond(expr->args);
+ 
+ 	if (IsA(typmod, Const))
+ 	{
+ 		Node	   *source = linitial(expr->args);
+ 		int32		new_typmod = DatumGetInt32(((Const *) typmod)->constvalue);
+ 		int32		old_max = exprTypmod(source) - VARHDRSZ;
+ 		int32		new_max = new_typmod - VARHDRSZ;
+ 
+ 		if (new_max < 0 || (old_max >= 0 && old_max <= new_max))
+ 			ret = relabel_to_typmod(source, new_typmod);
+ 	}
+ 
+ 	PG_RETURN_POINTER(ret);
+ }
+ 
+ /*
   * Converts a VARCHAR type to the specified size.
   *
   * maxlen is the typmod, ie, declared length plus VARHDRSZ bytes.
diff --git a/src/include/catalog/pg_clindex ffcce3c..002ae6b 100644
*** a/src/include/catalog/pg_class.h
--- b/src/include/catalog/pg_class.h
***************
*** 134,140 **** DATA(insert OID = 1247 (  pg_type		PGNSP 71 0 PGUID 0 0 0 0 0 0 0 f f p r 29 0 t
  DESCR("");
  DATA(insert OID = 1249 (  pg_attribute	PGNSP 75 0 PGUID 0 0 0 0 0 0 0 f f p r 20 0 f f f f f 3 _null_ _null_ ));
  DESCR("");
! DATA(insert OID = 1255 (  pg_proc		PGNSP 81 0 PGUID 0 0 0 0 0 0 0 f f p r 25 0 t f f f f 3 _null_ _null_ ));
  DESCR("");
  DATA(insert OID = 1259 (  pg_class		PGNSP 83 0 PGUID 0 0 0 0 0 0 0 f f p r 26 0 t f f f f 3 _null_ _null_ ));
  DESCR("");
--- 134,140 ----
  DESCR("");
  DATA(insert OID = 1249 (  pg_attribute	PGNSP 75 0 PGUID 0 0 0 0 0 0 0 f f p r 20 0 f f f f f 3 _null_ _null_ ));
  DESCR("");
! DATA(insert OID = 1255 (  pg_proc		PGNSP 81 0 PGUID 0 0 0 0 0 0 0 f f p r 26 0 t f f f f 3 _null_ _null_ ));
  DESCR("");
  DATA(insert OID = 1259 (  pg_class		PGNSP 83 0 PGUID 0 0 0 0 0 0 0 f f p r 26 0 t f f f f 3 _null_ _null_ ));
  DESCR("");
diff --git a/src/include/catalog/pg_pindex a66ecf1..6980d3e 100644
*** a/src/include/catalog/pg_proc.h
--- b/src/include/catalog/pg_proc.h
***************
*** 42,47 **** CATALOG(pg_proc,1255) BKI_BOOTSTRAP BKI_ROWTYPE_OID(81) BKI_SCHEMA_MACRO
--- 42,48 ----
  	float4		procost;		/* estimated execution cost */
  	float4		prorows;		/* estimated # of rows out (if proretset) */
  	Oid			provariadic;	/* element type of variadic array, or 0 */
+ 	regproc		protransform;	/* transforms calls to it during planning */
  	bool		proisagg;		/* is it an aggregate? */
  	bool		proiswindow;	/* is it a window function? */
  	bool		prosecdef;		/* security definer */
***************
*** 76,82 **** typedef FormData_pg_proc *Form_pg_proc;
   *		compiler constants for pg_proc
   * ----------------
   */
! #define Natts_pg_proc					25
  #define Anum_pg_proc_proname			1
  #define Anum_pg_proc_pronamespace		2
  #define Anum_pg_proc_proowner			3
--- 77,83 ----
   *		compiler constants for pg_proc
   * ----------------
   */
! #define Natts_pg_proc					26
  #define Anum_pg_proc_proname			1
  #define Anum_pg_proc_pronamespace		2
  #define Anum_pg_proc_proowner			3
***************
*** 84,107 **** typedef FormData_pg_proc *Form_pg_proc;
  #define Anum_pg_proc_procost			5
  #define Anum_pg_proc_prorows			6
  #define Anum_pg_proc_provariadic		7
! #define Anum_pg_proc_proisagg			8
! #define Anum_pg_proc_proiswindow		9
! #define Anum_pg_proc_prosecdef			10
! #define Anum_pg_proc_proisstrict		11
! #define Anum_pg_proc_proretset			12
! #define Anum_pg_proc_provolatile		13
! #define Anum_pg_proc_pronargs			14
! #define Anum_pg_proc_pronargdefaults	15
! #define Anum_pg_proc_prorettype			16
! #define Anum_pg_proc_proargtypes		17
! #define Anum_pg_proc_proallargtypes		18
! #define Anum_pg_proc_proargmodes		19
! #define Anum_pg_proc_proargnames		20
! #define Anum_pg_proc_proargdefaults		21
! #define Anum_pg_proc_prosrc				22
! #define Anum_pg_proc_probin				23
! #define Anum_pg_proc_proconfig			24
! #define Anum_pg_proc_proacl				25
  
  /* ----------------
   *		initial contents of pg_proc
--- 85,109 ----
  #define Anum_pg_proc_procost			5
  #define Anum_pg_proc_prorows			6
  #define Anum_pg_proc_provariadic		7
! #define Anum_pg_proc_protransform		8
! #define Anum_pg_proc_proisagg			9
! #define Anum_pg_proc_proiswindow		10
! #define Anum_pg_proc_prosecdef			11
! #define Anum_pg_proc_proisstrict		12
! #define Anum_pg_proc_proretset			13
! #define Anum_pg_proc_provolatile		14
! #define Anum_pg_proc_pronargs			15
! #define Anum_pg_proc_pronargdefaults	16
! #define Anum_pg_proc_prorettype			17
! #define Anum_pg_proc_proargtypes		18
! #define Anum_pg_proc_proallargtypes		19
! #define Anum_pg_proc_proargmodes		20
! #define Anum_pg_proc_proargnames		21
! #define Anum_pg_proc_proargdefaults		22
! #define Anum_pg_proc_prosrc				23
! #define Anum_pg_proc_probin				24
! #define Anum_pg_proc_proconfig			25
! #define Anum_pg_proc_proacl				26
  
  /* ----------------
   *		initial contents of pg_proc
***************
*** 743,749 **** DATA(insert OID = 659 (  namene			   PGNSP PGUID 12 1 0 0 0 f f f t f i 2 0 16 "
  
  DATA(insert OID = 668 (  bpchar			   PGNSP PGUID 12 1 0 0 0 f f f t f i 3 0 1042 "1042 23 16" _null_ _null_ _null_ _null_ bpchar _null_ _null_ _null_ ));
  DESCR("adjust char() to typmod length");
! DATA(insert OID = 669 (  varchar		   PGNSP PGUID 12 1 0 0 0 f f f t f i 3 0 1043 "1043 23 16" _null_ _null_ _null_ _null_ varchar _null_ _null_ _null_ ));
  DESCR("adjust varchar() to typmod length");
  
  DATA(insert OID = 676 (  mktinterval	   PGNSP PGUID 12 1 0 0 0 f f f t f i 2 0 704 "702 702" _null_ _null_ _null_ _null_ mktinterval _null_ _null_ _null_ ));
--- 745,753 ----
  
  DATA(insert OID = 668 (  bpchar			   PGNSP PGUID 12 1 0 0 0 f f f t f i 3 0 1042 "1042 23 16" _null_ _null_ _null_ _null_ bpchar _null_ _null_ _null_ ));
  DESCR("adjust char() to typmod length");
! DATA(insert OID = 3097 ( varchar_transform PGNSP PGUID 12 1 0 0 0 f f f t f i 1 0 2281 "2281" _null_ _null_ _null_ _null_ varchar_transform _null_ _null_ _null_ ));
! DESCR("transform a varchar length coercion");
! DATA(insert OID = 669 (  varchar		   PGNSP PGUID 12 1 0 0 3097 f f f t f i 3 0 1043 "1043 23 16" _null_ _null_ _null_ _null_ varchar _null_ _null_ _null_ ));
  DESCR("adjust varchar() to typmod length");
  
  DATA(insert OID = 676 (  mktinterval	   PGNSP PGUID 12 1 0 0 0 f f f t f i 2 0 704 "702 702" _null_ _null_ _null_ _null_ mktinterval _null_ _null_ _null_ ));
diff --git a/src/include/parser/parsindex 09ef450..f5ce2ff 100644
*** a/src/include/parser/parse_clause.h
--- b/src/include/parser/parse_clause.h
***************
*** 44,47 **** extern List *transformDistinctOnClause(ParseState *pstate, List *distinctlist,
--- 44,49 ----
  extern Index assignSortGroupRef(TargetEntry *tle, List *tlist);
  extern bool targetIsInSortList(TargetEntry *tle, Oid sortop, List *sortList);
  
+ extern Node *relabel_to_typmod(Node *expr, int32 typmod);
+ 
  #endif   /* PARSE_CLAUSE_H */
diff --git a/src/include/utils/builtins.index 14215db..8a1c82e 100644
*** a/src/include/utils/builtins.h
--- b/src/include/utils/builtins.h
***************
*** 684,689 **** extern Datum varcharrecv(PG_FUNCTION_ARGS);
--- 684,690 ----
  extern Datum varcharsend(PG_FUNCTION_ARGS);
  extern Datum varchartypmodin(PG_FUNCTION_ARGS);
  extern Datum varchartypmodout(PG_FUNCTION_ARGS);
+ extern Datum varchar_transform(PG_FUNCTION_ARGS);
  extern Datum varchar(PG_FUNCTION_ARGS);
  
  /* varlena.c */
diff --git a/src/test/regress/expecindex 9171f04..8e75b14 100644
#31Alexey Klyukin
alexk@commandprompt.com
In reply to: Noah Misch (#30)
Re: Identifying no-op length coercions

Hi,

On Jun 19, 2011, at 2:10 PM, Noah Misch wrote:

On Sat, Jun 18, 2011 at 11:32:20PM -0400, Robert Haas wrote:

On Sat, Jun 18, 2011 at 11:12 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Sat, Jun 18, 2011 at 11:06 PM, Noah Misch <noah@leadboat.com> wrote:

On Sat, Jun 18, 2011 at 10:57:13PM -0400, Robert Haas wrote:

On Sat, Jun 11, 2011 at 5:13 PM, Noah Misch <noah@leadboat.com> wrote:

Sounds good. ?Updated patch attached, incorporating your ideas. ?Before applying
it, run this command to update the uninvolved pg_proc.h DATA entries:
?perl -pi -e 's/PGUID(\s+\d+){4}/$& 0/' src/include/catalog/pg_proc.h

This doesn't quite apply any more. ?I think the pgindent run broke it slightly.

Hmm, I just get two one-line offsets when applying it to current master. ?Note
that you need to run the perl invocation before applying the patch. ?Could you
provide full output of your `patch' invocation, along with any reject files?

Ah, crap. ?You're right. ?I didn't follow your directions for how to
apply the patch. ?Sorry.

I think you need to update the comment in simplify_function() to say
that we have three strategies, rather than two. I think it would also
be appropriate to add a longish comment just before the test that
calls protransform, explaining what the charter of that function is
and why the mechanism exists.

Good idea. See attached.

Documentation issues aside, I see very little not to like about this.

Great! Thanks for reviewing.
<noop-length-coercion-v3.patch>

Here is my review of this patch.

The patch applies cleanly to the HEAD, produces no additional warnings. It
doesn't include additional regression tests. One can include a test, using the
commands like the ones included below.

Changes to the documentation are limited to the a description of a new
pg_class attribute. It would probably make sense to document all the
exceptions to the table's rewrite on ALTER TABLE documentation page, although
it could wait for more such exceptions.

The feature works as intended and I haven't noticed any crashes or assertion failures, i.e.

postgres=# create table test(id integer, name varchar);
CREATE TABLE
postgres=# insert into test values(1, 'test');
INSERT 0 1
postgres=# select relfilenode from pg_class where oid='test'::regclass;
relfilenode
-------------
66302
(1 row)
postgres=# alter table test alter column name type varchar(10);
ALTER TABLE
postgres=# select relfilenode from pg_class where oid='test'::regclass;
relfilenode
-------------
66308
(1 row)
postgres=# alter table test alter column name type varchar(65535);
ALTER TABLE
postgres=# select relfilenode from pg_class where oid='test'::regclass;
relfilenode
-------------
66308
(1 row)

The code looks good and takes into account all the previous suggestions.

The only nitpick code-wise is these lines in varchar_transform:

+ 		int32		old_max = exprTypmod(source) - VARHDRSZ;
+ 		int32		new_max = new_typmod - VARHDRSZ;

I have a hard time understanding why VARHDRSZ is subtracted here, so I'd assume that's a bug.

Other than that, I haven't noticed any issues w/ this patch.

Sincerely,
Alexey.

--
Command Prompt, Inc. http://www.CommandPrompt.com
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#32Noah Misch
noah@leadboat.com
In reply to: Alexey Klyukin (#31)
Re: Identifying no-op length coercions

On Tue, Jun 21, 2011 at 06:31:44PM +0300, Alexey Klyukin wrote:

Here is my review of this patch.

Thanks!

The patch applies cleanly to the HEAD, produces no additional warnings. It
doesn't include additional regression tests. One can include a test, using the
commands like the ones included below.

Changes to the documentation are limited to the a description of a new
pg_class attribute. It would probably make sense to document all the
exceptions to the table's rewrite on ALTER TABLE documentation page, although
it could wait for more such exceptions.

I like the current level of detail in the ALTER TABLE page. Having EXPLAIN
ALTER TABLE would help here, but that's a bigger project than this one.

postgres=# alter table test alter column name type varchar(10);
ALTER TABLE
postgres=# select relfilenode from pg_class where oid='test'::regclass;
relfilenode
-------------
66308
(1 row)
postgres=# alter table test alter column name type varchar(65535);
ALTER TABLE
postgres=# select relfilenode from pg_class where oid='test'::regclass;
relfilenode
-------------
66308
(1 row)

A pg_regress test needs stable output, so we would do it roughly like this:

CREATE TEMP TABLE relstorage AS SELECT 0::regclass AS oldnode;
...
UPDATE relstorage SET oldnode =
(SELECT relfilenode FROM pg_class WHERE oid = 'test'::regclass);
ALTER TABLE test ALTER name TYPE varchar(65535);
SELECT oldnode <> relfilenode AS rewritten
FROM pg_class, relstorage WHERE oid = 'test'::regclass;

I originally rejected that as too ugly to read. Perhaps not.

The only nitpick code-wise is these lines in varchar_transform:

+ 		int32		old_max = exprTypmod(source) - VARHDRSZ;
+ 		int32		new_max = new_typmod - VARHDRSZ;

I have a hard time understanding why VARHDRSZ is subtracted here, so I'd assume that's a bug.

We track the varchar typmod internally as (max length) + VARHDRSZ.

#33Alexey Klyukin
alexk@commandprompt.com
In reply to: Noah Misch (#32)
Re: Identifying no-op length coercions

On Jun 21, 2011, at 9:58 PM, Noah Misch wrote:

A pg_regress test needs stable output, so we would do it roughly like this:

CREATE TEMP TABLE relstorage AS SELECT 0::regclass AS oldnode;
...
UPDATE relstorage SET oldnode =
(SELECT relfilenode FROM pg_class WHERE oid = 'test'::regclass);
ALTER TABLE test ALTER name TYPE varchar(65535);
SELECT oldnode <> relfilenode AS rewritten
FROM pg_class, relstorage WHERE oid = 'test'::regclass;

I originally rejected that as too ugly to read. Perhaps not.

Yes, your example is more appropriate. I think you can make it more
straightforward by getting rid of the temp table:

CREATE TABLE test(oldnode oid, name varchar(5));

INSERT INTO test(oldnode) SELECT relfilenode FROM pg_class WHERE
oid='test'::regclass;

ALTER TABLE test ALTER name TYPE varchar(10);

SELECT oldnode <> relfilenode AS rewritten FROM pg_class, test WHERE
oid='test'::regclass;

The only nitpick code-wise is these lines in varchar_transform:

+ 		int32		old_max = exprTypmod(source) - VARHDRSZ;
+ 		int32		new_max = new_typmod - VARHDRSZ;

I have a hard time understanding why VARHDRSZ is subtracted here, so I'd assume that's a bug.

We track the varchar typmod internally as (max length) + VARHDRSZ.

Oh, right, haven't thought that this is a varchar specific thing.

Thank you,
Alexey.

--
Command Prompt, Inc. http://www.CommandPrompt.com
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#34Robert Haas
robertmhaas@gmail.com
In reply to: Alexey Klyukin (#33)
Re: Identifying no-op length coercions

On Tue, Jun 21, 2011 at 5:50 PM, Alexey Klyukin <alexk@commandprompt.com> wrote:

On Jun 21, 2011, at 9:58 PM, Noah Misch wrote:

A pg_regress test needs stable output, so we would do it roughly like this:

      CREATE TEMP TABLE relstorage AS SELECT 0::regclass AS oldnode;
      ...
      UPDATE relstorage SET oldnode =
              (SELECT relfilenode FROM pg_class WHERE oid = 'test'::regclass);
      ALTER TABLE test ALTER name TYPE varchar(65535);
      SELECT oldnode <> relfilenode AS rewritten
      FROM pg_class, relstorage WHERE oid = 'test'::regclass;

I originally rejected that as too ugly to read.  Perhaps not.

Yes, your example is more appropriate. I think you can make it more
straightforward by getting rid of the temp table:

CREATE TABLE test(oldnode oid, name varchar(5));

INSERT INTO test(oldnode) SELECT relfilenode FROM pg_class WHERE
oid='test'::regclass;

ALTER TABLE test ALTER name TYPE varchar(10);

SELECT oldnode <> relfilenode AS rewritten FROM pg_class, test WHERE
oid='test'::regclass;

Without wishing to foreclose the possibility of adding a suitable
regression test, I've committed the main patch.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company