ALTER TYPE 3: add facility to identify further no-work cases

Started by Noah Mischover 15 years ago42 messageshackers
Jump to latest
#1Noah Misch
noah@leadboat.com

Here I add the notion of an "exemptor function", a property of a cast that
determines when calls to the cast would be superfluous. Such calls can be
removed, reduced to RelabelType expressions, or annotated (via a new field in
FuncExpr) with the applicable exemptions. I modify various parse_coerce.c
functions to retrieve, call, and act on these exemptor functions; this includes
GetCoerceExemptions() from the last patch. I did opt to make
find_typmod_coercion_function return COERCION_PATH_RELABELTYPE when no work is
needed, rather than COERCION_PATH_NONE; this makes it consistent with
find_coercion_pathway's use of that enumeration.

To demonstrate the functionality, I add exemptor functions for varchar and xml.
Originally I was only going to start with varchar, but xml tests the other major
code path, and the exemptor function for xml is dead simple.

This helps on conversions like varchar(4)->varchar(8) and text->xml.

Attachments:

at3-exemptor.patchtext/plain; charset=us-asciiDownload+956-707
#2Robert Haas
robertmhaas@gmail.com
In reply to: Noah Misch (#1)
Re: ALTER TYPE 3: add facility to identify further no-work cases

On Sun, Jan 9, 2011 at 5:03 PM, Noah Misch <noah@leadboat.com> wrote:

Here I add the notion of an "exemptor function", a property of a cast that
determines when calls to the cast would be superfluous.  Such calls can be
removed, reduced to RelabelType expressions, or annotated (via a new field in
FuncExpr) with the applicable exemptions.  I modify various parse_coerce.c
functions to retrieve, call, and act on these exemptor functions; this includes
GetCoerceExemptions() from the last patch.  I did opt to make
find_typmod_coercion_function return COERCION_PATH_RELABELTYPE when no work is
needed, rather than COERCION_PATH_NONE; this makes it consistent with
find_coercion_pathway's use of that enumeration.

To demonstrate the functionality, I add exemptor functions for varchar and xml.
Originally I was only going to start with varchar, but xml tests the other major
code path, and the exemptor function for xml is dead simple.

This helps on conversions like varchar(4)->varchar(8) and text->xml.

I've read through this patch somewhat. As I believe Tom also
commented previously, exemptor is a pretty bad choice of name. More
generally, I think this patch is a case of coding something
complicated without first achieving consensus on what the behavior
should be. I think we need to address that problem first, and then
decide what code needs to be written, and then write the code.

It seems to me that patch two in this series has the right idea: skip
rewriting the table in cases where the types are binary coercible
(WITHOUT FUNCTION) or one is an unconstrained domain over the other.
IIUC, the problem this patch is trying to address is that our current
CAST infrastructure is insufficient to identify all cases where this
is true - in particular, it makes the decision solely based on the
type OIDs, without consideration of the typmods. In general, varchar
-> varchar is not binary coercible, but varchar(4) -> varchar(8) is
binary coercible. I think what this patch is trying to do is tag the
call to the cast function with the information that we might not need
to call it, but ISTM it would be better to just notice that the
function call is unnecessary and insert a RelabelType node instead.

*reads archives*

In fact, Tom already made pretty much exactly this suggestion, on a
thread entitled "Avoiding rewrite in ALTER TABLE ALTER TYPE". The
only possible objection I can see to that line of attack is that it's
not clear what the API should be, or whether there might be too much
runtime overhead for the amount of benefit that can be obtained.

This is, incidentally, another problem with this patch - if you want
to make wide-ranging changes to the system like this, you need to
provide a convincing argument that it's not going to compromise
correctness or performance. Just submitting the patch without making
an argument for why it's correct is no good, unless it's so obviously
correct as to be unquestionable, which is certainly not the case here.
You've got to write up some kind of submission notes so that the
reviewer isn't trying to guess why you think this is the right
approach, or spending a lot of time thinking through approaches you've
discarded for good reason. If there is a danger of performance
regression, you need to refute it, preferably with actual benchmarks.
A particular aspect I'm concerned about with this patch in particular:
it strikes me that the overhead of calling the exemptor functions in
the ALTER TABLE case is negligible, but that this might not be true in
other contexts where some of this logic may get invoked - it appears
to implicate the casting machinery much more generally.

I think it's a mistake to merge the handling of the rewrite-vs-noop
case with the check-vs-noop case. They're fundamentally dissimilar.
As noted above, being able to notice the noop case seems like it could
save run-time work during ordinary query execution. But detecting the
"check" case is useless work except in the specific case of a table
rewrite. If we're going to do that at all, it seems to me that it
needs to be done via a code-path that's only going to get invoked in
the case of ALTER TABLE, not something that's going to happen every
time we parse an expression tree. But it seems to me that it might be
better to take the suggestion I made before: forget about the
check-only case for now, and focus on the do-nothing case.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#3Noah Misch
noah@leadboat.com
In reply to: Robert Haas (#2)
Re: ALTER TYPE 3: add facility to identify further no-work cases

On Mon, Jan 24, 2011 at 07:18:47PM -0500, Robert Haas wrote:

On Sun, Jan 9, 2011 at 5:03 PM, Noah Misch <noah@leadboat.com> wrote:

Here I add the notion of an "exemptor function", a property of a cast that
determines when calls to the cast would be superfluous. ?Such calls can be
removed, reduced to RelabelType expressions, or annotated (via a new field in
FuncExpr) with the applicable exemptions. ?I modify various parse_coerce.c
functions to retrieve, call, and act on these exemptor functions; this includes
GetCoerceExemptions() from the last patch. ?I did opt to make
find_typmod_coercion_function return COERCION_PATH_RELABELTYPE when no work is
needed, rather than COERCION_PATH_NONE; this makes it consistent with
find_coercion_pathway's use of that enumeration.

To demonstrate the functionality, I add exemptor functions for varchar and xml.
Originally I was only going to start with varchar, but xml tests the other major
code path, and the exemptor function for xml is dead simple.

This helps on conversions like varchar(4)->varchar(8) and text->xml.

I've read through this patch somewhat. As I believe Tom also
commented previously, exemptor is a pretty bad choice of name.

I spent awhile with a thesaurus before coining that. Since Tom's comment, I've
been hoping the next person to complain would at least suggest a better name.
Shooting in the dark on a nomenclature issue isn't fun, and I'm not picky.

More
generally, I think this patch is a case of coding something
complicated without first achieving consensus on what the behavior
should be. I think we need to address that problem first, and then
decide what code needs to be written, and then write the code.

Well, that's why I started the design thread "Avoiding rewrite in ALTER TABLE
ALTER TYPE" before writing any code. That thread petered out rather than reach
any consensus. What would you have done then?

It seems to me that patch two in this series has the right idea: skip
rewriting the table in cases where the types are binary coercible
(WITHOUT FUNCTION) or one is an unconstrained domain over the other.
IIUC, the problem this patch is trying to address is that our current
CAST infrastructure is insufficient to identify all cases where this
is true - in particular, it makes the decision solely based on the
type OIDs, without consideration of the typmods. In general, varchar
-> varchar is not binary coercible, but varchar(4) -> varchar(8) is
binary coercible. I think what this patch is trying to do is tag the
call to the cast function with the information that we might not need
to call it, but ISTM it would be better to just notice that the
function call is unnecessary and insert a RelabelType node instead.

This patch does exactly that for varchar(4) -> varchar(8) and similar cases.

*reads archives*

In fact, Tom already made pretty much exactly this suggestion, on a
thread entitled "Avoiding rewrite in ALTER TABLE ALTER TYPE". The
only possible objection I can see to that line of attack is that it's
not clear what the API should be, or whether there might be too much
runtime overhead for the amount of benefit that can be obtained.

I believe the patch does implement Tom's suggestion, at least in spirit. He
mentioned expression_planner() as the place to do it. In principle, We could do
it *right there* and avoid any changes to coerce_to_target_type(), but the
overhead would increase. Specifically, we would be walking an already-built
expression tree looking for casts to remove, probably doing a syscache lookup
for every cast to grab the exemptor function OID. Doing it there would prevent
us from directly helping or harming plain old queries. Since ExecRelCheck(),
FormIndexDatum() and other notable paths call expression_planner(), we wouldn't
want to casually add an expensive algorithm there. To avoid the extra syscache
lookups, we'd piggyback off those already done in coerce_to_target_type(). That
brings us to the current implementation. Better to make the entire decision in
coerce_to_target_type() and never add the superfluous node to the expression
tree in the first place.

coerce_to_target_type() and callees seemed like the natural, low cost place to
make the changes. By doing it that way, did I miss some important detail or
implication of Tom's suggestion?

This is, incidentally, another problem with this patch - if you want
to make wide-ranging changes to the system like this, you need to
provide a convincing argument that it's not going to compromise
correctness or performance. Just submitting the patch without making
an argument for why it's correct is no good, unless it's so obviously
correct as to be unquestionable, which is certainly not the case here.
You've got to write up some kind of submission notes so that the
reviewer isn't trying to guess why you think this is the right
approach, or spending a lot of time thinking through approaches you've
discarded for good reason. If there is a danger of performance
regression, you need to refute it, preferably with actual benchmarks.
A particular aspect I'm concerned about with this patch in particular:
it strikes me that the overhead of calling the exemptor functions in
the ALTER TABLE case is negligible, but that this might not be true in
other contexts where some of this logic may get invoked - it appears
to implicate the casting machinery much more generally.

I assumed that readers of this patch email had read the series introduction
email, which referred them to the design discussion that you found. Surely that
design answered more than zero of "why this is correct". What questions in
particular did you find unanswered?

As for performance, coerce_to_target_type() is certainly in wide use, but it's
not exactly a hot path. I prepared and ran the attached bench-coerce-worst.sql
and bench-coerce-best.sql. Both are quite artificial. The first one basically
asks "Can we measure the new overhead at all?" The second one asks "Would any
ordinary query benefit from removing a superfluous cast from an expression
tree?" The worst case had an 4% _speedup_, and the best case had a 27% speedup,
suggesting answers of "no" and "yes (but not dramatically)". The "worst-case"
speedup doesn't make sense, so it's probably an artifact of some recent change
on master creating a larger performance dip in this pathological test case. I
could rebase the patch and rerun the benchmark, but I doubt an exact figure
would be much more meaningful. Unfortunately, I can't think of any more-natural
tests (help welcome) that would still illustrate any performance difference.

I think it's a mistake to merge the handling of the rewrite-vs-noop
case with the check-vs-noop case. They're fundamentally dissimilar.
As noted above, being able to notice the noop case seems like it could
save run-time work during ordinary query execution. But detecting the
"check" case is useless work except in the specific case of a table
rewrite. If we're going to do that at all, it seems to me that it
needs to be done via a code-path that's only going to get invoked in
the case of ALTER TABLE, not something that's going to happen every
time we parse an expression tree.

I disagree that they're fundamentally dissimilar. They're _fundamentally_
similar, in that they are both fences, standing at different distances, around
the necessity of a cast function. Your argument only mentions performance, and
it boils down to a suggestion that we proceed to optimize things now by
separating these questions. Perhaps we should, but that's a different question.
Based on my benchmarks, I'm not seeing a need to micro-optimize.

But it seems to me that it might be
better to take the suggestion I made before: forget about the
check-only case for now, and focus on the do-nothing case.

Let's revisit this when we're on the same page about the above points.

Thanks,
nm

Attachments:

bench-coerce-worst.sqltext/plain; charset=us-asciiDownload
bench-coerce-best.sqltext/plain; charset=us-asciiDownload
#4Robert Haas
robertmhaas@gmail.com
In reply to: Noah Misch (#3)
Re: ALTER TYPE 3: add facility to identify further no-work cases

On Mon, Jan 24, 2011 at 11:13 PM, Noah Misch <noah@leadboat.com> wrote:

This helps on conversions like varchar(4)->varchar(8) and text->xml.

I've read through this patch somewhat.  As I believe Tom also
commented previously, exemptor is a pretty bad choice of name.

I spent awhile with a thesaurus before coining that.  Since Tom's comment, I've
been hoping the next person to complain would at least suggest a better name.
Shooting in the dark on a nomenclature issue isn't fun, and I'm not picky.

I can't speak for Tom, but I guess my gripe about that name is that we
seem to view a cast as either WITH FUNCTION or WITHOUT FUNCTION. A
cast-without-function is trivial from an implementation point of view,
but it's still a cast, whereas the word "exempt" seems to imply that
you're skipping the cast altogether. A side issue is that I really
want to avoid adding a new parser keyword for this. It doesn't bother
me too much to add keywords for really important and user-visible
features, but when we're adding stuff that's only going to be used by
0.1% of our users it's really better if we can avoid it, because it
slows down the parser. Maybe we could do something like:

CREATE CAST (source_type AS target_type)
WITH FUNCTION function_name (argument_type, [, ...])
[ ANALYZE USING function_name ]
[ AS ASSIGNMENT | AS IMPLICIT ]

Presumably, we wouldn't need the ANALYZE USING clause for the WITHOUT
FUNCTION case, since the answer is a foregone conclusion in that case.
We could possibly support it also for WITH INOUT, but I'm not sure
whether the real-world utility is more than zero.

Internally, we could refer to a function of this type as a "cast analyzer".

Don't take the above as Gospel truth, it's just an idea from one
person on one day.

More
generally, I think this patch is a case of coding something
complicated without first achieving consensus on what the behavior
should be.  I think we need to address that problem first, and then
decide what code needs to be written, and then write the code.

Well, that's why I started the design thread "Avoiding rewrite in ALTER TABLE
ALTER TYPE" before writing any code.  That thread petered out rather than reach
any consensus.  What would you have done then?

Let's defer this question until we're more on the same page about the
rest of this. It's obvious I'm not completely understanding this
patch...

 I think what this patch is trying to do is tag the
call to the cast function with the information that we might not need
to call it, but ISTM it would be better to just notice that the
function call is unnecessary and insert a RelabelType node instead.

This patch does exactly that for varchar(4) -> varchar(8) and similar cases.

Oh, uh, err... OK, I obviously haven't understood it well enough.
I'll look at it some more, but if there's anything you can write up to
explain what you changed and why, that would be helpful. I think I'm
also having trouble with the fact that these patches don't apply
cleanly against the master branch, because they're stacked up on each
other. Maybe this will be easier once we get more of the ALTER TYPE 2
stuff in.

*reads archives*

In fact, Tom already made pretty much exactly this suggestion, on a
thread entitled "Avoiding rewrite in ALTER TABLE ALTER TYPE".  The
only possible objection I can see to that line of attack is that it's
not clear what the API should be, or whether there might be too much
runtime overhead for the amount of benefit that can be obtained.

I believe the patch does implement Tom's suggestion, at least in spirit.  He
mentioned expression_planner() as the place to do it.  In principle, We could do
it *right there* and avoid any changes to coerce_to_target_type(), but the
overhead would increase.  Specifically, we would be walking an already-built
expression tree looking for casts to remove, probably doing a syscache lookup
for every cast to grab the exemptor function OID.  Doing it there would prevent
us from directly helping or harming plain old queries.  Since ExecRelCheck(),
FormIndexDatum() and other notable paths call expression_planner(), we wouldn't
want to casually add an expensive algorithm there.  To avoid the extra syscache
lookups, we'd piggyback off those already done in coerce_to_target_type().  That
brings us to the current implementation.  Better to make the entire decision in
coerce_to_target_type() and never add the superfluous node to the expression
tree in the first place.

coerce_to_target_type() and callees seemed like the natural, low cost place to
make the changes.  By doing it that way, did I miss some important detail or
implication of Tom's suggestion?

I'm not sure. Tom?

As for performance, coerce_to_target_type() is certainly in wide use, but it's
not exactly a hot path.  I prepared and ran the attached bench-coerce-worst.sql
and bench-coerce-best.sql.  Both are quite artificial.  The first one basically
asks "Can we measure the new overhead at all?"  The second one asks "Would any
ordinary query benefit from removing a superfluous cast from an expression
tree?"  The worst case had an 4% _speedup_, and the best case had a 27% speedup,
suggesting answers of "no" and "yes (but not dramatically)".  The "worst-case"
speedup doesn't make sense, so it's probably an artifact of some recent change
on master creating a larger performance dip in this pathological test case.  I
could rebase the patch and rerun the benchmark, but I doubt an exact figure
would be much more meaningful.  Unfortunately, I can't think of any more-natural
tests (help welcome) that would still illustrate any performance difference.

That certainly seems promising, but I don't understand how the new
code can be faster on your constructed worst case. Thoughts?

I think it's a mistake to merge the handling of the rewrite-vs-noop
case with the check-vs-noop case.  They're fundamentally dissimilar.
As noted above, being able to notice the noop case seems like it could
save run-time work during ordinary query execution.  But detecting the
"check" case is useless work except in the specific case of a table
rewrite.  If we're going to do that at all, it seems to me that it
needs to be done via a code-path that's only going to get invoked in
the case of ALTER TABLE, not something that's going to happen every
time we parse an expression tree.

I disagree that they're fundamentally dissimilar.  They're _fundamentally_
similar, in that they are both fences, standing at different distances, around
the necessity of a cast function.  Your argument only mentions performance, and
it boils down to a suggestion that we proceed to optimize things now by
separating these questions.  Perhaps we should, but that's a different question.
Based on my benchmarks, I'm not seeing a need to micro-optimize.

I'm more concerned about modularity than performance. Sticking a
field into every FuncExpr so that if it happens to get passed to an
ALTER TABLE statement we can pull the flag out seems really ugly to
me.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#4)
Re: ALTER TYPE 3: add facility to identify further no-work cases

Robert Haas <robertmhaas@gmail.com> writes:

... A side issue is that I really
want to avoid adding a new parser keyword for this. It doesn't bother
me too much to add keywords for really important and user-visible
features, but when we're adding stuff that's only going to be used by
0.1% of our users it's really better if we can avoid it, because it
slows down the parser. Maybe we could do something like:

CREATE CAST (source_type AS target_type)
WITH FUNCTION function_name (argument_type, [, ...])
[ ANALYZE USING function_name ]
[ AS ASSIGNMENT | AS IMPLICIT ]

I'm not terribly thrilled with the suggestion of "ANALYZE" here, because
given the way we use that word elsewhere, people are likely to think it
means something to do with statistics collection; or at least that it
implies some deep and complicated analysis of the cast.

I suggest using a phrase based on the idea that this function tells you
whether you can skip the cast, or (if the sense is inverted) whether the
cast has to be executed. "SKIP IF function_name" would be nice but SKIP
isn't an extant keyword either. The best variant that I can come up
with after a minute's contemplation of kwlist.h is "NO WORK IF
function_name". If you didn't mind inverting the sense of the result
then we could use "EXECUTE IF function_name".

Internally, we could refer to a function of this type as a "cast analyzer".

Another possibility is to call it a "cast checker" and use CHECK USING.
But I'm not sure that's much better than ANALYZE in terms of conveying
the purpose to a newbie.

I believe the patch does implement Tom's suggestion, at least in spirit. �He
mentioned expression_planner() as the place to do it. �In principle, We could do
it *right there* and avoid any changes to coerce_to_target_type(), but the
overhead would increase. �Specifically, we would be walking an already-built
expression tree looking for casts to remove, probably doing a syscache lookup
for every cast to grab the exemptor function OID.

No no no no. The right place to do it is during expression
simplification in eval_const_expressions(). We are already
disassembling and reassembling the tree, and looking up functions,
in that pass. Detecting that a call is a no-op fits into that
very naturally. Think of it as an alternative to inline_function().

One point worth making here is that eval_const_expressions() does not
currently care very much whether a function call came from cast syntax
or explicitly. It might be worth thinking about whether we want to have
a generic this-function-call-is-a-no-op simplifier hook available for
*all* functions not just those that are casts. I'm not sure we want to
pay the overhead of another pg_proc column, but it's something to think
about.

regards, tom lane

#6Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#5)
Re: ALTER TYPE 3: add facility to identify further no-work cases

On Wed, Jan 26, 2011 at 12:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

... A side issue is that I really
want to avoid adding a new parser keyword for this.  It doesn't bother
me too much to add keywords for really important and user-visible
features, but when we're adding stuff that's only going to be used by
0.1% of our users it's really better if we can avoid it, because it
slows down the parser.  Maybe we could do something like:

CREATE CAST (source_type AS target_type)
    WITH FUNCTION function_name (argument_type, [, ...])
    [ ANALYZE USING function_name ]
    [ AS ASSIGNMENT | AS IMPLICIT ]

I'm not terribly thrilled with the suggestion of "ANALYZE" here, because
given the way we use that word elsewhere, people are likely to think it
means something to do with statistics collection; or at least that it
implies some deep and complicated analysis of the cast.

I suggest using a phrase based on the idea that this function tells you
whether you can skip the cast, or (if the sense is inverted) whether the
cast has to be executed.  "SKIP IF function_name" would be nice but SKIP
isn't an extant keyword either.  The best variant that I can come up
with after a minute's contemplation of kwlist.h is "NO WORK IF
function_name".  If you didn't mind inverting the sense of the result
then we could use "EXECUTE IF function_name".

What about borrowing from the trigger syntax?

WITH FUNCTION function_name (argument_type, [...]) WHEN
function_that_returns_true_when_the_call_is_needed

One point worth making here is that eval_const_expressions() does not
currently care very much whether a function call came from cast syntax
or explicitly.  It might be worth thinking about whether we want to have
a generic this-function-call-is-a-no-op simplifier hook available for
*all* functions not just those that are casts.  I'm not sure we want to
pay the overhead of another pg_proc column, but it's something to think
about.

It's not obvious to me that it has a use case outside of casts, but
it's certainly possible I'm missing something.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#6)
Re: ALTER TYPE 3: add facility to identify further no-work cases

Robert Haas <robertmhaas@gmail.com> writes:

On Wed, Jan 26, 2011 at 12:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

If you didn't mind inverting the sense of the result
then we could use "EXECUTE IF function_name".

What about borrowing from the trigger syntax?

WITH FUNCTION function_name (argument_type, [...]) WHEN
function_that_returns_true_when_the_call_is_needed

That's a good thought. Or we could use WHEN NOT check_function if you
want to keep to the negative-result semantics.

One point worth making here is that eval_const_expressions() does not
currently care very much whether a function call came from cast syntax
or explicitly. �It might be worth thinking about whether we want to have
a generic this-function-call-is-a-no-op simplifier hook available for
*all* functions not just those that are casts. �I'm not sure we want to
pay the overhead of another pg_proc column, but it's something to think
about.

It's not obvious to me that it has a use case outside of casts, but
it's certainly possible I'm missing something.

A possible example is simplifying X + 0 to X, or X * 0 to 0. I've never
wanted to inject such datatype-specific knowledge directly into the
planner, but if we viewed it as function-specific knowledge supplied by
a per-function helper routine, maybe it would fly. Knowing that a cast
function does nothing useful for particular cases could be seen as a
special case of this type of simplification.

regards, tom lane

#8Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#7)
Re: ALTER TYPE 3: add facility to identify further no-work cases

On Wed, Jan 26, 2011 at 3:08 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Wed, Jan 26, 2011 at 12:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

If you didn't mind inverting the sense of the result
then we could use "EXECUTE IF function_name".

What about borrowing from the trigger syntax?

WITH FUNCTION function_name (argument_type, [...]) WHEN
function_that_returns_true_when_the_call_is_needed

That's a good thought.  Or we could use WHEN NOT check_function if you
want to keep to the negative-result semantics.

Seems a bit contorted; I don't see any particular reason to prefer
positive vs negative semantics in this case.

One point worth making here is that eval_const_expressions() does not
currently care very much whether a function call came from cast syntax
or explicitly.  It might be worth thinking about whether we want to have
a generic this-function-call-is-a-no-op simplifier hook available for
*all* functions not just those that are casts.  I'm not sure we want to
pay the overhead of another pg_proc column, but it's something to think
about.

It's not obvious to me that it has a use case outside of casts, but
it's certainly possible I'm missing something.

A possible example is simplifying X + 0 to X, or X * 0 to 0.  I've never
wanted to inject such datatype-specific knowledge directly into the
planner, but if we viewed it as function-specific knowledge supplied by
a per-function helper routine, maybe it would fly.  Knowing that a cast
function does nothing useful for particular cases could be seen as a
special case of this type of simplification.

Oh, I see. The times I've seen an issue with those kinds of
expressions have always been related to selectivity estimation. For
example, you'd like to get a selectivity estimate of 1-nullfrac for
(x+0)=x and 0 for (x+1)=x, and maybe (1-nullfrac)/2 for (x%2)=0. This
would only handle the first of those cases, so I'm inclined to think
it's too weak to have much general utility. The casting cases can, I
think, have a much larger impact - they occur more often in practice,
and if you can (e.g.) avoid an entire table rewrite, that's a pretty
big deal.

Another semi-related problem case I've run across is that CASE WHEN
... THEN 1 WHEN ... THEN 2 END ought to be known to only ever emit 1
and 2, and the selectivity of comparing that to some other value ought
to be computed on that basis. But now I'm really wandering off into
the weeds.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#8)
Re: ALTER TYPE 3: add facility to identify further no-work cases

Robert Haas <robertmhaas@gmail.com> writes:

On Wed, Jan 26, 2011 at 3:08 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

It's not obvious to me that it has a use case outside of casts, but
it's certainly possible I'm missing something.

A possible example is simplifying X + 0 to X, or X * 0 to 0.

Oh, I see. The times I've seen an issue with those kinds of
expressions have always been related to selectivity estimation.

Yeah, helping the planner recognize equivalent cases is at least as
large a reason for wanting this as any direct savings of execution time.

I don't mind confining the feature to casts to start with, but it might
be a good idea to specify the check-function API in a way that would let
it be plugged into a more generally available call-simplification hook
later.

regards, tom lane

#10Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#9)
Re: ALTER TYPE 3: add facility to identify further no-work cases

On Wed, Jan 26, 2011 at 4:02 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Wed, Jan 26, 2011 at 3:08 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

It's not obvious to me that it has a use case outside of casts, but
it's certainly possible I'm missing something.

A possible example is simplifying X + 0 to X, or X * 0 to 0.

Oh, I see.  The times I've seen an issue with those kinds of
expressions have always been related to selectivity estimation.

Yeah, helping the planner recognize equivalent cases is at least as
large a reason for wanting this as any direct savings of execution time.

Agreed.

I don't mind confining the feature to casts to start with, but it might
be a good idea to specify the check-function API in a way that would let
it be plugged into a more generally available call-simplification hook
later.

Got any suggestions? My thought was that it should just get (type,
typemod, type, typemod) and return a boolean. We could do something
more complicated, like Expr -> Expr, but I'm pretty unconvinced
there's any value there. I'd like to see some kind of appropriate
hook for interjecting selectivity estimates for cases that we
currently can't recognize, but my gut feeling is that that's
insufficiently related at the problem at hand to worry about it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#10)
Re: ALTER TYPE 3: add facility to identify further no-work cases

Robert Haas <robertmhaas@gmail.com> writes:

On Wed, Jan 26, 2011 at 4:02 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I don't mind confining the feature to casts to start with, but it might
be a good idea to specify the check-function API in a way that would let
it be plugged into a more generally available call-simplification hook
later.

Got any suggestions? My thought was that it should just get (type,
typemod, type, typemod) and return a boolean. We could do something
more complicated, like Expr -> Expr, but I'm pretty unconvinced
there's any value there.

Well, (type, typemod, type, typemod) would be fine as long as the only
case you're interested in optimizing is typemod-lengthening coercions.
I think we're going to want the more general Expr-to-Expr capability
eventually.

I guess we could do the restricted case for now, with the idea that
there could be a standard Expr-to-Expr interface function created later
and installed into the generic hook facility for functions that are cast
functions. That could look into pg_cast and make the more restricted
call when appropriate. (The meat of this function would be the same
code you'd have to add to eval_const_expressions anyway for the
hard-wired version...)

I'd like to see some kind of appropriate
hook for interjecting selectivity estimates for cases that we
currently can't recognize, but my gut feeling is that that's
insufficiently related at the problem at hand to worry about it.

Agreed, that seems a bit off-topic. There are ancient comments in the
source code complaining about the lack of selectivity hooks for function
(as opposed to operator) calls, but that's not what Noah signed up to
fix. In any case I'm not sure that expression simplification is
closely connected to selectivity estimation --- to my mind it's an
independent process that is good to run first. As an example, the ALTER
TABLE case has a use for the former but not the latter.

regards, tom lane

#12Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#11)
Re: ALTER TYPE 3: add facility to identify further no-work cases

On Wed, Jan 26, 2011 at 5:01 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Wed, Jan 26, 2011 at 4:02 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I don't mind confining the feature to casts to start with, but it might
be a good idea to specify the check-function API in a way that would let
it be plugged into a more generally available call-simplification hook
later.

Got any suggestions?  My thought was that it should just get (type,
typemod, type, typemod) and return a boolean.  We could do something
more complicated, like Expr -> Expr, but I'm pretty unconvinced
there's any value there.

Well, (type, typemod, type, typemod) would be fine as long as the only
case you're interested in optimizing is typemod-lengthening coercions.
I think we're going to want the more general Expr-to-Expr capability
eventually.

I guess we could do the restricted case for now, with the idea that
there could be a standard Expr-to-Expr interface function created later
and installed into the generic hook facility for functions that are cast
functions.  That could look into pg_cast and make the more restricted
call when appropriate.  (The meat of this function would be the same
code you'd have to add to eval_const_expressions anyway for the
hard-wired version...)

Well, if you're positive we're eventually going to want this in
pg_proc, we may as well add it now. But I'm not too convinced it's
the right general API. The number of people writing exactly x + 0 or
x * 0 in a query has got to be vanishingly small; I'm not eager to add
additional parse analysis time to every SQL statement that has a
function in it just to detect those cases. Even slightly more
complicated problems seem intractable - e.g. (x + 1) = x can be
simplified to constant false, and NOT ((x + 1) = x) can be simplified
to x IS NOT NULL, but under the proposed API those would have to hang
off of =(int4,int4), which seems pretty darn ugly.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#13Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#12)
Re: ALTER TYPE 3: add facility to identify further no-work cases

Robert Haas <robertmhaas@gmail.com> writes:

Well, if you're positive we're eventually going to want this in
pg_proc, we may as well add it now. But I'm not too convinced it's
the right general API. The number of people writing exactly x + 0 or
x * 0 in a query has got to be vanishingly small; I'm not eager to add
additional parse analysis time to every SQL statement that has a
function in it just to detect those cases.

Actually, you've got that backwards: the facility I've got in mind would
cost next to nothing when not used. The place where we'd want to insert
this in eval_const_expressions has already got its hands on the relevant
pg_proc row, so checking for a nonzero hook-function reference would be
a matter of a couple of instructions. If we go with a pg_cast entry
then we're going to have to add a pg_cast lookup for every cast, whether
it turns out to be optimizable or not; which is going to cost quite a
lot more. The intermediate hook function I was sketching might be
worthwhile from a performance standpoint even if we don't expose the
more general feature to users, just because it would be possible to
avoid useless pg_cast lookups (by not installing the hook except on
pg_proc entries for which there's a relevant CAST WHEN function to call).

Even slightly more
complicated problems seem intractable - e.g. (x + 1) = x can be
simplified to constant false, and NOT ((x + 1) = x) can be simplified
to x IS NOT NULL, but under the proposed API those would have to hang
off of =(int4,int4), which seems pretty darn ugly.

True, but where else are you going to hang them off of? Not that I was
particularly thinking of doing either one of those.

regards, tom lane

#14Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#13)
Re: ALTER TYPE 3: add facility to identify further no-work cases

On Wed, Jan 26, 2011 at 5:32 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

Well, if you're positive we're eventually going to want this in
pg_proc, we may as well add it now.  But I'm not too convinced it's
the right general API.  The number of people writing exactly x + 0 or
x * 0 in a query has got to be vanishingly small; I'm not eager to add
additional parse analysis time to every SQL statement that has a
function in it just to detect those cases.

Actually, you've got that backwards: the facility I've got in mind would
cost next to nothing when not used.  The place where we'd want to insert
this in eval_const_expressions has already got its hands on the relevant
pg_proc row, so checking for a nonzero hook-function reference would be
a matter of a couple of instructions.  If we go with a pg_cast entry
then we're going to have to add a pg_cast lookup for every cast, whether
it turns out to be optimizable or not; which is going to cost quite a
lot more.  The intermediate hook function I was sketching might be
worthwhile from a performance standpoint even if we don't expose the
more general feature to users, just because it would be possible to
avoid useless pg_cast lookups (by not installing the hook except on
pg_proc entries for which there's a relevant CAST WHEN function to call).

Oh, really? I was thinking the logic should go into find_coercion_pathway().

Even slightly more
complicated problems seem intractable - e.g. (x + 1) = x can be
simplified to constant false, and NOT ((x + 1) = x) can be simplified
to x IS NOT NULL, but under the proposed API those would have to hang
off of =(int4,int4), which seems pretty darn ugly.

True, but where else are you going to hang them off of?  Not that I was
particularly thinking of doing either one of those.

Beats me, just thinking out loud.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#15Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#14)
Re: ALTER TYPE 3: add facility to identify further no-work cases

Robert Haas <robertmhaas@gmail.com> writes:

Oh, really? I was thinking the logic should go into find_coercion_pathway().

Well, I've been saying right along that it should be in
eval_const_expressions. Putting this sort of optimization in the parser
is invariably the wrong thing, because it fails to catch all the
possibilities. As an example, inlining a SQL function could expose
opportunities of this type. Another issue is that premature
optimization in the parser creates headaches if conditions change such
that a previous optimization is no longer valid --- you may have stored
rules wherein the optimization was already applied. (Not sure that
specific issue applies to casting, since we have no ALTER CAST commmand;
but in general you want expression optimizations applied downstream from
the rule rewriter not upstream.)

regards, tom lane

#16Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#15)
Re: ALTER TYPE 3: add facility to identify further no-work cases

On Wed, Jan 26, 2011 at 5:41 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

Oh, really?  I was thinking the logic should go into find_coercion_pathway().

Well, I've been saying right along that it should be in
eval_const_expressions.  Putting this sort of optimization in the parser
is invariably the wrong thing, because it fails to catch all the
possibilities.  As an example, inlining a SQL function could expose
opportunities of this type.  Another issue is that premature
optimization in the parser creates headaches if conditions change such
that a previous optimization is no longer valid --- you may have stored
rules wherein the optimization was already applied.  (Not sure that
specific issue applies to casting, since we have no ALTER CAST commmand;
but in general you want expression optimizations applied downstream from
the rule rewriter not upstream.)

Well, I guess my thought was that we what we were doing is extending
the coercion system to be able to make decisions based on both type
OID and typemod. It seems very odd to make an initial decision based
on type OID in one place and then have a completely different system
for incorporating the typemod; and it also seems quite a bit more
expensive, since you'd have to consider it for every function, not
just casts.

A further problem is that I don't think it'd play well at all with the
richer-typemod concept we keep bandying about. If, for example, we
represented all arrays with the same OID (getting rid of the
double-entries in pg_type) and used some kind of augmented-typemod to
store the number of dimensions and the base type, this would
completely fall over.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#17Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#15)
Re: ALTER TYPE 3: add facility to identify further no-work cases

I wrote:

... Another issue is that premature
optimization in the parser creates headaches if conditions change such
that a previous optimization is no longer valid --- you may have stored
rules wherein the optimization was already applied. (Not sure that
specific issue applies to casting, since we have no ALTER CAST commmand;
but in general you want expression optimizations applied downstream from
the rule rewriter not upstream.)

Actually, I can construct a concrete example where applying this
optimization in the parser will do the wrong thing:

CREATE TABLE base (f1 varchar(4));
CREATE VIEW vv AS SELECT f1::varchar(8) FROM base;
ALTER TABLE base ALTER COLUMN f1 TYPE varchar(16);

If the parser is taught to throw away "useless" length coercions, then
the stored form of vv will contain no cast, and the results will be
wrong after base's column is widened.

regards, tom lane

#18Noah Misch
noah@leadboat.com
In reply to: Tom Lane (#17)
Re: ALTER TYPE 3: add facility to identify further no-work cases

On Wed, Jan 26, 2011 at 05:55:37PM -0500, Tom Lane wrote:

I wrote:

... Another issue is that premature
optimization in the parser creates headaches if conditions change such
that a previous optimization is no longer valid --- you may have stored
rules wherein the optimization was already applied. (Not sure that
specific issue applies to casting, since we have no ALTER CAST commmand;
but in general you want expression optimizations applied downstream from
the rule rewriter not upstream.)

Actually, I can construct a concrete example where applying this
optimization in the parser will do the wrong thing:

CREATE TABLE base (f1 varchar(4));
CREATE VIEW vv AS SELECT f1::varchar(8) FROM base;
ALTER TABLE base ALTER COLUMN f1 TYPE varchar(16);

If the parser is taught to throw away "useless" length coercions, then
the stored form of vv will contain no cast, and the results will be
wrong after base's column is widened.

We reject that:

[local] test=# ALTER TABLE base ALTER COLUMN f1 TYPE varchar(16);
ERROR: cannot alter type of a column used by a view or rule
DETAIL: rule _RETURN on view vv depends on column "f1"

#19Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#16)
Re: ALTER TYPE 3: add facility to identify further no-work cases

Robert Haas <robertmhaas@gmail.com> writes:

Well, I guess my thought was that we what we were doing is extending
the coercion system to be able to make decisions based on both type
OID and typemod.

Well, that's an interesting thought, but the proposal at hand is far
more limited than that --- it's only an optimization that can be applied
in limited situations. If we want to do something like what you're
suggesting here, it's not going to get done for 9.1.

A further problem is that I don't think it'd play well at all with the
richer-typemod concept we keep bandying about. If, for example, we
represented all arrays with the same OID (getting rid of the
double-entries in pg_type) and used some kind of augmented-typemod to
store the number of dimensions and the base type, this would
completely fall over.

Your proposal would completely fall over, you mean. An Expr->Expr hook
API wouldn't be affected at all.

I'm not sure how important that concern is though, because it's hard to
see how any such change wouldn't break existing cast implementation
functions anyway. If the API for length-coercing cast functions
changes, breaking their helper functions too hardly seems like an issue.
Or are you saying you want to punt this whole proposal till after that
happens? I had muttered something of the sort way upthread, but I
didn't think anyone else thought that way.

regards, tom lane

#20Tom Lane
tgl@sss.pgh.pa.us
In reply to: Noah Misch (#18)
Re: ALTER TYPE 3: add facility to identify further no-work cases

Noah Misch <noah@leadboat.com> writes:

On Wed, Jan 26, 2011 at 05:55:37PM -0500, Tom Lane wrote:

Actually, I can construct a concrete example where applying this
optimization in the parser will do the wrong thing:

CREATE TABLE base (f1 varchar(4));
CREATE VIEW vv AS SELECT f1::varchar(8) FROM base;
ALTER TABLE base ALTER COLUMN f1 TYPE varchar(16);

If the parser is taught to throw away "useless" length coercions, then
the stored form of vv will contain no cast, and the results will be
wrong after base's column is widened.

We reject that:

[local] test=# ALTER TABLE base ALTER COLUMN f1 TYPE varchar(16);
ERROR: cannot alter type of a column used by a view or rule
DETAIL: rule _RETURN on view vv depends on column "f1"

Nonetheless, the stored form of vv will contain no cast, which can
hardly be thought safe here. Relaxing the restriction you cite is (or
should be) on the TODO list, but we'll never be able to do it if the
parser throws away relevant information.

regards, tom lane

#21Noah Misch
noah@leadboat.com
In reply to: Tom Lane (#13)
#22Tom Lane
tgl@sss.pgh.pa.us
In reply to: Noah Misch (#21)
#23Noah Misch
noah@leadboat.com
In reply to: Tom Lane (#22)
#24Tom Lane
tgl@sss.pgh.pa.us
In reply to: Noah Misch (#23)
#25Tom Lane
tgl@sss.pgh.pa.us
In reply to: Noah Misch (#23)
#26Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#19)
#27Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#24)
#28Noah Misch
noah@leadboat.com
In reply to: Tom Lane (#24)
#29Noah Misch
noah@leadboat.com
In reply to: Tom Lane (#25)
#30Noah Misch
noah@leadboat.com
In reply to: Robert Haas (#4)
#31Robert Haas
robertmhaas@gmail.com
In reply to: Noah Misch (#30)
#32Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#31)
#33Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#32)
#34Robert Haas
robertmhaas@gmail.com
In reply to: Robert Haas (#33)
#35Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#34)
#36Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#35)
#37Noah Misch
noah@leadboat.com
In reply to: Robert Haas (#31)
#38Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#19)
#39Noah Misch
noah@leadboat.com
In reply to: Robert Haas (#38)
#40Noah Misch
noah@leadboat.com
In reply to: Noah Misch (#39)
#41Robert Haas
robertmhaas@gmail.com
In reply to: Noah Misch (#39)
#42Robert Haas
robertmhaas@gmail.com
In reply to: Noah Misch (#40)