What happened to the is_<type> family of functions proposal?

Started by Colin 't Hartover 15 years ago26 messageshackers

colinthart@gmail.com

over 15 years ago

Hi,

Back in 2002 these were proposed, what happened to them?

http://archives.postgresql.org/pgsql-sql/2002-09/msg00406.php

Also I note:

colin@ruby:~/workspace/eyedb$ psql
psql (8.4.4)
Type "help" for help.

colin=> select to_date('731332', 'YYMMDD');
to_date
------------
1974-02-01
(1 row)

colin=>

The fact that this wraps would seem to me to make the implementation of
is_date() difficult.

I'm trying to query character strings for valid dates but can't see how to
do this quickly... but for that discussion I will move to pgsql-general :-)

Cheers,

Colin

andrew@dunslane.net

over 15 years ago

In reply to: Colin 't Hart (#1)

Re: What happened to the is_<type> family of functions proposal?

On 09/20/2010 10:29 AM, Colin 't Hart wrote:

Hi,

Back in 2002 these were proposed, what happened to them?

http://archives.postgresql.org/pgsql-sql/2002-09/msg00406.php

2002 is a looooooooong time ago.

Also I note:

colin@ruby:~/workspace/eyedb$ psql
psql (8.4.4)
Type "help" for help.

colin=> select to_date('731332', 'YYMMDD');
to_date
------------
1974-02-01
(1 row)

colin=>

The fact that this wraps would seem to me to make the implementation
of is_date() difficult.

I think to_date is the wrong gadget to use here. You should probably be
using the date input routine and trapping any data exception. e.g.:

test_date := date_in(textout(some_text));

In plpgsql you'd put that inside a begin/exception/end block that traps
SQLSTATE '22000' which is the class covering data exceptions.

cheers

andrew

colinthart@gmail.com

over 15 years ago

In reply to: Andrew Dunstan (#2)

Re: What happened to the is_<type> family of functions proposal?

On 20 September 2010 16:54, Andrew Dunstan <andrew@dunslane.net> wrote:

On 09/20/2010 10:29 AM, Colin 't Hart wrote:

Hi,

Back in 2002 these were proposed, what happened to them?

http://archives.postgresql.org/pgsql-sql/2002-09/msg00406.php

2002 is a looooooooong time ago.

<snip>

I think to_date is the wrong gadget to use here. You should probably be using the date input routine and trapping any data exception. e.g.:

test_date := date_in(textout(some_text));

In plpgsql you'd put that inside a begin/exception/end block that traps SQLSTATE '22000' which is the class covering data exceptions.

So it's not possible using pure SQL unless one writes a function?

Are the is_<type> family of functions still desired?

Also, where are the to_<type> conversions done?

Thanks,

Colin

robertmhaas@gmail.com

over 15 years ago

In reply to: Colin 't Hart (#3)

Re: What happened to the is_<type> family of functions proposal?

On Mon, Sep 20, 2010 at 11:31 AM, Colin 't Hart <colinthart@gmail.com> wrote:

I think to_date is the wrong gadget to use here. You should probably be using the date input routine and trapping any data exception. e.g.:

test_date := date_in(textout(some_text));

In plpgsql you'd put that inside a begin/exception/end block that traps SQLSTATE '22000' which is the class covering data exceptions.

So it's not possible using pure SQL unless one writes a function?

I think that is true.

Are the is_<type> family of functions still desired?

I think it would be useful to have a way of testing whether a cast to
a given type will succeed. The biggest problem with the
exception-catching method is not that it requires writing a function
(which, IMHO, is no big deal) but that exception handling is pretty
slow and inefficient. You end up doing things like... write a regexp
to see whether the data is in approximately the right format and then
if it is try the cast inside an exception block. Yuck.

(On the other hand, whether the work that was done in 2002 is still
relevant to today's code is questionable. Things have changed a lot.)

Also, where are the to_<type> conversions done?

I think maybe you are looking for the type input functions?

select typname, typinput::regprocedure from pg_type;

There are also some functions with names of the form to_<type>. You
can get a list of those with the following psql command:

\dfS to_*

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

tgl@sss.pgh.pa.us

over 15 years ago

In reply to: Robert Haas (#4)

Re: What happened to the is_<type> family of functions proposal?

Robert Haas <robertmhaas@gmail.com> writes:

I think it would be useful to have a way of testing whether a cast to
a given type will succeed. The biggest problem with the
exception-catching method is not that it requires writing a function
(which, IMHO, is no big deal) but that exception handling is pretty
slow and inefficient. You end up doing things like... write a regexp
to see whether the data is in approximately the right format and then
if it is try the cast inside an exception block. Yuck.

The problem here is that putting the exception handling in C doesn't
make things any better: it's still slow and inefficient. And in the
general case the only way to be sure that a string will be accepted by
the input function is to try it.

regards, tom lane

robertmhaas@gmail.com

over 15 years ago

In reply to: Tom Lane (#5)

Re: What happened to the is_<type> family of functions proposal?

On Tue, Sep 21, 2010 at 11:49 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

I think it would be useful to have a way of testing whether a cast to
a given type will succeed. The biggest problem with the
exception-catching method is not that it requires writing a function
(which, IMHO, is no big deal) but that exception handling is pretty
slow and inefficient. You end up doing things like... write a regexp
to see whether the data is in approximately the right format and then
if it is try the cast inside an exception block. Yuck.

The problem here is that putting the exception handling in C doesn't
make things any better: it's still slow and inefficient. And in the
general case the only way to be sure that a string will be accepted by
the input function is to try it.

Given the current API, that is true.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

alvherre@2ndquadrant.com

over 15 years ago

In reply to: Robert Haas (#6)

Re: What happened to the is_<type> family of functions proposal?

Excerpts from Robert Haas's message of mar sep 21 11:56:51 -0400 2010:

On Tue, Sep 21, 2010 at 11:49 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

The problem here is that putting the exception handling in C doesn't
make things any better: it's still slow and inefficient. And in the
general case the only way to be sure that a string will be accepted by
the input function is to try it.

Given the current API, that is true.

So we could refactor the input functions so that there's an internal
function that returns the accepted datum in the OK case and an ErrorData
for the failure case. The regular input function would just throw the
error data in the latter case; but this would allow another function to
just return whether it worked or not.

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

tgl@sss.pgh.pa.us

over 15 years ago

In reply to: Alvaro Herrera (#7)

Re: What happened to the is_<type> family of functions proposal?

Alvaro Herrera <alvherre@commandprompt.com> writes:

On Tue, Sep 21, 2010 at 11:49 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

The problem here is that putting the exception handling in C doesn't
make things any better:

So we could refactor the input functions so that there's an internal
function that returns the accepted datum in the OK case and an ErrorData
for the failure case.

This makes the untenable assumption that there are no elog(ERROR)s in
the "internal" input function *or anything it calls*. Short of truly
massive restructuring, including uglifying many internal APIs to have
error return codes instead of allowing elog within the callee, you will
never make this work for anything more complicated than say float8in().

regards, tom lane

bruce@momjian.us

over 15 years ago

In reply to: Alvaro Herrera (#7)

Re: What happened to the is_<type> family of functions proposal?

On Tue, Sep 21, 2010 at 6:02 PM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:

So we could refactor the input functions so that there's an internal
function that returns the accepted datum in the OK case and an ErrorData
for the failure case. The regular input function would just throw the
error data in the latter case; but this would allow another function to
just return whether it worked or not.

You're assuming the input function won't have any work it has to undo
which it would need the savepoint for anyways. For most of the
built-in datatypes -- all of the ones intended for holding real data
-- that's true. But for things like regclass or regtype it might not
be and for user-defined data types.... who knows?

Of course all people really want is to test whether something is a
valid integer, floating point value, etc.

--
greg

robertmhaas@gmail.com

over 15 years ago

In reply to: Bruce Momjian (#9)

Re: What happened to the is_<type> family of functions proposal?

On Tue, Sep 21, 2010 at 1:45 PM, Greg Stark <gsstark@mit.edu> wrote:

On Tue, Sep 21, 2010 at 6:02 PM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:

So we could refactor the input functions so that there's an internal
function that returns the accepted datum in the OK case and an ErrorData
for the failure case. The regular input function would just throw the
error data in the latter case; but this would allow another function to
just return whether it worked or not.

You're assuming the input function won't have any work it has to undo
which it would need the savepoint for anyways. For most of the
built-in datatypes -- all of the ones intended for holding real data
-- that's true. But for things like regclass or regtype it might not
be and for user-defined data types.... who knows?

Of course all people really want is to test whether something is a
valid integer, floating point value, etc.

Right. Or a date - that's a case that comes up for me pretty
frequently. It's not too hard to write a regular expression to test
whether something is an integer -- although there is the question of
whether it will overflow, which is sometimes relevant -- but a date or
timestamp field is a bit harder.

I don't understand the argument that we need type input functions to
be protected by a savepoint. That seems crazy to me. We're taking a
huge performance penalty here to protect against something that seems
insane to me in the first instance. Not to mention cutting ourselves
off from really important features, like the ability to recover from
errors during COPY. I don't understand why we can't just make some
rules about what type input functions are allowed to do. And if you
break those rules then you get to keep both pieces. Why is this
unreasonable? A savepoint can hardly protect you against damage
inflicted by the execution of arbitrary code; IOW, we're already
relying on the user to follow some rules.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

alvherre@2ndquadrant.com

over 15 years ago

In reply to: Tom Lane (#8)

Re: What happened to the is_<type> family of functions proposal?

Excerpts from Tom Lane's message of mar sep 21 13:41:32 -0400 2010:

Alvaro Herrera <alvherre@commandprompt.com> writes:

On Tue, Sep 21, 2010 at 11:49 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

The problem here is that putting the exception handling in C doesn't
make things any better:

So we could refactor the input functions so that there's an internal
function that returns the accepted datum in the OK case and an ErrorData
for the failure case.

This makes the untenable assumption that there are no elog(ERROR)s in
the "internal" input function *or anything it calls*. Short of truly
massive restructuring, including uglifying many internal APIs to have
error return codes instead of allowing elog within the callee, you will
never make this work for anything more complicated than say float8in().

... which is what people want anyway. I mean, the day someone requests
is_sthcomplex, we could happily tell them that they need to use the
expensive workaround involving savepoints. I don't think we really need
to support the ones that would require truly expensive refactoring; the
simple ones would cover 99% of the use cases.

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

tgl@sss.pgh.pa.us

over 15 years ago

In reply to: Robert Haas (#10)

Re: What happened to the is_<type> family of functions proposal?

Robert Haas <robertmhaas@gmail.com> writes:

I don't understand the argument that we need type input functions to
be protected by a savepoint. That seems crazy to me. We're taking a
huge performance penalty here to protect against something that seems
insane to me in the first instance. Not to mention cutting ourselves
off from really important features, like the ability to recover from
errors during COPY. I don't understand why we can't just make some
rules about what type input functions are allowed to do.

There are many rules that you could possibly make for type input
functions. But "you cannot throw an error" is not one of them ---
or at least, not one that you can usefully expect to be followed
for anything more than trivial straightline code.

The poster child for this is of course domain_in(). But even without
that, I don't think you can realistically legislate that no errors be
thrown by something of the complexity of, say, the timestamp input
functions. Just for starters, what of a palloc() failure?

regards, tom lane

andres@anarazel.de

over 15 years ago

In reply to: Tom Lane (#12)

Re: What happened to the is_<type> family of functions proposal?

On Wednesday 22 September 2010 01:05:39 Tom Lane wrote:

Robert Haas <robertmhaas@gmail.com> writes:

I don't understand the argument that we need type input functions to
be protected by a savepoint. That seems crazy to me. We're taking a
huge performance penalty here to protect against something that seems
insane to me in the first instance. Not to mention cutting ourselves
off from really important features, like the ability to recover from
errors during COPY. I don't understand why we can't just make some
rules about what type input functions are allowed to do.

There are many rules that you could possibly make for type input
functions. But "you cannot throw an error" is not one of them ---
or at least, not one that you can usefully expect to be followed
for anything more than trivial straightline code.

The poster child for this is of course domain_in(). But even without
that, I don't think you can realistically legislate that no errors be
thrown by something of the complexity of, say, the timestamp input
functions. Just for starters, what of a palloc() failure?

Uhm. Isnt a palloc failure a really, really bad example because it will kill
the session anyway? FATAL+ is not relevant in that context, right?

Andres

tgl@sss.pgh.pa.us

over 15 years ago

In reply to: Alvaro Herrera (#11)

Re: What happened to the is_<type> family of functions proposal?

Alvaro Herrera <alvherre@commandprompt.com> writes:

Excerpts from Tom Lane's message of mar sep 21 13:41:32 -0400 2010:

...never make this work for anything more complicated than say float8in().

... which is what people want anyway. I mean, the day someone requests
is_sthcomplex, we could happily tell them that they need to use the
expensive workaround involving savepoints. I don't think we really need
to support the ones that would require truly expensive refactoring; the
simple ones would cover 99% of the use cases.

Robert was complaining about COPY in particular. It's hard to see how
you make any progress on that if you don't have pretty near 100%
coverage of datatypes. I don't object if someone puts in is_integer,
is_float, etc; but that's nowhere near a general purpose solution.

regards, tom lane

tgl@sss.pgh.pa.us

over 15 years ago

In reply to: Andres Freund (#13)

Re: What happened to the is_<type> family of functions proposal?

Andres Freund <andres@anarazel.de> writes:

On Wednesday 22 September 2010 01:05:39 Tom Lane wrote:

Just for starters, what of a palloc() failure?

Uhm. Isnt a palloc failure a really, really bad example because it will kill
the session anyway? FATAL+ is not relevant in that context, right?

Huh? It's not fatal, just elog(ERROR, "out of memory").

regards, tom lane

robertmhaas@gmail.com

over 15 years ago

In reply to: Tom Lane (#12)

Re: What happened to the is_<type> family of functions proposal?

On Tue, Sep 21, 2010 at 7:05 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

I don't understand the argument that we need type input functions to
be protected by a savepoint. That seems crazy to me. We're taking a
huge performance penalty here to protect against something that seems
insane to me in the first instance. Not to mention cutting ourselves
off from really important features, like the ability to recover from
errors during COPY. I don't understand why we can't just make some
rules about what type input functions are allowed to do.

There are many rules that you could possibly make for type input
functions. But "you cannot throw an error" is not one of them ---
or at least, not one that you can usefully expect to be followed
for anything more than trivial straightline code.

OK. This is one of the things I don't understand. Why does throwing
an error imply that we need to abort the current transaction? Why
can't we just catch the longjmp() and trundle onwards? Obviously,
that's unsafe if a pretty wide variety of cases, but if you're just
scrutinizing the input string (even with a little bit of read-only
database access) it's not obvious to me what can go wrong. (I assume
there is something, but I don't know what it is...)

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

bruce@momjian.us

over 15 years ago

In reply to: Robert Haas (#16)

Re: What happened to the is_<type> family of functions proposal?

Robert Haas wrote:

On Tue, Sep 21, 2010 at 7:05 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

I don't understand the argument that we need type input functions to
be protected by a savepoint. ?That seems crazy to me. ?We're taking a
huge performance penalty here to protect against something that seems
insane to me in the first instance. ?Not to mention cutting ourselves
off from really important features, like the ability to recover from
errors during COPY. ?I don't understand why we can't just make some
rules about what type input functions are allowed to do.

There are many rules that you could possibly make for type input
functions. ?But "you cannot throw an error" is not one of them ---
or at least, not one that you can usefully expect to be followed
for anything more than trivial straightline code.

OK. This is one of the things I don't understand. Why does throwing
an error imply that we need to abort the current transaction? Why
can't we just catch the longjmp() and trundle onwards? Obviously,
that's unsafe if a pretty wide variety of cases, but if you're just
scrutinizing the input string (even with a little bit of read-only
database access) it's not obvious to me what can go wrong. (I assume
there is something, but I don't know what it is...)

That would be interesting. You would need to flag that this was not a
longjump requiring cleanup, but the harder part would be getting back to
where the error occurred. I guess you could rerun the query. :-|

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

pgsql@j-davis.com

over 15 years ago

In reply to: Robert Haas (#16)

Re: What happened to the is_<type> family of functions proposal?

On Tue, 2010-09-21 at 19:55 -0400, Robert Haas wrote:

OK. This is one of the things I don't understand. Why does throwing
an error imply that we need to abort the current transaction? Why
can't we just catch the longjmp() and trundle onwards? Obviously,
that's unsafe if a pretty wide variety of cases, but if you're just
scrutinizing the input string (even with a little bit of read-only
database access) it's not obvious to me what can go wrong. (I assume
there is something, but I don't know what it is...)

The worry (from me) would be the "little bit of read-only database
access". If you SPI_connect() without an SPI_finish(), that sounds like
a potential problem (as would anything else that requires cleanup).

Regards,
Jeff Davis

tgl@sss.pgh.pa.us

over 15 years ago

In reply to: Robert Haas (#16)

Re: What happened to the is_<type> family of functions proposal?

Robert Haas <robertmhaas@gmail.com> writes:

On Tue, Sep 21, 2010 at 7:05 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

There are many rules that you could possibly make for type input
functions. �But "you cannot throw an error" is not one of them ---
or at least, not one that you can usefully expect to be followed
for anything more than trivial straightline code.

OK. This is one of the things I don't understand. Why does throwing
an error imply that we need to abort the current transaction? Why
can't we just catch the longjmp() and trundle onwards? Obviously,
that's unsafe if a pretty wide variety of cases, but if you're just
scrutinizing the input string (even with a little bit of read-only
database access) it's not obvious to me what can go wrong.

The problem is to know that "all you did" was scrutinize the input
string. If it's simple straightline code (even with some C library
calls) then you can know that, but then you can write such code without
including any elog(ERROR) in it in the first place. If you are trapping
longjmps then what you'd need to assert is that no error thrown from
anywhere in any of the code reachable from that place represents a
problem that requires transaction abort to clean up after. This gets
unmaintainable remarkably quickly, especially if you invoke anything
as complicated as database access. And then there are asynchronous
error reasons (query cancel) which you shouldn't trap in any case.

regards, tom lane

robertmhaas@gmail.com

over 15 years ago

In reply to: Tom Lane (#19)

Re: What happened to the is_<type> family of functions proposal?

On Fri, Sep 24, 2010 at 3:41 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Tue, Sep 21, 2010 at 7:05 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

There are many rules that you could possibly make for type input
functions. But "you cannot throw an error" is not one of them ---
or at least, not one that you can usefully expect to be followed
for anything more than trivial straightline code.

OK. This is one of the things I don't understand. Why does throwing
an error imply that we need to abort the current transaction? Why
can't we just catch the longjmp() and trundle onwards? Obviously,
that's unsafe if a pretty wide variety of cases, but if you're just
scrutinizing the input string (even with a little bit of read-only
database access) it's not obvious to me what can go wrong.

The problem is to know that "all you did" was scrutinize the input
string. If it's simple straightline code (even with some C library
calls) then you can know that, but then you can write such code without
including any elog(ERROR) in it in the first place. If you are trapping
longjmps then what you'd need to assert is that no error thrown from
anywhere in any of the code reachable from that place represents a
problem that requires transaction abort to clean up after. This gets
unmaintainable remarkably quickly, especially if you invoke anything
as complicated as database access. And then there are asynchronous
error reasons (query cancel) which you shouldn't trap in any case.

Hmm. So the problem is that we don't want to accidentally catch an
error that isn't actually safe to catch. We could probably mitigate
this problem to a considerable degree by throwing data validation
errors using some special flag that say "this is a recoverable error".
And if that flag isn't set then we abort the whole transaction, but
if it is then we continue on. It's still possible for the person
writing the typinput function to set that flag when they should not,
but at least it's less likely to happen by accident. Another
alternative would be to create some kind of explicit way for the
function to RETURN an error instead of throwing it.

But neither of these things is totally bullet-proof, because you could
still do something that requires clean-up and then lie about it. To
protect against that, you'd presumably need to set some kind of a flag
whenever, say, a heap tuple gets modified, and then you could assert
said flag false. What, other than writing to the database, requires
subtransaction cleanup?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

darren@darrenduncan.net

over 15 years ago

In reply to: Colin 't Hart (#1)

craig@2ndquadrant.com

over 15 years ago

In reply to: Darren Duncan (#21)

darren@darrenduncan.net

over 15 years ago

In reply to: Craig Ringer (#22)

singh.gurjeet@gmail.com

over 15 years ago

In reply to: Robert Haas (#20)

tgl@sss.pgh.pa.us

over 15 years ago

In reply to: Gurjeet Singh (#24)

robertmhaas@gmail.com

over 15 years ago

In reply to: Tom Lane (#25)