"A block containing an EXCEPTION clause is significantly more expensive to enter and exit than a block without one"

Started by Bryn Llewellynalmost 4 years ago7 messagesgeneral
Jump to latest
#1Bryn Llewellyn
bryn@yugabyte.com

Does the “Tip” call-out box, from which the “Subject” here is copied, and the larger story that I copied below, apply even when the executable section of the block statement in question does nothing at all that could be rolled back?

This is my block:

begin
year_as_int := year_as_text;
exception
when invalid_text_representation then
bad_integer := true;
end;

The typecast to integer will cause an error if the input text does not represent an integer. I need to detect this, when I get such a text value, and then go on to do something appropriate after the block. My tests show that I get exactly the behavior that I want. It would be tedious to program the test (presumably by typecasting to a numeric value and comparing that with the result of rounding it) to avoid the "significant" performance penalty that the doc describes.

It seems to me that the PG implementation has, in a block that has an exception section, an opportunity to wait to start a subtraction until it first hits a statement whose effect could be rolled back—and, of course, not do this if it doesn't hit such a statement. I see that this would require first preparing the statement and only then, if it needs it, starting its subtransaction. But this sounds doable.
________________________________________________________________________________

The current PL/pgSQL doc, in this section:

43.6.8. Trapping Errors
https://www.postgresql.org/docs/current/plpgsql-control-structures.html#PLPGSQL-ERROR-TRAPPING <https://www.postgresql.org/docs/current/plpgsql-control-structures.html#PLPGSQL-ERROR-TRAPPING&gt;
says this:

When an error is caught by an EXCEPTION clause, the local variables of the PL/pgSQL function remain as they were when the error occurred, but all changes to persistent database state within the block are rolled back.

A block containing an EXCEPTION clause is significantly more expensive to enter and exit than a block without one. Therefore, don't use EXCEPTION without need.

and then on Stack Overflow, here
https://stackoverflow.com/questions/46409856/why-is-catching-errors-inside-a-loop-causing-performance-issues <https://stackoverflow.com/questions/46409856/why-is-catching-errors-inside-a-loop-causing-performance-issues&gt;

Tom Lane said:

Yeah, it's about the overhead of setting up and ending a subtransaction. That's a fairly expensive mechanism, but we don't have anything cheaper that is able to recover from arbitrary errors.

and David G. Johnston said:

Show quoted text

Setting up the pl/pgsql execution layer to trap "arbitrary SQL-layer exceptions"​ is fairly expensive. Even if the user specifies specific errors the error handling mechanism in pl/pgsql is code for generic (arbitrary) errors being given to it.

#2Pavel Stehule
pavel.stehule@gmail.com
In reply to: Bryn Llewellyn (#1)
Re: "A block containing an EXCEPTION clause is significantly more expensive to enter and exit than a block without one"

Hi

po 13. 6. 2022 v 7:45 odesílatel Bryn Llewellyn <bryn@yugabyte.com> napsal:

Does the “Tip” call-out box, from which the “Subject” here is copied, and
the larger story that I copied below, apply even when the executable
section of the block statement in question does nothing at all that could
be rolled back?

This is my block:

*begin year_as_int := year_as_text;exception when
invalid_text_representation then bad_integer := true;end;*

The typecast to integer will cause an error if the input text does not
represent an integer. I need to detect this, when I get such a text value,
and then go on to do something appropriate after the block. My tests show
that I get exactly the behavior that I want. It would be tedious to program
the test (presumably by typecasting to a numeric value and comparing that
with the result of rounding it) to avoid the "significant" performance
penalty that the doc describes.

It seems to me that the PG implementation has, in a block that has an
exception section, an opportunity to wait to start a subtraction until it
first hits a statement whose effect could be rolled back—and, of course,
not do this if it doesn't hit such a statement. I see that this would
require first preparing the statement and only then, if it needs it,
starting its subtransaction. But this sounds doable.

The block is a PLpgSQL statement (internally it is not just syntax) - and a
safepoint is created before execution of any statement inside the block,
when the block has EXCEPTION parts. PL/pgSQL is an AST interpreter. It
knows nothing about statements inside the block. The lazy implementation
theoretically can be possible, but why? Any statement can raise an
exception. There is not any evidence what statements are safe and what not.

Regards

Pavel

Show quoted text

________________________________________________________________________________

The current PL/pgSQL doc, in this section:

43.6.8. Trapping Errors

https://www.postgresql.org/docs/current/plpgsql-control-structures.html#PLPGSQL-ERROR-TRAPPING

says this:

When an error is caught by an EXCEPTION clause, the local variables of the
PL/pgSQL function remain as they were when the error occurred, but all
changes to persistent database state within the block are rolled back.

A block containing an EXCEPTION clause is significantly more expensive to
enter and exit than a block without one. Therefore, don't use EXCEPTION
without need.

and then on Stack Overflow, here

https://stackoverflow.com/questions/46409856/why-is-catching-errors-inside-a-loop-causing-performance-issues

Tom Lane said:

Yeah, it's about the overhead of setting up and ending a subtransaction.
That's a fairly expensive mechanism, but we don't have anything cheaper
that is able to recover from arbitrary errors.

and David G. Johnston said:

Setting up the pl/pgsql execution layer to trap "arbitrary SQL-layer
exceptions" is fairly expensive. Even if the user specifies specific errors
the error handling mechanism in pl/pgsql is code for generic (arbitrary)
errors being given to it.

#3Bryn Llewellyn
bryn@yugabyte.com
In reply to: Pavel Stehule (#2)
Re: "A block containing an EXCEPTION clause is significantly more expensive to enter and exit than a block without one"

pavel.stehule@gmail.com wrote:

bryn@yugabyte.com napsal:

Does the “Tip” call-out box, from which the “Subject” here is copied, and the larger story that I copied below, apply even when the executable section of the block statement in question does nothing at all that could be rolled back?

This is my block:

begin
year_as_int := year_as_text;
exception
when invalid_text_representation then
bad_integer := true;
end;

The typecast to integer will cause an error if the input text does not represent an integer.

The block is a PLpgSQL statement (internally it is not just syntax) - and a safepoint is created before execution of any statement inside the block, when the block has EXCEPTION parts. PL/pgSQL is an AST interpreter. It knows nothing about statements inside the block. The lazy implementation theoretically can be possible, but why? Any statement can raise an exception. There is not any evidence what statements are safe and what not.

Thanks for the quick response. I'll take this to mean that the present behavior will never change—in spite of your:

The lazy implementation [that Bryn sketched] theoretically can be possible

You then said:

but why?

OK, so I'm obliged to answer.

Because SQL rests on the principle that you just say *what* you want but not *how*. Here, I want to cast my string, which putatively represents an integer, to an "int" value. The text comes from the outside world, and what is meant to be "42017" might arrive as "42O17". Or, might arrive properly, as "42000e-03".

Consider this:

create function is_int_nn(t in text)
returns boolean
language plpgsql
as $body$
declare
ok boolean not null := (t is not null);
begin
if ok then
-- Placeholder naïve REGEXPR test.
ok := (t != '') and not (t ~ 'x');
end if;

if ok then
declare
n constant numeric not null := t;
r constant numeric not null := round(n);
begin
ok := (r = n);
end;
end if;

return ok;
end;
$body$;

select
(select is_int_nn(null )::text) as test_1,
(select is_int_nn('' )::text) as test_2,
(select is_int_nn('42000x-04')::text) as test_3,
(select is_int_nn('42000e-04')::text) as test_4,
(select is_int_nn('42000e-03')::text) as test_5;

Neither the design nor the implementation of the code is by any means finished yet. And this holds, therefore, for the tests too. So this is a loud denial of « just say *what* you want ».

You might argue that any junior programmer could manage the complete exercise in a morning. But then somebody else has to review it. And it's another artifact to be managed. Generic utilities like this always present a challenge when they need to be used in more than one distinct application. You need to invent a "single source of truth" scheme.

Compare everything that "function is_int_nn(t in text)" implies with the block that I showed above.

Oracle Database 12c Release 2 (and later) has a validate_conversion() built-in.

https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/VALIDATE_CONVERSION.html#GUID-DC485EEB-CB6D-42EF-97AA-4487884CB2CD

This arrived in (some time around) 2017.

Is there any chance that PG might ship an equivalent?

#4Pavel Stehule
pavel.stehule@gmail.com
In reply to: Bryn Llewellyn (#3)
Re: "A block containing an EXCEPTION clause is significantly more expensive to enter and exit than a block without one"

po 13. 6. 2022 v 21:02 odesílatel Bryn Llewellyn <bryn@yugabyte.com> napsal:

*pavel.stehule@gmail.com <pavel.stehule@gmail.com> wrote:*

*bryn@yugabyte.com <bryn@yugabyte.com> napsal:*

Does the “Tip” call-out box, from which the “Subject” here is copied, and
the larger story that I copied below, apply even when the executable
section of the block statement in question does nothing at all that could
be rolled back?

This is my block:

*begin year_as_int := year_as_text;exception when
invalid_text_representation then bad_integer := true;end;*

The typecast to integer will cause an error if the input text does not
represent an integer.

The block is a PLpgSQL statement (internally it is not just syntax) - and
a safepoint is created before execution of any statement inside the block,
when the block has EXCEPTION parts. PL/pgSQL is an AST interpreter. It
knows nothing about statements inside the block. The lazy implementation
theoretically can be possible, but why? Any statement can raise an
exception. There is not any evidence what statements are safe and what not.

Thanks for the quick response. I'll take this to mean that the present
behavior will never change—in spite of your:

The lazy implementation [that Bryn sketched] theoretically can be possible

You then said:

but why?

OK, so I'm obliged to answer.

Because SQL rests on the principle that you just say *what* you want but
not *how*. Here, I want to cast my string, which putatively represents an
integer, to an "int" value. The text comes from the outside world, and what
is meant to be "42017" might arrive as "42O17". Or, might arrive properly,
as "42000e-03".

Consider this:

*create function is_int_nn(t in text) returns boolean language plpgsqlas
$body$declare ok boolean not null := (t is not null);begin if ok then
-- Placeholder naïve REGEXPR test. ok := (t != '') and not (t ~
'x'); end if; if ok then declare n constant numeric not null :=
t; r constant numeric not null := round(n); begin ok := (r =
n); end; end if; return ok;end;$body$;select (select is_int_nn(null
)::text) as test_1, (select is_int_nn('' )::text) as
test_2, (select is_int_nn('42000x-04')::text) as test_3, (select
is_int_nn('42000e-04')::text) as test_4, (select
is_int_nn('42000e-03')::text) as test_5;*

Neither the design nor the implementation of the code is by any means
finished yet. And this holds, therefore, for the tests too. So this is a
loud denial of « just say *what* you want ».

You might argue that any junior programmer could manage the complete
exercise in a morning. But then somebody else has to review it. And it's
another artifact to be managed. Generic utilities like this always present
a challenge when they need to be used in more than one distinct
application. You need to invent a "single source of truth" scheme.

Compare everything that "function is_int_nn(t in text)" implies with the
block that I showed above.

Oracle Database 12c Release 2 (and later) has a validate_conversion()
built-in.

https://docs.oracle.com/en/database/oracle/oracle-
database/19/sqlrf/VALIDATE_CONVERSION.html#GUID-DC485EEB-CB6D-42EF-97AA-4487884CB2CD

This arrived in (some time around) 2017.

*Is there any chance that PG might ship an equivalent*?

I remember a long and still not closed discussion about fault tolerant copy
implementation. The problem is a lot of possibly redundant code for
exception safe input functions, if I remember well. And it is not enough
for fault tolerant copy still. Maybe it needs some refactoring of the
PostgreSQL exceptions handling system to be able to handle some exceptions
that come from a non-storage engine without the necessity to use
safepoints. I have no idea if somebody is working on this issue now, but I
don't expect so it is easy to fix it. Maybe a more probable fix can be to
reduce an overhead of savepoints. This issue is more complex than can be
visible from user perspective - and the complexity is based on how pg has
implemented exceptions.

Regards

Pavel

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bryn Llewellyn (#3)
Re: "A block containing an EXCEPTION clause is significantly more expensive to enter and exit than a block without one"

Bryn Llewellyn <bryn@yugabyte.com> writes:

OK, so I'm obliged to answer.
Because SQL rests on the principle that you just say *what* you want but
not *how*.

It also rests on the principle that the programmer shouldn't be too
concerned about micro-efficiencies. You've given a perfectly good
six-line implementation of what you want; use it and be happy.

Oracle Database 12c Release 2 (and later) has a validate_conversion() built-in.

[ shrug... ] We are not Oracle. One of the main ways in which we
are not Oracle is that we support extensible database functionality.
To write a "validate_conversion" function that supports extension
datatypes, but doesn't use something morally equivalent to a
subtransaction, would be a nightmare: large, fragile, and probably
not all that much faster.

regards, tom lane

#6Christophe Pettus
xof@thebuild.com
In reply to: Pavel Stehule (#2)
Re: "A block containing an EXCEPTION clause is significantly more expensive to enter and exit than a block without one"

On Jun 12, 2022, at 23:07, Pavel Stehule <pavel.stehule@gmail.com> wrote:
The lazy implementation theoretically can be possible, but why?

Isn't one of the reasons for the savepoint (in fact, the principal reason) to reset the connection back to non-error state so that execution can continue? In that case, it really does need to create the savepoint at the start of the block, regardless of what's in it, since any statement can raise an error.

#7Bryn Llewellyn
bryn@yugabyte.com
In reply to: Christophe Pettus (#6)
Re: "A block containing an EXCEPTION clause is significantly more expensive to enter and exit than a block without one"

xof@thebuild.com wrote:

Isn't one of the reasons for the savepoint (in fact, the principal reason) to reset the connection back to non-error state so that execution can continue? In that case, it really does need to create the savepoint at the start of the block, regardless of what's in it, since any statement can raise an error.

tgl@sss.pgh.pa.us wrote:

It also rests on the principle that the programmer shouldn't be too concerned about micro-efficiencies. You've given a perfectly good six-line implementation of what you want; use it and be happy.

...

[ shrug... ] We are not Oracle. One of the main ways in which we are not Oracle is that we support extensible database functionality. To write a "validate_conversion" function that supports extension datatypes, but doesn't use something morally equivalent to a subtransaction, would be a nightmare: large, fragile, and probably not all that much faster.

pavel.stehule@gmail.com wrote:

I remember a long and still not closed discussion about fault tolerant copy implementation. The problem is a lot of possibly redundant code for exception safe input functions, if I remember well. And it is not enough for fault tolerant copy still. Maybe it needs some refactoring of the PostgreSQL exceptions handling system to be able to handle some exceptions that come from a non-storage engine without the necessity to use safepoints. I have no idea if somebody is working on this issue now, but I don't expect so it is easy to fix it. Maybe a more probable fix can be to reduce an overhead of savepoints. This issue is more complex than can be visible from user perspective - and the complexity is based on how pg has implemented exceptions.

——————————

Thanks, Tom Lane, for your reply. The note in the doc:

A block containing an EXCEPTION clause is significantly more expensive to enter and exit than a block without one.

uses the word "significantly". This scares the application programmer. Would you (all) consider this revised wording:

« If the executable section of a block that has an exception section makes changes to the database, then this brings significantly more expense than when there is no exception section. However, if the executable section makes only changes to local variables or session parameters, then the additional expense brought by an exception section is negligible. »

Oracle Database users had to wait about 40 years for the "validate_conversion()" built-in—despite strong and persistent user-pressure. PostreSQL is about 25 years old. So there's plenty of time to reconsider...
——————————

Thanks, Pavel Stehule, for your reply. Forget savepoints and blocks with exception sections. And forget the general case that Oracle's "validate_conversion()" handles.

It would seem to me that the code that implements "year_as_int := year_as_text" and, maybe, says "text into int won't go", is distinct from the code that it notifies to raise an error. It ought to be possible to factor out this code for re-use and use it both to do what it presently does and to implement a built-in "is_int_nn(t in text)". And maybe the same for numeric and for timestamp[tz]. These must be the common cases when such putative values come in from the UI. Proper practice insists on re-validating the conversions in the data base even though it's to be hoped that the UI will have done this.

However, in the light of Tom's « You've given a perfectly good six-line implementation of what you want; use it and be happy. », the need (almost) vanishes—except for the point that I mentioned earlier about packaging up the test in a the application development show's "utilities" kit.
——————————

Thanks, Christophe Pettus, for your reply.

it really does need to create the savepoint at the start of the block, regardless of what's in it, since any statement can raise an error.

This is necessary only when it can't be proved that the executable section that precedes the exception section cannot make database changes—hereinafter "is safe". Such safety tests are always over cautious. So, for example, it would be assumed that any transfer of control out of the executable section, and then back, was unsafe.

However, the experts have insisted that even the safety of this case cannot be proved:

an executable section with only simple assignments that use only built-in functionality.

I suppose that "PL/pgSQL is an AST interpreter" is the clue here.

Given this, then yes, you're right.