Inconsistency in plpgsql's error context reports

Started by Tom Laneover 8 years ago6 messageshackers

tgl@sss.pgh.pa.us

over 8 years ago

I noticed that since I put in commit 390d58135, buildfarm members
that use CLOBBER_CACHE_ALWAYS are failing the added test case
with diffs like

*** 5051,5057 ****
  NOTICE:  y = 2
  ERROR:  record "r" is not assigned yet
  DETAIL:  The tuple structure of a not-yet-assigned record is indeterminate.
! CONTEXT:  PL/pgSQL function inline_code_block line 15 at RAISE
  -- Check handling of conflicts between plpgsql vars and table columns.
  set plpgsql.variable_conflict = error;
  create function conflict_test() returns setof int8_tbl as $$
--- 5051,5058 ----
  NOTICE:  y = 2
  ERROR:  record "r" is not assigned yet
  DETAIL:  The tuple structure of a not-yet-assigned record is indeterminate.
! CONTEXT:  SQL statement "SELECT r"
! PL/pgSQL function inline_code_block line 15 at RAISE
  -- Check handling of conflicts between plpgsql vars and table columns.
  set plpgsql.variable_conflict = error;
  create function conflict_test() returns setof int8_tbl as $$

The reason for the difference turns out to be that in a CCA build,
we are forcing recreation of the cached plan for the simple expression
"r", so the error gets detected during parse analysis of "SELECT r"
rather than during execution of the expression. And because this all
mostly relies on SPI, we've gone through SPI_plan_get_cached_plan,
which inserts _SPI_error_callback into the error context stack.
That's where the first CONTEXT line is coming from.

There seem to be two ways we could look at this. One is that the
new test case just needs to be rejiggered to avoid unstable output
("\set VERBOSITY terse" would be the easiest way). But there is
also room to argue that it's bad that plpgsql produces error reports
that vary depending on the phase of the moon, which is pretty much
what this would look like in the field --- cache flushes will occur
unpredictably in most application environments. In that view, where
exec_eval_simple_expr() bypasses SPI but nonetheless "has to do some of
the things SPI_execute_plan would do", one of the things it ought to be
doing is setting up an error context stack entry to duplicate the one
that SPI_execute_plan would push.

I'm of mixed mind about whether this is a good idea. Adding more overhead
to exec_eval_simple_expr() isn't great, even though it'd be just a few
instructions. And in most cases the statement-level context line that
you get anyway ought to be sufficient to localize the problem. But it's
not too hard to imagine that a CONTEXT line that only shows up some of
the time could break somebody's error handling code. It's also arguable
that it's weird that you get different CONTEXT reports depending on
whether exec_simple_check_plan thinks the expression is simple or not,
eg

regression=# do $$ declare x int := 1;
begin raise notice '%', x/0;
end $$;
ERROR: division by zero
CONTEXT: PL/pgSQL function inline_code_block line 2 at RAISE

regression=# do $$ declare x int := 1;
begin raise notice '%', (select x/0);
end $$;
ERROR: division by zero
CONTEXT: SQL statement "SELECT (select x/0)"
PL/pgSQL function inline_code_block line 2 at RAISE

Thoughts?

regards, tom lane

Robert Haas

robertmhaas@gmail.com

over 8 years ago

In reply to: Tom Lane (#1)

Re: Inconsistency in plpgsql's error context reports

On Sun, Dec 10, 2017 at 11:55 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

There seem to be two ways we could look at this. One is that the
new test case just needs to be rejiggered to avoid unstable output
("\set VERBOSITY terse" would be the easiest way). But there is
also room to argue that it's bad that plpgsql produces error reports
that vary depending on the phase of the moon, which is pretty much
what this would look like in the field --- cache flushes will occur
unpredictably in most application environments.

I am inclined toward the latter view. I think predictability is very
important and that lack of predictability is both frustrating for
users and a sign of poor programming practices under the hood. Nobody
likes clicking on the same thing twice and getting a different result
each time; as humans, we want to feel like we understand how things
work, and if they work differently at different times, we tend to
either question whether we understand them or question whether they
are actually robust systems. For example, lately my car has started
to take a variable number of gallons of gasoline to fill the tank from
almost-empty to completely full. I find that rather unnerving: how do
I know whether I'm going to run out of gas? Similarly here -- I'm not
sure whether people want that extra CONTEXT line or not, but I think
they're going to want to either get it or not get it consistently. If
not, they'll tend to feel like something is flaky.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Tom Lane

tgl@sss.pgh.pa.us

over 8 years ago

In reply to: Robert Haas (#2)

Re: Inconsistency in plpgsql's error context reports

Robert Haas <robertmhaas@gmail.com> writes:

On Sun, Dec 10, 2017 at 11:55 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

There seem to be two ways we could look at this. One is that the
new test case just needs to be rejiggered to avoid unstable output
("\set VERBOSITY terse" would be the easiest way). But there is
also room to argue that it's bad that plpgsql produces error reports
that vary depending on the phase of the moon, which is pretty much
what this would look like in the field --- cache flushes will occur
unpredictably in most application environments.

I am inclined toward the latter view.

Yeah, me too. I'll see about patching exec_eval_simple_expr() to
provide a context line that matches SPI's. Seems like a HEAD-only
change though, as this will result in visible behavior change in
the typical case.

regards, tom lane

Tom Lane

tgl@sss.pgh.pa.us

over 8 years ago

In reply to: Tom Lane (#3)

Re: Inconsistency in plpgsql's error context reports

I wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Sun, Dec 10, 2017 at 11:55 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

There seem to be two ways we could look at this. One is that the
new test case just needs to be rejiggered to avoid unstable output
("\set VERBOSITY terse" would be the easiest way). But there is
also room to argue that it's bad that plpgsql produces error reports
that vary depending on the phase of the moon, which is pretty much
what this would look like in the field --- cache flushes will occur
unpredictably in most application environments.

I am inclined toward the latter view.

Yeah, me too. I'll see about patching exec_eval_simple_expr() to
provide a context line that matches SPI's. Seems like a HEAD-only
change though, as this will result in visible behavior change in
the typical case.

Here's a quick hack at that. I guess the main question that needs to be
asked is whether we're happy with plpgsql getting so much chattier
(as per all the regression test changes).

If we're not, the alternative that could be considered is to make SPI
expose some way to suppress its use of a context callback, and change
plpgsql to invoke that when dealing with an expression. That would
be rather more invasive code-wise, but would largely eliminate the
behavioral change as seen by users.

Another angle, if we do keep it like this, is that maybe SPI should
export _SPI_error_callback so that plpgsql can use it directly,
rather than having a copy that needs to be kept in sync.

regards, tom lane

Robert Haas

robertmhaas@gmail.com

over 8 years ago

In reply to: Tom Lane (#4)

Re: Inconsistency in plpgsql's error context reports

On Mon, Dec 11, 2017 at 3:52 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Here's a quick hack at that. I guess the main question that needs to be
asked is whether we're happy with plpgsql getting so much chattier
(as per all the regression test changes).

If we're not, the alternative that could be considered is to make SPI
expose some way to suppress its use of a context callback, and change
plpgsql to invoke that when dealing with an expression. That would
be rather more invasive code-wise, but would largely eliminate the
behavioral change as seen by users.

Another angle, if we do keep it like this, is that maybe SPI should
export _SPI_error_callback so that plpgsql can use it directly,
rather than having a copy that needs to be kept in sync.

I confess to never having really grokked, even in the pre-patch state,
why we sometimes get an "SQL statement" context line and sometimes
not. However, what strikes me about this is that the SQL statement is
a completely fabricated one that the user never entered. Consider
this bit from the existing regression test output, unpatched:

create function namedparmcursor_test7() returns void as $$
declare
c1 cursor (p1 int, p2 int) for
select * from tenk1 where thousand = p1 and tenthous = p2;
begin
open c1 (p2 := 77, p1 := 42/0);
end $$ language plpgsql;
select namedparmcursor_test7();
ERROR: division by zero
CONTEXT: SQL statement "SELECT 42/0 AS p1, 77 AS p2;"
PL/pgSQL function namedparmcursor_test7() line 6 at OPEN

Quite obviously, nothing like "SELECT 42/0 AS p1, 77 AS p2;" is
present in there anywhere. When people see an SQL statement in the
context, or at least when I see it, my inclination is to go grep for
where that SQL statement is to be found, and to be unhappy when the
answer is nowhere. It would really be a lot better if we could say
something like

CONTEXT: SQL expression 42/0

...but I realize that's probably hard to do. However, the current
situation is that plpgsql.out contains 5 "SQL statement" context
lines, of which only 1 is an SQL statement that actually appears in
the procedure.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Tom Lane

tgl@sss.pgh.pa.us

over 8 years ago

In reply to: Robert Haas (#5)

Re: Inconsistency in plpgsql's error context reports

Robert Haas <robertmhaas@gmail.com> writes:

On Mon, Dec 11, 2017 at 3:52 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Here's a quick hack at that. I guess the main question that needs to be
asked is whether we're happy with plpgsql getting so much chattier
(as per all the regression test changes).

I confess to never having really grokked, even in the pre-patch state,
why we sometimes get an "SQL statement" context line and sometimes
not. However, what strikes me about this is that the SQL statement is
a completely fabricated one that the user never entered.

I spent a brief amount of time looking at whether we could hide the
"fabricated statements" more thoroughly, and decided that it would
be a lot more work than seems warranted, and there would likely be
holes in it anyway. It's not like we don't document this behavior:
the plpgsql introductory material says

All expressions used in PL/pgSQL statements are processed using
the server's main SQL executor. For example, when you write a
PL/pgSQL statement like

IF expression THEN ...

PL/pgSQL will evaluate the expression by feeding a query like

SELECT expression

to the main SQL engine.

https://www.postgresql.org/docs/devel/static/plpgsql-expressions.html

So basically my conclusion at this point is that I'm not willing
to do any more work on this than the patch I already proposed.
If you want to do something more invasive, have at it.

regards, tom lane

Inconsistency in plpgsql's error context reports

Attachments: