Improving PL/Tcl's error context reports

Started by Tom Lanealmost 2 years ago10 messageshackers

tgl@sss.pgh.pa.us

almost 2 years ago

While working on commit b631d0149, I got a bee in my bonnet about
how unfriendly PL/Tcl's error CONTEXT reports are:

* The context reports expose PL/Tcl's internal names for the Tcl
procedures it creates, which'd be fine if those names were readable.
But actually they're something like "__PLTcl_proc_NNNN", where NNNN
is the function OID. Not only is that unintelligible, but because
the OIDs aren't stable this forces us to disable display of the
CONTEXT lines in all of PL/Tcl's regression tests.

* The first line of the context report (almost?) always duplicates
the primary error message, which is redundant and not per our
normal reporting style.

So attached is a patch that attempts to improve this situation.

The key question is how to avoid including function OIDs in the
strings that will appear in the regression test outputs. The
answer I propose is to start with an internal name like
"__PLTcl_proc_NAME", where NAME is the function's normal SQL name,
and then append the OID only if that function name is not unique.
As long as we don't create test cases that involve throwing
errors from duplicatively-named functions, we can show the context
reports and still have stable regression outputs. I think this will
improve the user experience for regular users too.

PL/Tcl wants the internal names to be all-ASCII-alphanumeric,
which saves it from having to think about encoding conversion
or quoting when inserting those names into Tcl command strings.
What I did in the attached is to copy only ASCII alphanumerics
from the SQL name. Perhaps it's worth working harder but
I failed to get excited about that.

A few notes:

* To avoid unnecessarily appending the OID when a function is
redefined, I modified the logic to explicitly delete the old Tcl
command before checking for duplication. This is okay even if the
function is currently being evaluated, because Tcl's internal
reference counting prevents it from deleting the underlying code
object until it's done being executed. Really we were depending on
that reference counting to handle such cases already, but you wouldn't
have known it from our comments. I added a test case to demonstrate
explicitly that this works correctly.

* Sadly, pltcl_trigger.sql still has to suppress the context
reports. Although its function names are now stable, the reports
include trigger argument lists, which include numeric table OIDs
so they're unstable. I don't see a way to change that without
breaking API for user trigger functions.

* A hazard with this plan is that the regression tests' context
reports might turn out to be platform-dependent. I experimented
with Tcl 8.5 and 8.6 here and found one difference: the "missing
close-brace" error reported by our tcl_error() test case shows the
unmatched open-brace on one version but not the other. AFAICS the
point of that test is just to exercise some Tcl-detected error, not
necessarily that exact one, so I just modified the test case to cause
a different error. We might find additional problems once this patch
hits the buildfarm or gets out into the field.

I'll park this in the next CF.

regards, tom lane

Heikki Linnakangas

heikki.linnakangas@enterprisedb.com

almost 2 years ago

In reply to: Tom Lane (#1)

Re: Improving PL/Tcl's error context reports

On 05/06/2024 20:42, Tom Lane wrote:

While working on commit b631d0149, I got a bee in my bonnet about
how unfriendly PL/Tcl's error CONTEXT reports are:

* The context reports expose PL/Tcl's internal names for the Tcl
procedures it creates, which'd be fine if those names were readable.
But actually they're something like "__PLTcl_proc_NNNN", where NNNN
is the function OID. Not only is that unintelligible, but because
the OIDs aren't stable this forces us to disable display of the
CONTEXT lines in all of PL/Tcl's regression tests.

* The first line of the context report (almost?) always duplicates
the primary error message, which is redundant and not per our
normal reporting style.

So attached is a patch that attempts to improve this situation.

The key question is how to avoid including function OIDs in the
strings that will appear in the regression test outputs. The
answer I propose is to start with an internal name like
"__PLTcl_proc_NAME", where NAME is the function's normal SQL name,
and then append the OID only if that function name is not unique.
As long as we don't create test cases that involve throwing
errors from duplicatively-named functions, we can show the context
reports and still have stable regression outputs. I think this will
improve the user experience for regular users too.

Yes, that sounds a lot nicer.

What happens if you rename a function? I guess the error context will
still print the old name, but that's pretty harmless.

Hmm, could we do something with tcl namespaces to allow having two
procedures with the same name? E.g. create a separate namespace, based
on the OID, for each procedure. I wonder how the stack trace would look
like then.

--
Heikki Linnakangas
Neon (https://neon.tech)

Pavel Stehule

pavel.stehule@gmail.com

almost 2 years ago

In reply to: Heikki Linnakangas (#2)

Re: Improving PL/Tcl's error context reports

čt 4. 7. 2024 v 17:27 odesílatel Heikki Linnakangas <hlinnaka@iki.fi>
napsal:

On 05/06/2024 20:42, Tom Lane wrote:

While working on commit b631d0149, I got a bee in my bonnet about
how unfriendly PL/Tcl's error CONTEXT reports are:

* The context reports expose PL/Tcl's internal names for the Tcl
procedures it creates, which'd be fine if those names were readable.
But actually they're something like "__PLTcl_proc_NNNN", where NNNN
is the function OID. Not only is that unintelligible, but because
the OIDs aren't stable this forces us to disable display of the
CONTEXT lines in all of PL/Tcl's regression tests.

* The first line of the context report (almost?) always duplicates
the primary error message, which is redundant and not per our
normal reporting style.

So attached is a patch that attempts to improve this situation.

The key question is how to avoid including function OIDs in the
strings that will appear in the regression test outputs. The
answer I propose is to start with an internal name like
"__PLTcl_proc_NAME", where NAME is the function's normal SQL name,
and then append the OID only if that function name is not unique.
As long as we don't create test cases that involve throwing
errors from duplicatively-named functions, we can show the context
reports and still have stable regression outputs. I think this will
improve the user experience for regular users too.

Yes, that sounds a lot nicer.

What happens if you rename a function? I guess the error context will
still print the old name, but that's pretty harmless.

The rename should to generate different tid, so the function will be
recompiled

<-->/************************************************************
<--> * If it's present, must check whether it's still up to date.
<--> * This is needed because CREATE OR REPLACE FUNCTION can modify the
<--> * function's pg_proc entry without changing its OID.
<--> ************************************************************/
<-->if (prodesc != NULL &&
<--><-->prodesc->internal_proname != NULL &&
<--><-->prodesc->fn_xmin == HeapTupleHeaderGetRawXmin(procTup->t_data) &&
<--><-->ItemPointerEquals(&prodesc->fn_tid, &procTup->t_self))
<-->{
<--><-->/* It's still up-to-date, so we can use it */
<--><-->ReleaseSysCache(procTup);
<--><-->return prodesc;
<-->}

Show quoted text

Hmm, could we do something with tcl namespaces to allow having two
procedures with the same name? E.g. create a separate namespace, based
on the OID, for each procedure. I wonder how the stack trace would look
like then.

--
Heikki Linnakangas
Neon (https://neon.tech)

Pavel Stehule

pavel.stehule@gmail.com

almost 2 years ago

In reply to: Heikki Linnakangas (#2)

Re: Improving PL/Tcl's error context reports

Hmm, could we do something with tcl namespaces to allow having two
procedures with the same name? E.g. create a separate namespace, based
on the OID, for each procedure. I wonder how the stack trace would look
like then.

I didn't do full test, but I think so tcl uses for error messages fully
qualified name

Show quoted text

--
Heikki Linnakangas
Neon (https://neon.tech)

Tom Lane

tgl@sss.pgh.pa.us

almost 2 years ago

In reply to: Pavel Stehule (#3)

Re: Improving PL/Tcl's error context reports

Pavel Stehule <pavel.stehule@gmail.com> writes:

čt 4. 7. 2024 v 17:27 odesílatel Heikki Linnakangas <hlinnaka@iki.fi>
napsal:

What happens if you rename a function? I guess the error context will
still print the old name, but that's pretty harmless.

The rename should to generate different tid, so the function will be
recompiled

Right. With patch:

regression=# create function bogus() returns int language pltcl as
'return [expr 1 / 0]';
CREATE FUNCTION
regression=# select bogus();
ERROR: divide by zero
CONTEXT: while executing
"expr 1 / 0"
(procedure "__PLTcl_proc_bogus" line 2)
invoked from within
"__PLTcl_proc_bogus"
in PL/Tcl function "bogus"
regression=# alter function bogus() rename to stillbogus;
ALTER FUNCTION
regression=# select stillbogus();
ERROR: divide by zero
CONTEXT: while executing
"expr 1 / 0"
(procedure "__PLTcl_proc_stillbogus" line 2)
invoked from within
"__PLTcl_proc_stillbogus"
in PL/Tcl function "stillbogus"

Hmm, could we do something with tcl namespaces to allow having two
procedures with the same name? E.g. create a separate namespace, based
on the OID, for each procedure. I wonder how the stack trace would look
like then.

If the namespace depends on the OID then we still have nonreproducible
stack traces, no? We could maybe make Tcl namespaces that match the
SQL schema names, but that doesn't get us out of the duplication
problem when there are similarly-named functions with different
argument lists.

Another idea is to make the Tcl names include the SQL schema name,
that is __PLTcl_proc_myschema_myfunction. That avoids needing to
append OIDs when the problem is functions in different schemas,
but it doesn't move the needle for overloaded functions. On the
whole I feel like that'd add verbosity without buying much.

regards, tom lane

Pavel Stehule

pavel.stehule@gmail.com

almost 2 years ago

In reply to: Tom Lane (#5)

Re: Improving PL/Tcl's error context reports

čt 4. 7. 2024 v 19:36 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:

Pavel Stehule <pavel.stehule@gmail.com> writes:

čt 4. 7. 2024 v 17:27 odesílatel Heikki Linnakangas <hlinnaka@iki.fi>
napsal:

What happens if you rename a function? I guess the error context will
still print the old name, but that's pretty harmless.

The rename should to generate different tid, so the function will be
recompiled

Right. With patch:

regression=# create function bogus() returns int language pltcl as
'return [expr 1 / 0]';
CREATE FUNCTION
regression=# select bogus();
ERROR: divide by zero
CONTEXT: while executing
"expr 1 / 0"
(procedure "__PLTcl_proc_bogus" line 2)
invoked from within
"__PLTcl_proc_bogus"
in PL/Tcl function "bogus"
regression=# alter function bogus() rename to stillbogus;
ALTER FUNCTION
regression=# select stillbogus();
ERROR: divide by zero
CONTEXT: while executing
"expr 1 / 0"
(procedure "__PLTcl_proc_stillbogus" line 2)
invoked from within
"__PLTcl_proc_stillbogus"
in PL/Tcl function "stillbogus"

Hmm, could we do something with tcl namespaces to allow having two
procedures with the same name? E.g. create a separate namespace, based
on the OID, for each procedure. I wonder how the stack trace would look
like then.

If the namespace depends on the OID then we still have nonreproducible
stack traces, no? We could maybe make Tcl namespaces that match the
SQL schema names, but that doesn't get us out of the duplication
problem when there are similarly-named functions with different
argument lists.

Another idea is to make the Tcl names include the SQL schema name,
that is __PLTcl_proc_myschema_myfunction. That avoids needing to
append OIDs when the problem is functions in different schemas,
but it doesn't move the needle for overloaded functions. On the
whole I feel like that'd add verbosity without buying much.

I like the idea of using a schema name inside. It doesn't fix all, but the
cost can be low, and some risk of duplicity is reduced.

Getting unique name based on suffix _oid looks not too much nice (using
_increment can be nicer), but it should to work

PLpgSQL uses more often function signature

(2024-07-04 19:49:20) postgres=# select bx(0);
ERROR: division by zero
CONTEXT: PL/pgSQL function fx(integer) line 1 at RETURN
PL/pgSQL function bx(integer) line 1 at RETURN

What can be interesting information

How much work can be using modified function signature for internal name
like

__PLTcl_proc_myschema_myfunction_integer

__PLTcl_trigger_myschema_myfunction_table_schema_table

Is there some size limit for variable name? I didn't find it.

Show quoted text

regards, tom lane

Tom Lane

tgl@sss.pgh.pa.us

almost 2 years ago

In reply to: Pavel Stehule (#6)

Re: Improving PL/Tcl's error context reports

Pavel Stehule <pavel.stehule@gmail.com> writes:

Getting unique name based on suffix _oid looks not too much nice (using
_increment can be nicer), but it should to work

Hmm, yeah we could do an increment. It'd make the results in cases
of conflict invocation-order-dependent though, which seems like it
might be worse than using OIDs.

PLpgSQL uses more often function signature

(2024-07-04 19:49:20) postgres=# select bx(0);
ERROR: division by zero
CONTEXT: PL/pgSQL function fx(integer) line 1 at RETURN
PL/pgSQL function bx(integer) line 1 at RETURN

Oh that's a good idea! So let's use format_procedure(), same as
plpgsql does, to generate the final context line that currently
reads like

in PL/Tcl function "bogus"

Then, we could apply the "pull out just alphanumerics" rule to
the result of format_procedure() to generate the internal Tcl name.
That should greatly reduce the number of cases where we have duplicate
internal names we have to unique-ify.

Is there some size limit for variable name? I didn't find it.

I did a quick test with 10000-character names and Tcl didn't
complain, so it seems like there's no hard limit.

regards, tom lane

Tom Lane

tgl@sss.pgh.pa.us

almost 2 years ago

In reply to: Tom Lane (#7)

Re: Improving PL/Tcl's error context reports

I wrote:

Pavel Stehule <pavel.stehule@gmail.com> writes:

PLpgSQL uses more often function signature
(2024-07-04 19:49:20) postgres=# select bx(0);
ERROR: division by zero
CONTEXT: PL/pgSQL function fx(integer) line 1 at RETURN
PL/pgSQL function bx(integer) line 1 at RETURN

Oh that's a good idea! So let's use format_procedure(), same as
plpgsql does, to generate the final context line that currently
reads like

in PL/Tcl function "bogus"

Then, we could apply the "pull out just alphanumerics" rule to
the result of format_procedure() to generate the internal Tcl name.
That should greatly reduce the number of cases where we have duplicate
internal names we have to unique-ify.

Here's a v2 that does it like that.

regards, tom lane

Pavel Stehule

pavel.stehule@gmail.com

almost 2 years ago

In reply to: Tom Lane (#8)

Re: Improving PL/Tcl's error context reports

čt 4. 7. 2024 v 21:42 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:

I wrote:

Pavel Stehule <pavel.stehule@gmail.com> writes:

PLpgSQL uses more often function signature
(2024-07-04 19:49:20) postgres=# select bx(0);
ERROR: division by zero
CONTEXT: PL/pgSQL function fx(integer) line 1 at RETURN
PL/pgSQL function bx(integer) line 1 at RETURN

Oh that's a good idea! So let's use format_procedure(), same as
plpgsql does, to generate the final context line that currently
reads like

in PL/Tcl function "bogus"

Then, we could apply the "pull out just alphanumerics" rule to
the result of format_procedure() to generate the internal Tcl name.
That should greatly reduce the number of cases where we have duplicate
internal names we have to unique-ify.

Here's a v2 that does it like that.

I like it.

- patching and compilation without any issue
- check world passed

I'll mark this as ready for commit

Regards

Pavel

Show quoted text

regards, tom lane

#10

Tom Lane

tgl@sss.pgh.pa.us

almost 2 years ago

In reply to: Pavel Stehule (#9)

Re: Improving PL/Tcl's error context reports

Pavel Stehule <pavel.stehule@gmail.com> writes:

čt 4. 7. 2024 v 21:42 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:

Here's a v2 that does it like that.

I like it.
- patching and compilation without any issue
- check world passed
I'll mark this as ready for commit

Pushed, thanks!

regards, tom lane

Improving PL/Tcl's error context reports

Attachments:

Attachments: