Problems with Error Messages wrt Domains, Checks

Started by john frazeralmost 8 years ago10 messages

johnfrazer783@gmail.com

almost 8 years ago

Today I realized a number of points where PostgreSQL (v10.3) is rather
lackluster and sparse in its error messages.

The first point is illustrated by this code:

drop schema if exists X cascade;create schema X;
create domain X.an_illegal_regex as text check ( value ~ '(' );
create table X.table_with_illegal_constraint (
a text,
constraint "column a must have a bogus value" check (
a::X.an_illegal_regex = a ) );
select * from X.table_with_illegal_constraint;
insert into X.table_with_illegal_constraint values
( 'xxx' ),
-- ( 'xxx' ),
( 'foo' ),
( 'xyx' );

This code will throw with

psql:db/experiments/pg-error-fail-illegal-regex.sql:17: ERROR:
invalid regular expression: parentheses () not balanced

There are several problems with this error message:

FAILURE: the error is really in line 5 where a syntactically invalid RegEx
is created; the fact that it is a RegEx and not a general string is obvious
from the semantics of the ~ (tilde) operator at that point in time.

FAILURE: the offending RegEx is not referred to and not quoted in the error
message. As such, it could be anywhere in my many, many kLOCs big DB
definition. I cannot even search the RegEx with a RegEx because all I know
is some parenthesis is missing, somewhere: RegExes cannot match
parentheses, and PG RegExes do not have a unique syntactic marker to them.

FAILURE: before the insert statement, everything runs dandy. We could have
built an entire data warehouse application on top of a table definition
that can never be syntactically processed but which will only fail when
someone accidentally tries to insert a line.

FAILURE: I can select from a table with a syntactically invalid definition.

The second point is related:

drop schema if exists X cascade;create schema X;
create domain X.a_legal_regex as text check ( value ~ '^x' );
create table X.table_with_constraints (
a text,
constraint "column a must start with x" check ( a::X.a_legal_regex = a ),
constraint "field b must have 3 characters" check (
character_length( a ) = 3 ) );
insert into X.table_with_constraints values
( 'xxx' ),
( 'foo' ), /* A: violates first constraint */
-- ( 'xxxx' ), /* B: violates second constraint */
( 'xyx' );

With only line B active, this gives:

psql:db/experiments/pg-error-fail-no-constraint-name.sql:16:
ERROR: new row for relation "table_with_constraints" violatescheck
constraint "field b must have 3 characters"
DETAIL: Failing row contains (xxxx).

SUCCESS: we get the name of the relation *and* the name of the violated
rule.

SUCCESS: the offending piece of data is quoted.

FAILURE: we don't get the full name of the relation, which is
"X"."table_with_constraints". Neither do we get the name of the column that
received the offending value.

Lastly, with only line A (not line B) active:

psql:db/experiments/pg-error-fail-no-constraint-name.sql:16:
ERROR: value for domain x.a_legal_regex violates check constraint
"a_legal_regex_check"

FAILURE: no reference to the affected table, column is made.

FAILURE: no reference to the offending piece of data is made.

FAILURE: no reference to the offended constraint is made ("column a must
start with x").

What are the best practices or workarounds for the above shortcomings? I've
been trying for several hours to figure out what causes an error message a
la value for domain xxx violates check constraint "xxx_check" by rewriting
table definitions, inserting data row by row and so on, to no avail. What I
need is a full chain of the objects (column -> table -> constraint ->
domain -> check) that are involved in the error.
I'm writing this to the developers' list because I see the above
observations as serious shortcomings in an otherwise great piece of
software that can probably not be fixed by using client-side code only.

David G. Johnston

david.g.johnston@gmail.com

almost 8 years ago

In reply to: john frazer (#1)

Re: Problems with Error Messages wrt Domains, Checks

On Sat, Mar 17, 2018 at 6:14 AM, john frazer <johnfrazer783@gmail.com>
wrote:

Today I realized a number of points where PostgreSQL (v10.3) is rather
lackluster and sparse in its error messages.

You may find the following thread and its predecessors enlightening.

/messages/by-id/CAD3a31WR8X1TpjR_MoZxuz4S0BO3ZkPAeLoQ9rPxKHG=728eoQ@mail.gmail.com

Basically, the fundamental problem is type input is performed in a
relatively isolated fashion since there is no requirement that a table or
column of said type even exist.

psql:db/experiments/pg-error-fail-illegal-regex.sql:17: ERROR:
invalid regular expression: parentheses () not balanced

There are several problems with this error message:

FAILURE: the error is really in line 5 where a syntactically invalid RegEx
is created; the fact that it is a RegEx and not a general string is obvious
from the semantics of the ~ (tilde) operator at that point in time.

Yeah, the fact that we don't "compile" expressions is unfortunate. Not

sure if there are any plans to do so or what limitations there are

FAILURE: the offending RegEx is not referred to and not quoted in the
error message.

This seems like an easy enough oversight to correct. In all the
discussion about being challenging to identify location I don't recall
seeing anything about why we aren't at least showing the offending input
value.

As such, it could be anywhere in my many, many kLOCs big DB definition. I
cannot even search the RegEx with a RegEx because all I know is some
parenthesis is missing, somewhere:

Well, the error does point to the first statement in the chain of issues -
working backward a couple of steps is possible.

RegExes cannot match parentheses,

Sure they can.

and PG RegExes do not have a unique syntactic marker to them.

True

FAILURE: before the insert statement, everything runs dandy. We could
have built an entire data warehouse application on top of a table
definition that can never be syntactically processed but which will only
fail when someone accidentally tries to insert a line.

Since this is going to fail every single time you add a record I'm lacking
sympathy here. "Accidentally tries to insert a line" - if the table wasn't
meant to be used why does it exist in the first place? And if it is
intended to be used then functional testing should quickly point out
something like this.

FAILURE: I can select from a table with a syntactically invalid definition.

You're stretching here if you think this is an important failure point.
Since the table cannot not have valid data there would be nothing to
select. Checking constraints during selection is undesireable - they
should be an are only checked during insertion or when the constraint
itself changes.

With only line B active, this gives:

psql:db/experiments/pg-error-fail-no-constraint-name.sql:16:
ERROR: new row for relation "table_with_constraints" violatescheck constraint "field b must have 3 characters"
DETAIL: Failing row contains (xxxx).

SUCCESS: we get the name of the relation *and* the name of the violated
rule.

SUCCESS: the offending piece of data is quoted.

FAILURE: we don't get the full name of the relation, which is
"X"."table_with_constraints". Neither do we get the name of the column that
received the offending value.

No, you get "check constraint field b must have 3 characters" with the
owning table. You've defined a table constraint so there is no directly
attached column to report - the expression as a whole fails and we don't
report which boolean aspects of the expression where true and false. You
do get the input value which makes manually resolving the expression
possible. The lack of schema-qualification on the table identifier seems
like an oversight but at the moment I'd not willing to go find evidence in
support to opposition to that thought.

Lastly, with only line A (not line B) active:

psql:db/experiments/pg-error-fail-no-constraint-name.sql:16:
ERROR: value for domain x.a_legal_regex violates check constraint "a_legal_regex_check"

FAILURE: no reference to the affected table, column is made.

FAILURE: no reference to the offending piece of data is made

Repeat of the first example, same explanations apply. Hopefully this gets
improved eventually.

FAILURE: no reference to the offended constraint is made ("column a must
start with x").

It never got that far into the validation process. It couldn't even form
a value of correct type that the constraint expression could evaluate. I
suppose this is just another aspect of the first problem - the isolation of
type conversion and the absence of keeping and reporting a stack-trace.
Someone more knowledgeable than I would need to expound on the similarities
and differences.

What are the best practices or workarounds for the above shortcomings?
I've been trying for several hours to figure out what causes an error
message a la value for domain xxx violates check constraint "xxx_check"
by rewriting table definitions, inserting data row by row and so on, to no
avail. What I need is a full chain of the objects (column -> table ->
constraint -> domain -> check) that are involved in the error.

If the error starts with "value for domain xxx violates" then the only
place to look is at your "create domain" statement for "xxx". Well, at
least once you know what the actually offending value is...which right now
might be a bit of a challenge depending on the situation (if your first
example applies the data doesn't matter and critically evaluating the
create domain statement might be sufficient).

I'm writing this to the developers' list because I see the above
observations as serious shortcomings in an otherwise great piece of
software that can probably not be fixed by using client-side code only.

I suppose the best practice when dealing with a lack of information in the
error handle code flows is to limit the amount of context that is in scope
by unit testing. And while they are absolutely short-comings overcoming
them has cost in terms of both developer effort and, more importantly,
runtime performance.

David J.

john frazer

johnfrazer783@gmail.com

almost 8 years ago

In reply to: john frazer (#1)

Fwd: Problems with Error Messages wrt Domains, Checks

---------- Forwarded message ----------
From: john frazer <johnfrazer783@gmail.com>
Date: Sat, Mar 17, 2018 at 6:28 PM
Subject: Re: Problems with Error Messages wrt Domains, Checks
To: "David G. Johnston" <david.g.johnston@gmail.com>

Thanks for your consideration,

I'll try to be brief.

As such, it could be anywhere in my many, many kLOCs big DB

definition. I cannot even search the RegEx with a RegEx because all I know
is some parenthesis is missing, somewhere:

Well, the error does point to the first statement in the chain of issues

- working backward a couple of steps is possible.

In this particular case that is possible, and I did manage to do it. The
point is that in the
general case the faulty regular expression could be anywhere, and there's
no clue given at all.

RegExes cannot match parentheses,
Sure they can.
and PG RegExes do not have a unique syntactic marker to them.
True

I meant to say they can't detect matching parentheses or lack thereof.

FAILURE: before the insert statement, everything runs dandy. We could

have built an entire data warehouse application on top of a table
definition that can never be syntactically processed but which will only
fail when someone accidentally tries to insert a line.

Since this is going to fail every single time you add a record I'm

lacking sympathy here. "Accidentally tries to insert a line" - if the
table wasn't meant to be used why does it exist in the first place? And
if it is intended to be used then functional testing should quickly point
out something like this.

But there clearly can be tables that are used only now and then and might
get checked
for absence of rows. But regardless, I think the point stands that ideally
you shouldn't
be able to succesfully declare nonsensical objects and only be told so some
kinds of
usage patterns by runtime errors (with defective contexts), and in most
cases, pgSQL
does keep that promise.

FAILURE: I can select from a table with a syntactically invalid

definition.

You're stretching here if you think this is an important failure point.

Since the table cannot not have valid data there would be nothing to
select. Checking constraints during selection is undesireable - they
should be an are only checked during insertion or when the constraint
itself changes.

To clarify, I do not suggest checking constraints during `select`, I suggest
tighter checks at table creation time. I should not be able to construct
any kind of object that says (in SQL or maybe even plpgsql, too) something
to the effect `x ~ '('` because that string in that syntactic context which
must
be a RegexLiteral is syntactically bogus. (Yes operators can be redefined
but the
only thing that counts here is the definition of the `~` operator at the
point in time
that the table gets created, and that expression is constant with a
literal, so it's
not much difference between this and checking against, say, `x > 0` which
could and
would fail for illegal literals and non-matching types.)

I suppose the best practice when dealing with a lack of information in

the error handle code flows is to limit the amount of context that is in
scope by unit testing. And while they are absolutely short-comings
overcoming them has cost in terms of both developer effort and, more
importantly, runtime performance.

I'm afraid no amount of unit testing of the DDL code can do this for me.
Yes,
in the first reported cases (the invalid RegExp), I can make sure I use each
expression at least once so unsyntactic ones will make themselves shown. But
in the other two cases, well, the production environment in which this came
up
has an insert statement that takes data from a largish source into the
target table
(20k rows of altogether >2m rows), and I *can't* unit test that data.

FWIW the workaround that IÄve found is this:

create table X.table_with_constraints (
my_column text,
constraint "my_column must start with 'x'" check ( Q.starts_with_x(
my_column ) ),
constraint "my_column must have 3 chrs" check ( Q.has_3_characters(
my_column ) ) );

In other words, I dispense with domains and use (small, boolean) functions
(defined as `select` one-liners)
because only then do I get told what piece of data comes doen the wrong way
and where.
It's a shame because this is essentially what I expect to do in a language
like
JavaScript.

On Sat, Mar 17, 2018 at 4:20 PM, David G. Johnston <
david.g.johnston@gmail.com> wrote:

Show quoted text

On Sat, Mar 17, 2018 at 6:14 AM, john frazer <johnfrazer783@gmail.com>
wrote:

Today I realized a number of points where PostgreSQL (v10.3) is rather
lackluster and sparse in its error messages.

You may find the following thread and its predecessors enlightening.

/messages/by-id/CAD3a31WR8X1TpjR_MoZxu
z4S0BO3ZkPAeLoQ9rPxKHG%3D728eoQ%40mail.gmail.com

Basically, the fundamental problem is type input is performed in a
relatively isolated fashion since there is no requirement that a table or
column of said type even exist.

psql:db/experiments/pg-error-fail-illegal-regex.sql:17: ERROR:
invalid regular expression: parentheses () not balanced

There are several problems with this error message:

FAILURE: the error is really in line 5 where a syntactically invalid
RegEx is created; the fact that it is a RegEx and not a general string is
obvious from the semantics of the ~ (tilde) operator at that point in
time.

Yeah, the fact that we don't "compile" expressions is unfortunate. Not

sure if there are any plans to do so or what limitations there are

FAILURE: the offending RegEx is not referred to and not quoted in the
error message.

This seems like an easy enough oversight to correct. In all the
discussion about being challenging to identify location I don't recall
seeing anything about why we aren't at least showing the offending input
value.

As such, it could be anywhere in my many, many kLOCs big DB definition. I
cannot even search the RegEx with a RegEx because all I know is some
parenthesis is missing, somewhere:

Well, the error does point to the first statement in the chain of issues
- working backward a couple of steps is possible.

RegExes cannot match parentheses,

Sure they can.

and PG RegExes do not have a unique syntactic marker to them.

True

FAILURE: before the insert statement, everything runs dandy. We could
have built an entire data warehouse application on top of a table
definition that can never be syntactically processed but which will only
fail when someone accidentally tries to insert a line.

Since this is going to fail every single time you add a record I'm
lacking sympathy here. "Accidentally tries to insert a line" - if the
table wasn't meant to be used why does it exist in the first place? And
if it is intended to be used then functional testing should quickly point
out something like this.

FAILURE: I can select from a table with a syntactically invalid
definition.

You're stretching here if you think this is an important failure point.
Since the table cannot not have valid data there would be nothing to
select. Checking constraints during selection is undesireable - they
should be an are only checked during insertion or when the constraint
itself changes.

With only line B active, this gives:

psql:db/experiments/pg-error-fail-no-constraint-name.sql:16:
ERROR: new row for relation "table_with_constraints" violatescheck constraint "field b must have 3 characters"
DETAIL: Failing row contains (xxxx).

SUCCESS: we get the name of the relation *and* the name of the violated
rule.

SUCCESS: the offending piece of data is quoted.

FAILURE: we don't get the full name of the relation, which is
"X"."table_with_constraints". Neither do we get the name of the column that
received the offending value.

No, you get "check constraint field b must have 3 characters" with the
owning table. You've defined a table constraint so there is no directly
attached column to report - the expression as a whole fails and we don't
report which boolean aspects of the expression where true and false. You
do get the input value which makes manually resolving the expression
possible. The lack of schema-qualification on the table identifier seems
like an oversight but at the moment I'd not willing to go find evidence in
support to opposition to that thought.

Lastly, with only line A (not line B) active:

psql:db/experiments/pg-error-fail-no-constraint-name.sql:16:
ERROR: value for domain x.a_legal_regex violates check constraint "a_legal_regex_check"

FAILURE: no reference to the affected table, column is made.

FAILURE: no reference to the offending piece of data is made

Repeat of the first example, same explanations apply. Hopefully this
gets improved eventually.

FAILURE: no reference to the offended constraint is made ("column a must
start with x").

It never got that far into the validation process. It couldn't even form
a value of correct type that the constraint expression could evaluate. I
suppose this is just another aspect of the first problem - the isolation of
type conversion and the absence of keeping and reporting a stack-trace.
Someone more knowledgeable than I would need to expound on the similarities
and differences.

What are the best practices or workarounds for the above shortcomings?
I've been trying for several hours to figure out what causes an error
message a la value for domain xxx violates check constraint "xxx_check"
by rewriting table definitions, inserting data row by row and so on, to no
avail. What I need is a full chain of the objects (column -> table ->
constraint -> domain -> check) that are involved in the error.

If the error starts with "value for domain xxx violates" then the only
place to look is at your "create domain" statement for "xxx". Well, at
least once you know what the actually offending value is...which right now
might be a bit of a challenge depending on the situation (if your first
example applies the data doesn't matter and critically evaluating the
create domain statement might be sufficient).

I'm writing this to the developers' list because I see the above
observations as serious shortcomings in an otherwise great piece of
software that can probably not be fixed by using client-side code only.

I suppose the best practice when dealing with a lack of information in
the error handle code flows is to limit the amount of context that is in
scope by unit testing. And while they are absolutely short-comings
overcoming them has cost in terms of both developer effort and, more
importantly, runtime performance.

David J.

Import Notes

Reply to msg id not found: CAC-4GVZMHogOPt10LvJTG1tTLqYj7ECkdA6fb7PTULQRjJb5uA@mail.gmail.com

David G. Johnston

david.g.johnston@gmail.com

almost 8 years ago

In reply to: john frazer (#3)

Re: Problems with Error Messages wrt Domains, Checks

On Sat, Mar 17, 2018 at 12:54 PM, john frazer <johnfrazer783@gmail.com>
wrote:

---------- Forwarded message ----------
From: john frazer <johnfrazer783@gmail.com>
Date: Sat, Mar 17, 2018 at 6:28 PM
Subject: Re: Problems with Error Messages wrt Domains, Checks
To: "David G. Johnston" <david.g.johnston@gmail.com>

As such, it could be anywhere in my many, many kLOCs big DB

definition. I cannot even search the RegEx with a RegEx because all I know
is some parenthesis is missing, somewhere:

Well, the error does point to the first statement in the chain of

issues - working backward a couple of steps is possible.

In this particular case that is possible, and I did manage to do it. The
point is that in the
general case the faulty regular expression could be anywhere, and there's
no clue given at all.

Frankly, I'm not seeing "invalid constant regular expressions" as being a
large scale problem - but I'll agree that having the error include the
actual literal being parsed as a RegEx should be done. If the targeted
reporting (e.g., stack trace) gets fixed as a side-effect of the more
annoying type input errors - which usually involves dynamic data - that
would be swell.

FAILURE: before the insert statement, everything runs dandy. We

could have built an entire data warehouse application on top of a table
definition that can never be syntactically processed but which will only
fail when someone accidentally tries to insert a line.

Since this is going to fail every single time you add a record I'm

lacking sympathy here. "Accidentally tries to insert a line" - if the
table wasn't meant to be used why does it exist in the first place? And
if it is intended to be used then functional testing should quickly point
out something like this.

But there clearly can be tables that are used only now and then and might
get checked
for absence of rows. But regardless, I think the point stands that ideally
you shouldn't
be able to succesfully declare nonsensical objects and only be told so
some kinds of
usage patterns by runtime errors (with defective contexts), and in most
cases, pgSQL
does keep that promise.

I'm not disagreeing but I'm also not part of the solution. In terms of
importance I'd say its not that high given that I've never really felt the
lack personally. An invalid object, even though it doesn't fail at
creation, usually fails immediately after its first use which happens soon
enough after creation as to make pin-pointing its location generally
trivial.

I suppose the best practice when dealing with a lack of information in

the error handle code flows is to limit the amount of context that is in
scope by unit testing. And while they are absolutely short-comings
overcoming them has cost in terms of both developer effort and, more
importantly, runtime performance.

I'm afraid no amount of unit testing of the DDL code can do this for me.
Yes,
in the first reported cases (the invalid RegExp), I can make sure I use
each
expression at least once so unsyntactic ones will make themselves shown.
But
in the other two cases, well, the production environment in which this
came up
has an insert statement that takes data from a largish source into the
target table
(20k rows of altogether >2m rows), and I *can't* unit test that data.

I'd be inclined to not constrain the table itself at all and instead
perform soft validation post-load. You can process and remove offending
records and then add the constraints as a sanity check/documentation.

FWIW the workaround that IÄve found is this:

create table X.table_with_constraints (
my_column text,
constraint "my_column must start with 'x'" check ( Q.starts_with_x(
my_column ) ),
constraint "my_column must have 3 chrs" check ( Q.has_3_characters(
my_column ) ) );

In other words, I dispense with domains and use (small, boolean) functions
(defined as `select` one-liners)
because only then do I get told what piece of data comes doen the wrong
way and where.
It's a shame because this is essentially what I expect to do in a language
like
JavaScript.

Yes, hopefully we can decide and implement value reporting for v12 (and
consider adding it in for v11) which would at least avoid the need for the
functions and thus rely on just the named constraints.

Though since its just you and I on this thread there is no one who can
write a patch speaking up...

David J.

Tom Lane

tgl@sss.pgh.pa.us

almost 8 years ago

In reply to: David G. Johnston (#4)

Re: Problems with Error Messages wrt Domains, Checks

"David G. Johnston" <david.g.johnston@gmail.com> writes:

Frankly, I'm not seeing "invalid constant regular expressions" as being a
large scale problem - but I'll agree that having the error include the
actual literal being parsed as a RegEx should be done.

Agreed. Doing anything about the other stuff discussed in this thread is
fairly large-scale work, but adjusting our regex error messages is easy.

At least, it is if we can get consensus on what they should look like :-).
There's at least one place that already includes the regex proper in
its error, in hba.c:

ereport(LOG,
(errcode(ERRCODE_INVALID_REGULAR_EXPRESSION),
errmsg("invalid regular expression \"%s\": %s",
parsedline->ident_user + 1, errstr)));

But I wonder if we wouldn't be better off to put the regex into a
detail line, ie

errmsg("invalid regular expression: %s", ...),
errdetail("Regular expression is \"%s\".", ...),

The reason I'm concerned about that is I've seen some pretty hairy
regexes, way longer than are reasonable to include in a primary
error message. The one-line way is nice for short regexes, but
it could get out of hand. Also, for the principal use-cases in regexp.c,
we could avoid changing the primary message text from what it is now.
That might prevent some unnecessary client breakage (not that people
are supposed to be testing message string contents, but ...)

Thoughts?

regards, tom lane

David G. Johnston

david.g.johnston@gmail.com

almost 8 years ago

In reply to: Tom Lane (#5)

Re: Problems with Error Messages wrt Domains, Checks

On Mon, Mar 19, 2018 at 8:33 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

"David G. Johnston" <david.g.johnston@gmail.com> writes:

Frankly, I'm not seeing "invalid constant regular expressions" as being a
large scale problem - but I'll agree that having the error include the
actual literal being parsed as a RegEx should be done.

Agreed. Doing anything about the other stuff discussed in this thread is
fairly large-scale work, but adjusting our regex error messages is easy.
[...]

But I wonder if we wouldn't be better off to put the regex into a
detail line, ie

errmsg("invalid regular expression: %s", ...),
errdetail("Regular expression is \"%s\".", ...),

The reason I'm concerned about that is I've seen some pretty hairy
regexes, way longer than are reasonable to include in a primary
error message. The one-line way is nice for short regexes, but
it could get out of hand.

I write many of those - albeit less so when working with database embedded
expressions as opposed to application-layer.

I'd consider at least supplying the first 30 or so characters (or maybe up
to the first newline, whichever is shorter) in the main message and then
the entire regex in the detail line.

invalid regular expression starting with: %s

I think having the typical regex in the message will aid users that use
client interfaces that don't propagate error detail by default - and it
should be sufficient to narrow down, if not pinpoint, the offending regex
in most cases. And, faced with multiple, the user can add a leading
comment to the regexp to help narrow down the options if necessary.

David J.

Tom Lane

tgl@sss.pgh.pa.us

almost 8 years ago

In reply to: David G. Johnston (#6)

Re: Problems with Error Messages wrt Domains, Checks

"David G. Johnston" <david.g.johnston@gmail.com> writes:

On Mon, Mar 19, 2018 at 8:33 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

But I wonder if we wouldn't be better off to put the regex into a
detail line, ie
errmsg("invalid regular expression: %s", ...),
errdetail("Regular expression is \"%s\".", ...),

I'd consider at least supplying the first 30 or so characters (or maybe up
to the first newline, whichever is shorter) in the main message and then
the entire regex in the detail line.

That seems like a lot more complication than this is worth, and it'd be
confusing to users as well, if things are formatted differently for short
and long regexes.

Also, by my count there are at least eight places in the code that
need to emit messages like this; if we turn formatting the messages into
a major production, people will take shortcuts. Some already have, eg
spell.c is failing to report pg_regexec failures at all. I thought
about trying to discourage deviations by using common error-reporting
subroutines, but there are enough legit differences in what needs to
be done that that might not work well.

regards, tom lane

David G. Johnston

david.g.johnston@gmail.com

almost 8 years ago

In reply to: Tom Lane (#7)

Re: Problems with Error Messages wrt Domains, Checks

On Mon, Mar 19, 2018 at 9:11 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

"David G. Johnston" <david.g.johnston@gmail.com> writes:

On Mon, Mar 19, 2018 at 8:33 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

But I wonder if we wouldn't be better off to put the regex into a
detail line, ie
errmsg("invalid regular expression: %s", ...),
errdetail("Regular expression is \"%s\".", ...),

I'd consider at least supplying the first 30 or so characters (or maybe

up

to the first newline, whichever is shorter) in the main message and then
the entire regex in the detail line.

That seems like a lot more complication than this is worth, and it'd be
confusing to users as well, if things are formatted differently for short
and long regexes.

They would be formatted the same every time - just need to remember to add
the extra function call around the expression variable in the errmsg case.
Definitely not married to the idea though.

Saying "begins with" and then showing the entire regex shouldn't cause too
much confusion I'd hope.

I thought
about trying to discourage deviations by using common error-reporting
subroutines, but there are enough legit differences in what needs to
be done that that might not work well.

Two support functions might suffice:

present_regexp_for_errmsg(regex_var)
present_regexp_for_detailmsg(regex_var)

We can at least centralize how the expression itself is string-ified.

David J.

David G. Johnston

david.g.johnston@gmail.com

almost 8 years ago

In reply to: Tom Lane (#5)

Re: Problems with Error Messages wrt Domains, Checks

On Mon, Mar 19, 2018 at 8:33 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

"David G. Johnston" <david.g.johnston@gmail.com> writes:

Frankly, I'm not seeing "invalid constant regular expressions" as being a
large scale problem - but I'll agree that having the error include the
actual literal being parsed as a RegEx should be done.

Agreed. Doing anything about the other stuff discussed in this thread is
fairly large-scale work, but adjusting our regex error messages is easy.

Another one that seems to fall into the "fairly large-scale work" would be
the:

ERROR: more than one row returned by a subquery used as an expression
(that's it, nothing else prints in psql when I run the offending query -
using "--echo-all" to at least see what query I sent caused the issue)

Found this via Google

/messages/by-id/001201ce70d8$718bd780$54a38680$@kapila@huawei.com

Printing out the offending expression would be nice if possible - printing
the set (size > 1) of values that were returned would probably help as
well, though usually the problem data is in a where clause while the set
would contain target list data - and its those where clause items that
matter more. In the case of a correlated subquery exhibiting this problem
the outer query vars being passed in is what would be most helpful.

David J.

P.S. I consider this to be on-topic to the general "hard to debug" topic
this thread covers even if its doesn't involve domains...I may gripe more
prominently later...

#10

Tom Lane

tgl@sss.pgh.pa.us

almost 8 years ago

In reply to: David G. Johnston (#9)

Re: Problems with Error Messages wrt Domains, Checks

"David G. Johnston" <david.g.johnston@gmail.com> writes:

On Mon, Mar 19, 2018 at 8:33 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Another one that seems to fall into the "fairly large-scale work" would be
the:
ERROR: more than one row returned by a subquery used as an expression
(that's it, nothing else prints in psql when I run the offending query -
using "--echo-all" to at least see what query I sent caused the issue)

Yeah, this falls into the general category of unlocalized run-time errors.
I still think that emitting an error cursor would be a general-purpose
answer that would improve the user experience for this and many other
cases. You could imagine specific behaviors that would be more optimal
for this specific error report, but the approach of solving this issue
one error at a time doesn't scale even a little bit.

Printing out the offending expression would be nice if possible - printing
the set (size > 1) of values that were returned would probably help as
well, though usually the problem data is in a where clause while the set
would contain target list data - and its those where clause items that
matter more. In the case of a correlated subquery exhibiting this problem
the outer query vars being passed in is what would be most helpful.

IIRC, this error is thrown as soon as we get a second row out of the
subquery; we don't know what the ultimate result-set size would be,
and I rather doubt that finding that out would be helpful very often.
Nor does it seem like figuring out how to identify the parameter values
passed down to the subquery (if any) would really repay the effort.
The set that the executor thinks it's passing down might not have that
much to do with what the user thinks the query is doing.

regards, tom lane