Better handling of parse errors

Started by Gavin Sherryover 23 years ago18 messages
#1Gavin Sherry
swm@linuxworld.com.au
1 attachment(s)

Hi all,

Attached is a small patch to scan.l for consideration. It hands
yyerror() the position in the query string of the token which caused a
parse error. It is not even close to an implementation of error handling
a-la SQL99 but it certainly makes debugging complicated queries easier.

I've done some testing and it appears to hit the offending token pretty
accurately.

Can anyone find a way to break this? If not, I'd love to see it in 7.3.

Gavin

Attachments:

scanner.diff.gzapplication/x-gzip; name=scanner.diff.gzDownload
#2Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Gavin Sherry (#1)
Re: Better handling of parse errors

Can we see some sample output?

---------------------------------------------------------------------------

Gavin Sherry wrote:

Hi all,

Attached is a small patch to scan.l for consideration. It hands
yyerror() the position in the query string of the token which caused a
parse error. It is not even close to an implementation of error handling
a-la SQL99 but it certainly makes debugging complicated queries easier.

I've done some testing and it appears to hit the offending token pretty
accurately.

Can anyone find a way to break this? If not, I'd love to see it in 7.3.

Gavin

Content-Description:

[ Attachment, skipping... ]

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#3Gavin Sherry
swm@linuxworld.com.au
In reply to: Bruce Momjian (#2)
Re: Better handling of parse errors

On Tue, 6 Aug 2002, Bruce Momjian wrote:

Can we see some sample output?

template1=# select * frum pg_class;
ERROR: parser: parse error at or near "frum" at character 10
template1=# select relname from pg_class a excepr select relname from
pg_class limit a;
ERROR: parser: parse error at or near "excepr" at character 32
template1=# create table a (desc int);
ERROR: parser: parse error at or near "desc" at character 17

Gavin

Show quoted text

---------------------------------------------------------------------------

Gavin Sherry wrote:

Hi all,

Attached is a small patch to scan.l for consideration. It hands
yyerror() the position in the query string of the token which caused a
parse error. It is not even close to an implementation of error handling
a-la SQL99 but it certainly makes debugging complicated queries easier.

I've done some testing and it appears to hit the offending token pretty
accurately.

Can anyone find a way to break this? If not, I'd love to see it in 7.3.

Gavin

Content-Description:

[ Attachment, skipping... ]

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

#4Gavin Sherry
swm@linuxworld.com.au
In reply to: Bruce Momjian (#2)
Re: Better handling of parse errors

On Tue, 6 Aug 2002, Bruce Momjian wrote:

Can we see some sample output?

One other important one:

template1=# select
template1-# *
template1-#
template1-# frum
template1-# pg_class;
ERROR: parser: parse error at or near "frum" at character 10

Note that it counts non-printable characters. This could be both good and
bad.

Gavin

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Gavin Sherry (#1)
Re: Better handling of parse errors

Gavin Sherry <swm@linuxworld.com.au> writes:

Attached is a small patch to scan.l for consideration. It hands
yyerror() the position in the query string of the token which caused a
parse error.

Isn't that the hard way to do it? Seems like you could just subtract
scanbuf from the error pointer, instead of adding overhead to the basic
lex loop.

Things that need to be decided (and documented somewhere):

Is the number an offset (counted from 0) or an index (counted from 1)?
Which end of the token does it point at? Can the message be phrased
so as to make it reasonably clear what the number is?

A related change I'd been meaning to make is to get it to say
parse error at end of input
when that's the case, rather than the rather useless
parse error at or near ""
that you get now. I'd still be inclined to just say "end of input"
in that case, and not bother with a character count.

regards, tom lane

#6Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Tom Lane (#5)
Re: Better handling of parse errors

Gavin, have you answered these issues brought up about the patch?

---------------------------------------------------------------------------

Tom Lane wrote:

Gavin Sherry <swm@linuxworld.com.au> writes:

Attached is a small patch to scan.l for consideration. It hands
yyerror() the position in the query string of the token which caused a
parse error.

Isn't that the hard way to do it? Seems like you could just subtract
scanbuf from the error pointer, instead of adding overhead to the basic
lex loop.

Things that need to be decided (and documented somewhere):

Is the number an offset (counted from 0) or an index (counted from 1)?
Which end of the token does it point at? Can the message be phrased
so as to make it reasonably clear what the number is?

A related change I'd been meaning to make is to get it to say
parse error at end of input
when that's the case, rather than the rather useless
parse error at or near ""
that you get now. I'd still be inclined to just say "end of input"
in that case, and not bother with a character count.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#7Gavin Sherry
swm@linuxworld.com.au
In reply to: Bruce Momjian (#6)
Re: Better handling of parse errors

Bruce,

I was working on this on the train this morning. There are a few
possibilities I'd like to look at before I submit another patch.

Before I do, there is one important question that I didn't raise when I
submitted the first patch (which was only a proof of concept kind of
patch). Namely: do we want to modify every 7.2 error message? Many people
have written error message parsers into their applications to make up for
the fact that Postgres doesn't use error codes or issue 'standardised'
error messages.

Is this going to annoy people too much? Should it be delayed until we have
SQL99 diagnostics/error codes?

Gavin

On Wed, 14 Aug 2002, Bruce Momjian wrote:

Show quoted text

Gavin, have you answered these issues brought up about the patch?

---------------------------------------------------------------------------

Tom Lane wrote:

Gavin Sherry <swm@linuxworld.com.au> writes:

Attached is a small patch to scan.l for consideration. It hands
yyerror() the position in the query string of the token which caused a
parse error.

Isn't that the hard way to do it? Seems like you could just subtract
scanbuf from the error pointer, instead of adding overhead to the basic
lex loop.

Things that need to be decided (and documented somewhere):

Is the number an offset (counted from 0) or an index (counted from 1)?
Which end of the token does it point at? Can the message be phrased
so as to make it reasonably clear what the number is?

A related change I'd been meaning to make is to get it to say
parse error at end of input
when that's the case, rather than the rather useless
parse error at or near ""
that you get now. I'd still be inclined to just say "end of input"
in that case, and not bother with a character count.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org

#8Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Gavin Sherry (#7)
Re: Better handling of parse errors

I don't think anyone will mind, but you can throw a message to 'general'
just to be sure.

---------------------------------------------------------------------------

Gavin Sherry wrote:

Bruce,

I was working on this on the train this morning. There are a few
possibilities I'd like to look at before I submit another patch.

Before I do, there is one important question that I didn't raise when I
submitted the first patch (which was only a proof of concept kind of
patch). Namely: do we want to modify every 7.2 error message? Many people
have written error message parsers into their applications to make up for
the fact that Postgres doesn't use error codes or issue 'standardised'
error messages.

Is this going to annoy people too much? Should it be delayed until we have
SQL99 diagnostics/error codes?

Gavin

On Wed, 14 Aug 2002, Bruce Momjian wrote:

Gavin, have you answered these issues brought up about the patch?

---------------------------------------------------------------------------

Tom Lane wrote:

Gavin Sherry <swm@linuxworld.com.au> writes:

Attached is a small patch to scan.l for consideration. It hands
yyerror() the position in the query string of the token which caused a
parse error.

Isn't that the hard way to do it? Seems like you could just subtract
scanbuf from the error pointer, instead of adding overhead to the basic
lex loop.

Things that need to be decided (and documented somewhere):

Is the number an offset (counted from 0) or an index (counted from 1)?
Which end of the token does it point at? Can the message be phrased
so as to make it reasonably clear what the number is?

A related change I'd been meaning to make is to get it to say
parse error at end of input
when that's the case, rather than the rather useless
parse error at or near ""
that you get now. I'd still be inclined to just say "end of input"
in that case, and not bother with a character count.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Gavin Sherry (#7)
Re: Better handling of parse errors

Gavin Sherry <swm@linuxworld.com.au> writes:

... do we want to modify every 7.2 error message?

Nyet ... but I don't think tacking an offset onto the end of
"parse error at or near foo" messages is likely to cause the
sort of generalized havoc you suggest ...

I'm on record as favoring a thorough redesign of the error-reporting
facility, and I hope to see that happen in conjunction with a frontend
protocol change in 7.4. But I think your proposed change in this patch
is fairly harmless and could be done now.

regards, tom lane

#10Gavin Sherry
swm@linuxworld.com.au
In reply to: Tom Lane (#9)
1 attachment(s)
Re: [HACKERS] Better handling of parse errors

On Wed, 14 Aug 2002, Tom Lane wrote:

Gavin Sherry <swm@linuxworld.com.au> writes:

... do we want to modify every 7.2 error message?

Nyet ... but I don't think tacking an offset onto the end of
"parse error at or near foo" messages is likely to cause the
sort of generalized havoc you suggest ...

In that case, attached is a patch which locates the beginning of the
offending token more efficiently (per your suggestion of using
scanbuf). The new patch does the same as before:

template1=# select * frum pg_class;
ERROR: parser: parse error at or near "frum" at character 10

It also implement's Tom's suggestion:

template1=# select * from pg_class where\g
ERROR: parse: parse error at end of input

Gavin

Attachments:

scanner2.diff.gzapplication/x-gzip; name=scanner2.diff.gzDownload
#11Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Gavin Sherry (#10)
Re: [HACKERS] Better handling of parse errors

Your patch has been added to the PostgreSQL unapplied patches list at:

http://candle.pha.pa.us/cgi-bin/pgpatches

I will try to apply it within the next 48 hours.

---------------------------------------------------------------------------

Gavin Sherry wrote:

On Wed, 14 Aug 2002, Tom Lane wrote:

Gavin Sherry <swm@linuxworld.com.au> writes:

... do we want to modify every 7.2 error message?

Nyet ... but I don't think tacking an offset onto the end of
"parse error at or near foo" messages is likely to cause the
sort of generalized havoc you suggest ...

In that case, attached is a patch which locates the beginning of the
offending token more efficiently (per your suggestion of using
scanbuf). The new patch does the same as before:

template1=# select * frum pg_class;
ERROR: parser: parse error at or near "frum" at character 10

It also implement's Tom's suggestion:

template1=# select * from pg_class where\g
ERROR: parse: parse error at end of input

Gavin

Content-Description:

[ Attachment, skipping... ]

---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to majordomo@postgresql.org)

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#12Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Gavin Sherry (#10)
Re: [HACKERS] Better handling of parse errors

Patch applied. Thanks.

---------------------------------------------------------------------------

Gavin Sherry wrote:

On Wed, 14 Aug 2002, Tom Lane wrote:

Gavin Sherry <swm@linuxworld.com.au> writes:

... do we want to modify every 7.2 error message?

Nyet ... but I don't think tacking an offset onto the end of
"parse error at or near foo" messages is likely to cause the
sort of generalized havoc you suggest ...

In that case, attached is a patch which locates the beginning of the
offending token more efficiently (per your suggestion of using
scanbuf). The new patch does the same as before:

template1=# select * frum pg_class;
ERROR: parser: parse error at or near "frum" at character 10

It also implement's Tom's suggestion:

template1=# select * from pg_class where\g
ERROR: parse: parse error at end of input

Gavin

Content-Description:

[ Attachment, skipping... ]

---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to majordomo@postgresql.org)

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#13Peter Eisentraut
peter_e@gmx.net
In reply to: Gavin Sherry (#10)
Re: [PATCHES] Better handling of parse errors

Gavin Sherry writes:

In that case, attached is a patch which locates the beginning of the
offending token more efficiently (per your suggestion of using
scanbuf).

In the regression tests there are a couple of cases that could be
improved:

In strings.sql:

-- illegal string continuation syntax
SELECT 'first line'
' - next line' /* this comment is not allowed here */
' - third line'
AS "Illegal comment within continuation";
ERROR: parser: parse error at or near "' - third line'" at character 89

Character 89 is the end of the "third line" line, but the parse error is
at the beginning of that line.

In create_function_1.sql:

CREATE FUNCTION test1 (int) RETURNS int LANGUAGE sql
AS 'not even SQL';
ERROR: parser: parse error at or near "not" at character 1

Clearly confusing.

--
Peter Eisentraut peter_e@gmx.net

#14Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Eisentraut (#13)
Re: [PATCHES] Better handling of parse errors

Peter Eisentraut <peter_e@gmx.net> writes:

In strings.sql:

-- illegal string continuation syntax
SELECT 'first line'
' - next line' /* this comment is not allowed here */
' - third line'
AS "Illegal comment within continuation";
ERROR: parser: parse error at or near "' - third line'" at character 89

Character 89 is the end of the "third line" line, but the parse error is
at the beginning of that line.

This is fixed as of my later commit.

In create_function_1.sql:

CREATE FUNCTION test1 (int) RETURNS int LANGUAGE sql
AS 'not even SQL';
ERROR: parser: parse error at or near "not" at character 1

Clearly confusing.

"Character 1" is correct as of the context that the parser is working
in, namely the function body. I don't think we can do much to change
that, but perhaps we could make the message read like
ERROR: parser: parse error at or near "not" at character 1 of function body
This would require giving the parser some sort of context-identifying
string to tack onto the message, but that doesn't seem too hard.

regards, tom lane

#15Gavin Sherry
swm@linuxworld.com.au
In reply to: Tom Lane (#14)
Re: [PATCHES] Better handling of parse errors

On Sun, 18 Aug 2002, Tom Lane wrote:

Peter Eisentraut <peter_e@gmx.net> writes:

In strings.sql:

-- illegal string continuation syntax
SELECT 'first line'
' - next line' /* this comment is not allowed here */
' - third line'
AS "Illegal comment within continuation";
ERROR: parser: parse error at or near "' - third line'" at character 89

Character 89 is the end of the "third line" line, but the parse error is
at the beginning of that line.

This is fixed as of my later commit.

In create_function_1.sql:

CREATE FUNCTION test1 (int) RETURNS int LANGUAGE sql
AS 'not even SQL';
ERROR: parser: parse error at or near "not" at character 1

Clearly confusing.

"Character 1" is correct as of the context that the parser is working
in, namely the function body. I don't think we can do much to change
that, but perhaps we could make the message read like
ERROR: parser: parse error at or near "not" at character 1 of function body
This would require giving the parser some sort of context-identifying
string to tack onto the message, but that doesn't seem too hard.

Tom,

Reworking the code to taken into account token_start seems to work.

elog(ERROR, "parser: %s at or near \"%s\" at character %i",
message,token_start ? token_start : yytext,
token_start ? (unsigned int)(token_start - scanbuf + 1) :
(unsigned int)(yytext - scanbuf + 1));

I will submit a patch once I do some more testing to find other possible
situations where this plays up.

Gavin

#16Tom Lane
tgl@sss.pgh.pa.us
In reply to: Gavin Sherry (#15)
Re: [PATCHES] Better handling of parse errors

Gavin Sherry <swm@linuxworld.com.au> writes:

Reworking the code to taken into account token_start seems to work.

Yes, I did that last night ...

regards, tom lane

#17Peter Eisentraut
peter_e@gmx.net
In reply to: Tom Lane (#14)
Re: [PATCHES] Better handling of parse errors

Tom Lane writes:

"Character 1" is correct as of the context that the parser is working
in, namely the function body. I don't think we can do much to change
that, but perhaps we could make the message read like
ERROR: parser: parse error at or near "not" at character 1 of function body

That would be better.

--
Peter Eisentraut peter_e@gmx.net

#18Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Eisentraut (#17)
Re: [PATCHES] Better handling of parse errors

Peter Eisentraut <peter_e@gmx.net> writes:

Tom Lane writes:

"Character 1" is correct as of the context that the parser is working
in, namely the function body. I don't think we can do much to change
that, but perhaps we could make the message read like
ERROR: parser: parse error at or near "not" at character 1 of function body

That would be better.

After a quick look through the sources, it seems we could fairly easily
do this: callers of pg_parse_and_rewrite() and some related routines
could pass a string like "SQL function body", which would get plugged
into the parse-error message. Two issues though:

* Is this okay from an internationalization point of view? We can
gettext() the "SQL function body" string but I don't know if there
are serious problems with pasting that into
parse error at or near "%s" at character %d of %s
On the other hand I'm not comfortable with having the far-end caller
supply that whole string, either, since most of it is the lexer's
responsibility.

* The natural thing to say in _SPI_execute's call is "SPI query",
but this will probably not go over big with plpgsql users, who will
see that and probably have no idea what SPI is. But I'm very
loathe to change the SPI API so that plpgsql can pass down the
context string --- that'll break existing user functions that use
SPI. Do we want to uglify the SPI API to the extent of having
parallel calls that just add a context string parameter? Is there
a better way?

regards, tom lane