CASE control block broken by a single line comment

Started by Michal Bartakabout 2 years ago11 messageshackers
Jump to latest
#1Michal Bartak
maxym.srpl@gmail.com

Hello all

The issue described bellow exists in postgresql ver 16.2 (found in some
previous major versions)

The documentation defines a comment as:

A comment is a sequence of characters beginning with double dashes and
extending to the end of the line

When using such a comment within CASE control block, it ends up with an
error:

DO LANGUAGE plpgsql $$
DECLARE
t TEXT = 'a';
BEGIN
CASE t
WHEN 'a' -- my comment
THEN RAISE NOTICE 'a';
WHEN 'b'
THEN RAISE NOTICE 'b';
ELSE NULL;
END CASE;
END;$$;

ERROR: syntax error at end of input
LINE 1: "__Case__Variable_2__" IN ('a' -- my comment)
^
QUERY: "__Case__Variable_2__" IN ('a' -- my comment)
CONTEXT: PL/pgSQL function inline_code_block line 5 at CASE

With Regards
Michal Bartak

#2Erik Wienhold
ewie@ewie.name
In reply to: Michal Bartak (#1)
Re: CASE control block broken by a single line comment

On 2024-04-06 20:14 +0200, Michal Bartak wrote:

The issue described bellow exists in postgresql ver 16.2 (found in some
previous major versions)

Can confirm also on master.

The documentation defines a comment as:

A comment is a sequence of characters beginning with double dashes and
extending to the end of the line

When using such a comment within CASE control block, it ends up with an
error:

DO LANGUAGE plpgsql $$
DECLARE
t TEXT = 'a';
BEGIN
CASE t
WHEN 'a' -- my comment
THEN RAISE NOTICE 'a';
WHEN 'b'
THEN RAISE NOTICE 'b';
ELSE NULL;
END CASE;
END;$$;

ERROR: syntax error at end of input
LINE 1: "__Case__Variable_2__" IN ('a' -- my comment)
^
QUERY: "__Case__Variable_2__" IN ('a' -- my comment)
CONTEXT: PL/pgSQL function inline_code_block line 5 at CASE

I'm surprised that the comment is not skipped by the scanner at this
point. Maybe because the parser just reads the raw expression between
WHEN and THEN with plpgsql_append_source_text via read_sql_construct.

How about the attached patch? It's a workaround by simply adding a line
feed character between the raw expression and the closing parenthesis.

--
Erik

Attachments:

v1-0001-plpgsql-create-valid-IN-expression-for-CASE-WHEN-.patchtext/plain; charset=us-asciiDownload+36-2
#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Erik Wienhold (#2)
Re: CASE control block broken by a single line comment

Erik Wienhold <ewie@ewie.name> writes:

On 2024-04-06 20:14 +0200, Michal Bartak wrote:

The issue described bellow exists in postgresql ver 16.2 (found in some
previous major versions)

Can confirm also on master.

I'm sure it's been there a while :-(

I'm surprised that the comment is not skipped by the scanner at this
point. Maybe because the parser just reads the raw expression between
WHEN and THEN with plpgsql_append_source_text via read_sql_construct.

How about the attached patch? It's a workaround by simply adding a line
feed character between the raw expression and the closing parenthesis.

I don't have time to look into this on this deadline weekend, but
what's bothering me about this report is the worry that we've made
the same mistake elsewhere, or will do so in future. I suspect
it'd be much more robust if we could remove the comment from the
expr->query string. No idea how hard that is.

regards, tom lane

#4Erik Wienhold
ewie@ewie.name
In reply to: Tom Lane (#3)
Re: CASE control block broken by a single line comment

On 2024-04-07 06:33 +0200, Tom Lane wrote:

Erik Wienhold <ewie@ewie.name> writes:

I'm surprised that the comment is not skipped by the scanner at this
point. Maybe because the parser just reads the raw expression between
WHEN and THEN with plpgsql_append_source_text via read_sql_construct.

How about the attached patch? It's a workaround by simply adding a line
feed character between the raw expression and the closing parenthesis.

I don't have time to look into this on this deadline weekend,

Sure, no rush.

but what's bothering me about this report is the worry that we've made
the same mistake elsewhere, or will do so in future.

Right. At the moment only make_case is affected by this because it uses
the raw expression for rewriting. I checked other uses of
read_psql_construct (e.g. IF ... THEN, FOR ... LOOP) and they don't show
this bug.

I suspect it'd be much more robust if we could remove the comment from
the expr->query string. No idea how hard that is.

I slept on it and I think this can be fixed by tracking the end of the
last token before THEN and use that instead of yylloc in the call to
plpgsql_append_source_text. We already already track the token length
in plpgsql_yyleng but don't make it available outside pl_scanner.c yet.

Attached v2 tries to do that. But it breaks other test cases, probably
because the calculation of endlocation is off. I'm missing something
here.

--
Erik

#5Erik Wienhold
ewie@ewie.name
In reply to: Erik Wienhold (#4)
Re: CASE control block broken by a single line comment

I wrote:

Attached v2 tries to do that.

Hit send too soon. Attached now.

--
Erik

Attachments:

v2-0001-plpgsql-accept-trailing-line-comment-in-CASE-WHEN.patchtext/x-diff; charset=us-asciiDownload+48-3
#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Erik Wienhold (#4)
Re: CASE control block broken by a single line comment

Erik Wienhold <ewie@ewie.name> writes:

On 2024-04-07 06:33 +0200, Tom Lane wrote:

I suspect it'd be much more robust if we could remove the comment from
the expr->query string. No idea how hard that is.

I slept on it and I think this can be fixed by tracking the end of the
last token before THEN and use that instead of yylloc in the call to
plpgsql_append_source_text. We already already track the token length
in plpgsql_yyleng but don't make it available outside pl_scanner.c yet.
Attached v2 tries to do that. But it breaks other test cases, probably
because the calculation of endlocation is off. I'm missing something
here.

I poked at this and found that the failures occur when the patched
code decides to trim an expression like "_r.v" to just "_r", naturally
breaking the semantics completely. That happens because when
plpgsql_yylex recognizes a compound token, it doesn't bother to
adjust the token length to include the additional word(s). I vaguely
remember having thought about that when writing the lookahead logic,
and deciding that it wasn't worth the trouble -- but now it is.
Up to now, the only thing we did with plpgsql_yyleng was to set the
cutoff point for text reported by plpgsql_yyerror. Extending the
token length changes reports like this:

regression=# do $$ declare r record; r.x$$;
ERROR: syntax error at or near "r"
LINE 1: do $$ declare r record; r.x$$;
^

to this:

regression=# do $$ declare r record; r.x$$;
ERROR: syntax error at or near "r.x"
LINE 1: do $$ declare r record; r.x$$;
^

which seems like strictly an improvement to me (the syntax error is
premature EOF, which is after the "x"); but in any case it's minor
enough to not be worth worrying about.

Looking around, I noticed that we *have* had a similar case in the
past, which 4adead1d2 noticed and worked around by suppressing the
whitespace-trimming action in read_sql_construct. We could probably
reach a near-one-line fix for the current problem by passing
trim=false in the CASE calls, but TBH that discovery just reinforces
my feeling that we need a cleaner fix. The attached v3 reverts
the make-trim-optional hack that 4adead1d2 added, since we don't
need or want the manual trimming anymore.

With this in mind, I find the other manual whitespace trimming logic,
in make_execsql_stmt(), quite scary; but it looks somewhat nontrivial
to get rid of it. (The problem is that parsing of an INTO clause
will leave us with a pushed-back token as next, and then we don't
know where the end of the token before that is.) Since we don't
currently do anything as crazy as combining execsql statements,
I think the problem is only latent, but still...

Anyway, the attached works for me.

regards, tom lane

Attachments:

v3-0001-plpgsql-accept-trailing-line-comment-in-CASE-WHEN.patchtext/x-diff; charset=us-ascii; name*0=v3-0001-plpgsql-accept-trailing-line-comment-in-CASE-WHEN.p; name*1=atchDownload+76-26
#7Pavel Stehule
pavel.stehule@gmail.com
In reply to: Tom Lane (#6)
Re: CASE control block broken by a single line comment

út 9. 4. 2024 v 0:55 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:

Erik Wienhold <ewie@ewie.name> writes:

On 2024-04-07 06:33 +0200, Tom Lane wrote:

I suspect it'd be much more robust if we could remove the comment from
the expr->query string. No idea how hard that is.

I slept on it and I think this can be fixed by tracking the end of the
last token before THEN and use that instead of yylloc in the call to
plpgsql_append_source_text. We already already track the token length
in plpgsql_yyleng but don't make it available outside pl_scanner.c yet.
Attached v2 tries to do that. But it breaks other test cases, probably
because the calculation of endlocation is off. I'm missing something
here.

I poked at this and found that the failures occur when the patched
code decides to trim an expression like "_r.v" to just "_r", naturally
breaking the semantics completely. That happens because when
plpgsql_yylex recognizes a compound token, it doesn't bother to
adjust the token length to include the additional word(s). I vaguely
remember having thought about that when writing the lookahead logic,
and deciding that it wasn't worth the trouble -- but now it is.
Up to now, the only thing we did with plpgsql_yyleng was to set the
cutoff point for text reported by plpgsql_yyerror. Extending the
token length changes reports like this:

regression=# do $$ declare r record; r.x$$;
ERROR: syntax error at or near "r"
LINE 1: do $$ declare r record; r.x$$;
^

to this:

regression=# do $$ declare r record; r.x$$;
ERROR: syntax error at or near "r.x"
LINE 1: do $$ declare r record; r.x$$;
^

which seems like strictly an improvement to me (the syntax error is
premature EOF, which is after the "x"); but in any case it's minor
enough to not be worth worrying about.

Looking around, I noticed that we *have* had a similar case in the
past, which 4adead1d2 noticed and worked around by suppressing the
whitespace-trimming action in read_sql_construct. We could probably
reach a near-one-line fix for the current problem by passing
trim=false in the CASE calls, but TBH that discovery just reinforces
my feeling that we need a cleaner fix. The attached v3 reverts
the make-trim-optional hack that 4adead1d2 added, since we don't
need or want the manual trimming anymore.

With this in mind, I find the other manual whitespace trimming logic,
in make_execsql_stmt(), quite scary; but it looks somewhat nontrivial
to get rid of it. (The problem is that parsing of an INTO clause
will leave us with a pushed-back token as next, and then we don't
know where the end of the token before that is.) Since we don't
currently do anything as crazy as combining execsql statements,
I think the problem is only latent, but still...

Anyway, the attached works for me.

+1

Pavel

Show quoted text

regards, tom lane

#8Erik Wienhold
ewie@ewie.name
In reply to: Tom Lane (#6)
Re: CASE control block broken by a single line comment

On 2024-04-09 00:54 +0200, Tom Lane wrote:

I poked at this and found that the failures occur when the patched
code decides to trim an expression like "_r.v" to just "_r", naturally
breaking the semantics completely. That happens because when
plpgsql_yylex recognizes a compound token, it doesn't bother to
adjust the token length to include the additional word(s).

Thanks Tom! I haven't had the time to look at your patch.

I'm surprised that the lexer handles compound tokens. I'd expect to
find that in the parser, especially because of using the context-aware
plpgsql_ns_lookup to determine if we have a T_DATUM or T_{WORD,CWORD}.

Is this done by the lexer to allow push-back of those compound tokens
and maybe even to also simplify some parser rules?

--
Erik

#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Erik Wienhold (#8)
Re: CASE control block broken by a single line comment

Erik Wienhold <ewie@ewie.name> writes:

I'm surprised that the lexer handles compound tokens. I'd expect to
find that in the parser, especially because of using the context-aware
plpgsql_ns_lookup to determine if we have a T_DATUM or T_{WORD,CWORD}.

I'm not here to defend plpgsql's factorization ;-). However, it
doesn't really have a parser of its own, at least not for expressions,
so I'm not sure how your suggestion could be made to work.

regards, tom lane

#10Erik Wienhold
ewie@ewie.name
In reply to: Tom Lane (#9)
Re: CASE control block broken by a single line comment

On 2024-04-13 00:20 +0200, Tom Lane wrote:

Erik Wienhold <ewie@ewie.name> writes:

I'm surprised that the lexer handles compound tokens. I'd expect to
find that in the parser, especially because of using the context-aware
plpgsql_ns_lookup to determine if we have a T_DATUM or T_{WORD,CWORD}.

I'm not here to defend plpgsql's factorization ;-). However, it
doesn't really have a parser of its own, at least not for expressions,
so I'm not sure how your suggestion could be made to work.

Not a suggestion. Just a question about the general design, unrelated
to this fix, in case you know the answer off the cuff. I see that
863a62064c already had the lexer handle those compound tokens, but
unfortunately without an explanation on why. Never mind if that's too
much to ask about a design descision made over 25 years ago.

--
Erik

#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Erik Wienhold (#10)
Re: CASE control block broken by a single line comment

Erik Wienhold <ewie@ewie.name> writes:

On 2024-04-13 00:20 +0200, Tom Lane wrote:

I'm not here to defend plpgsql's factorization ;-). However, it
doesn't really have a parser of its own, at least not for expressions,
so I'm not sure how your suggestion could be made to work.

Not a suggestion. Just a question about the general design, unrelated
to this fix, in case you know the answer off the cuff. I see that
863a62064c already had the lexer handle those compound tokens, but
unfortunately without an explanation on why. Never mind if that's too
much to ask about a design descision made over 25 years ago.

Well, it was a design decision I wasn't involved in, so I dunno.
You could reach out to Jan on the slim chance he remembers.

regards, tom lane