Sync ECPG scanner with core

Started by John Naylorover 7 years ago9 messageshackers

john.naylor@enterprisedb.com

over 7 years ago

Hi all,

Commit 21c8ee7946 was one of several that sync'd the backend scanner
with the psql scanner, with the commit message stating "Someday it'd
be nice to make ecpg's pgc.l more easily diff'able too, but today is
not that day." Attached is a set of patches to make progress towards
that. The ECPG tests pass. There's probably more that could be done,
but this seems like a good base to start from.

0001:
-Bring comments in line, and adjust formatting and whitespace.
-Move the block of whitespace patterns.
-Include missing exclusive states in the comment about them. I haven't
filled in the missing comments, leaving that to someone more
knowledgeable than me. It might be good to get rid of the block
comment in favor of one comment line per state, but this patch doesn't
go that far (it would need to be done for the three main scanners).
-While at it, move the rule for C++-style comments to the section with
the other ECPG-specific rules.
-Note: If you use "diff -w", it'll be much easier to read.

0002:
-Just a couple rearrangements to make 0003 easier to read

0003:
-Use a Flex feature called "start condition scope" to make some of the
ECPG rules more similar to the core scanner rules.

Finally, although unrelated to the subject at hand,
0004:
-scanner_init() is not currently used, in favor of lex_init(), and
neither is scanner_finish(), so get rid of them. Maybe it'd be cleaner
to actually use scanner_finish(), but preproc runs are short-lived
anyway.
-base_yyerror() is used once, but every other call site uses
mmerror(), so use that instead.
-mm_realloc() is declared but not defined, so remove the declaration.

I've attached a diff between the core and ECPG scanner with the above
patches applied.

In the course of doing this, I noticed a few other possible ECPG
scanner cleanups to consider, although the payoff might not justify
the work involved:

-Copy process_integer_literal() from the core scanner (maybe only if
decimal_fail is also brought over).
-Match core's handling of unicode escapes, even though pgc.l is not backup-free.
-Use reentrant APIs?
-Does ECPG still need its own functions for mm_alloc() and
mm_strdup(), if we have equivalents in src/common?

Thanks,
John Naylor

Michael Paquier

michael@paquier.xyz

over 7 years ago

In reply to: John Naylor (#1)

Re: Sync ECPG scanner with core

On Sun, Sep 30, 2018 at 04:32:53PM +0700, John Naylor wrote:

Commit 21c8ee7946 was one of several that sync'd the backend scanner
with the psql scanner, with the commit message stating "Someday it'd
be nice to make ecpg's pgc.l more easily diff'able too, but today is
not that day." Attached is a set of patches to make progress towards
that. The ECPG tests pass. There's probably more that could be done,
but this seems like a good base to start from.

It would be nice to add this patch to the next commit fest:
https://commitfest.postgresql.org/20/
--
Michael

John Naylor

john.naylor@enterprisedb.com

over 7 years ago

In reply to: Michael Paquier (#2)

Re: Sync ECPG scanner with core

On 9/30/18, Michael Paquier <michael@paquier.xyz> wrote:

It would be nice to add this patch to the next commit fest:
https://commitfest.postgresql.org/20/

Done, thanks.

-John Naylor

John Naylor

john.naylor@enterprisedb.com

over 7 years ago

In reply to: John Naylor (#3)

Re: Sync ECPG scanner with core

It seems the pach tester is confused by the addition of the
demonstration diff file. I'm reattaching just the patchset to see if
it turns green.

-John Naylor

Tom Lane

tgl@sss.pgh.pa.us

over 7 years ago

In reply to: John Naylor (#1)

Re: Sync ECPG scanner with core

John Naylor <jcnaylor@gmail.com> writes:

Commit 21c8ee7946 was one of several that sync'd the backend scanner
with the psql scanner, with the commit message stating "Someday it'd
be nice to make ecpg's pgc.l more easily diff'able too, but today is
not that day." Attached is a set of patches to make progress towards
that. The ECPG tests pass. There's probably more that could be done,
but this seems like a good base to start from.

Looks like good stuff, pushed with a small additional amount of work.

-base_yyerror() is used once, but every other call site uses
mmerror(), so use that instead.

Yeah, I think this is a bug, though minor. The other places that
deal with unexpected EOF use mmfatal(PARSE_ERROR, ...) though,
so I did it like that.

In the course of doing this, I noticed a few other possible ECPG
scanner cleanups to consider, although the payoff might not justify
the work involved:

-Copy process_integer_literal() from the core scanner (maybe only if
decimal_fail is also brought over).

I went ahead and did both parts of that, mainly because as things
stood the ecpg lexer would produce a different token sequence for
"1..10" than the core would. I don't think this really matters
right at the moment, but if for instance ecpg decided to insert
spaces where it thought the token boundaries were, it would matter.

-Match core's handling of unicode escapes, even though pgc.l is not backup-free.

Yeah, the amount of remaining diffs due to the omission of the anti-backup
rules is a bit annoying and confusing.

More generally, I wonder if it'd be practical to make pgc.l backup-free
altogether. I'm not sure that the resulting gain in lexing speed would
really be of interest to anyone, but just in terms of using uniform lexer
design rules, it might be worthwhile. I tried a run with -b just to see,
and indeed there are a bunch of backup cases in the ECPG-specific rules,
so it'd take some work.

-Does ECPG still need its own functions for mm_alloc() and
mm_strdup(), if we have equivalents in src/common?

The error handling isn't the same, so I don't think we can just replace
them with the src/common functions. It is, however, potentially worth
our trouble to fix things so that within pgc.l, palloc() and pstrdup()
are macros for mm_alloc/mm_strdup, thereby further reducing the diffs
to the core lexer.

I'd be pretty tempted to also change all the hard references to
base_yylval to go through a pointer variable, so that diffs like
this go away:

-                    yylval->str = pstrdup(yytext);
+                    base_yylval.str = mm_strdup(yytext);

I believe that that'd result in more compact code, too --- on most
machines, references to global variables are more expensive in
code bytes than indirecting through a local pointer variable.

-Use reentrant APIs?

Hm. There's no functional value in that for ecpg, but maybe worth doing
anyway to get rid of diffs like

-                    addlit(yytext, yyleng, yyscanner);
+                    addlit(yytext, yyleng);

Guess it depends on how nitpicky you're feeling. In principle,
reducing these remaining diffs would ease future maintenance of
the lexers, but they're not something we change often.

regards, tom lane

Alvaro Herrera

alvherre@2ndquadrant.com

over 7 years ago

In reply to: Tom Lane (#5)

Re: Sync ECPG scanner with core

On 2018-Nov-13, Tom Lane wrote:

More generally, I wonder if it'd be practical to make pgc.l backup-free
altogether. I'm not sure that the resulting gain in lexing speed would
really be of interest to anyone,

Maybe time to compile the ecpg test cases during "make check-world"?

--
ï¿½lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Tom Lane

tgl@sss.pgh.pa.us

over 7 years ago

In reply to: Alvaro Herrera (#6)

Re: Sync ECPG scanner with core

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

On 2018-Nov-13, Tom Lane wrote:

More generally, I wonder if it'd be practical to make pgc.l backup-free
altogether. I'm not sure that the resulting gain in lexing speed would
really be of interest to anyone,

Maybe time to compile the ecpg test cases during "make check-world"?

I'm dubious that the lexer is a significant part of that, though I could
be wrong ...

regards, tom lane

John Naylor

john.naylor@enterprisedb.com

over 7 years ago

In reply to: Tom Lane (#5)

Re: Sync ECPG scanner with core

On 11/14/18, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Looks like good stuff, pushed with a small additional amount of work.

Thanks.

-Does ECPG still need its own functions for mm_alloc() and
mm_strdup(), if we have equivalents in src/common?

The error handling isn't the same, so I don't think we can just replace
them with the src/common functions. It is, however, potentially worth
our trouble to fix things so that within pgc.l, palloc() and pstrdup()
are macros for mm_alloc/mm_strdup, thereby further reducing the diffs
to the core lexer.

I'd be pretty tempted to also change all the hard references to
base_yylval to go through a pointer variable, so that diffs like
this go away:
-                    yylval->str = pstrdup(yytext);
+                    base_yylval.str = mm_strdup(yytext);
I believe that that'd result in more compact code, too --- on most
machines, references to global variables are more expensive in
code bytes than indirecting through a local pointer variable.

I'll look into that as time permits.

On 11/14/18, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

On 2018-Nov-13, Tom Lane wrote:

More generally, I wonder if it'd be practical to make pgc.l backup-free
altogether. I'm not sure that the resulting gain in lexing speed would
really be of interest to anyone,

Maybe time to compile the ecpg test cases during "make check-world"?

I'm dubious that the lexer is a significant part of that, though I could
be wrong ...

If it were, it'd be much easier to try a Flex flag other than the
default, most compact representation, something like -Cfe. That'd be a
prerequisite for no-backtracking to matter anyway. That's easy enough
to test, so I'll volunteer to do that sometime.

-John Naylor

John Naylor

john.naylor@enterprisedb.com

over 7 years ago

In reply to: John Naylor (#8)

Re: Sync ECPG scanner with core

I wrote:

On 11/14/18, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

Maybe time to compile the ecpg test cases during "make check-world"?

I'm dubious that the lexer is a significant part of that, though I could
be wrong ...

If it were, it'd be much easier to try a Flex flag other than the
default, most compact representation, something like -Cfe. That'd be a
prerequisite for no-backtracking to matter anyway. That's easy enough
to test, so I'll volunteer to do that sometime.

The preproc phase of make check in the ecpg dir only takes 150ms.
Compiling and linking the test executables takes another 2s, and
running the tests takes another 5s on my machine. Even without setting
up and tearing down the temp instance, preproc is trivial.

-John Naylor

Sync ECPG scanner with core

Attachments:

Attachments: