Sync ECPG scanner with core

Started by John Naylorover 7 years ago9 messageshackers
Jump to latest
#1John Naylor
john.naylor@enterprisedb.com

Hi all,

Commit 21c8ee7946 was one of several that sync'd the backend scanner
with the psql scanner, with the commit message stating "Someday it'd
be nice to make ecpg's pgc.l more easily diff'able too, but today is
not that day." Attached is a set of patches to make progress towards
that. The ECPG tests pass. There's probably more that could be done,
but this seems like a good base to start from.

0001:
-Bring comments in line, and adjust formatting and whitespace.
-Move the block of whitespace patterns.
-Include missing exclusive states in the comment about them. I haven't
filled in the missing comments, leaving that to someone more
knowledgeable than me. It might be good to get rid of the block
comment in favor of one comment line per state, but this patch doesn't
go that far (it would need to be done for the three main scanners).
-While at it, move the rule for C++-style comments to the section with
the other ECPG-specific rules.
-Note: If you use "diff -w", it'll be much easier to read.

0002:
-Just a couple rearrangements to make 0003 easier to read

0003:
-Use a Flex feature called "start condition scope" to make some of the
ECPG rules more similar to the core scanner rules.

Finally, although unrelated to the subject at hand,
0004:
-scanner_init() is not currently used, in favor of lex_init(), and
neither is scanner_finish(), so get rid of them. Maybe it'd be cleaner
to actually use scanner_finish(), but preproc runs are short-lived
anyway.
-base_yyerror() is used once, but every other call site uses
mmerror(), so use that instead.
-mm_realloc() is declared but not defined, so remove the declaration.

I've attached a diff between the core and ECPG scanner with the above
patches applied.

In the course of doing this, I noticed a few other possible ECPG
scanner cleanups to consider, although the payoff might not justify
the work involved:

-Copy process_integer_literal() from the core scanner (maybe only if
decimal_fail is also brought over).
-Match core's handling of unicode escapes, even though pgc.l is not backup-free.
-Use reentrant APIs?
-Does ECPG still need its own functions for mm_alloc() and
mm_strdup(), if we have equivalents in src/common?

Thanks,
John Naylor

Attachments:

v1-core2ecpg.difftext/plain; charset=US-ASCII; name=v1-core2ecpg.diffDownload+996-971
v1-0001-First-pass-at-syncing-ECPG-scanner-with-the-core-.patchtext/x-patch; charset=US-ASCII; name=v1-0001-First-pass-at-syncing-ECPG-scanner-with-the-core-.patchDownload+408-370
v1-0002-Prepatory-refactoring-for-next-commit.patchtext/x-patch; charset=US-ASCII; name=v1-0002-Prepatory-refactoring-for-next-commit.patchDownload+14-15
v1-0003-Use-start-condition-scopes-in-the-core-psql-and-E.patchtext/x-patch; charset=US-ASCII; name=v1-0003-Use-start-condition-scopes-in-the-core-psql-and-E.patchDownload+107-47
v1-0004-Remove-unused-functions-and-declarations-from-ECP.patchtext/x-patch; charset=US-ASCII; name=v1-0004-Remove-unused-functions-and-declarations-from-ECP.patchDownload+2-48
#2Michael Paquier
michael@paquier.xyz
In reply to: John Naylor (#1)
Re: Sync ECPG scanner with core

On Sun, Sep 30, 2018 at 04:32:53PM +0700, John Naylor wrote:

Commit 21c8ee7946 was one of several that sync'd the backend scanner
with the psql scanner, with the commit message stating "Someday it'd
be nice to make ecpg's pgc.l more easily diff'able too, but today is
not that day." Attached is a set of patches to make progress towards
that. The ECPG tests pass. There's probably more that could be done,
but this seems like a good base to start from.

It would be nice to add this patch to the next commit fest:
https://commitfest.postgresql.org/20/
--
Michael

#3John Naylor
john.naylor@enterprisedb.com
In reply to: Michael Paquier (#2)
Re: Sync ECPG scanner with core

On 9/30/18, Michael Paquier <michael@paquier.xyz> wrote:

It would be nice to add this patch to the next commit fest:
https://commitfest.postgresql.org/20/

Done, thanks.

-John Naylor

#4John Naylor
john.naylor@enterprisedb.com
In reply to: John Naylor (#3)
Re: Sync ECPG scanner with core

It seems the pach tester is confused by the addition of the
demonstration diff file. I'm reattaching just the patchset to see if
it turns green.

-John Naylor

Attachments:

v1-0001-First-pass-at-syncing-ECPG-scanner-with-the-core-.patchtext/x-patch; charset=US-ASCII; name=v1-0001-First-pass-at-syncing-ECPG-scanner-with-the-core-.patchDownload+408-370
v1-0002-Prepatory-refactoring-for-next-commit.patchtext/x-patch; charset=US-ASCII; name=v1-0002-Prepatory-refactoring-for-next-commit.patchDownload+14-15
v1-0003-Use-start-condition-scopes-in-the-core-psql-and-E.patchtext/x-patch; charset=US-ASCII; name=v1-0003-Use-start-condition-scopes-in-the-core-psql-and-E.patchDownload+107-47
v1-0004-Remove-unused-functions-and-declarations-from-ECP.patchtext/x-patch; charset=US-ASCII; name=v1-0004-Remove-unused-functions-and-declarations-from-ECP.patchDownload+2-48
#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: John Naylor (#1)
Re: Sync ECPG scanner with core

John Naylor <jcnaylor@gmail.com> writes:

Commit 21c8ee7946 was one of several that sync'd the backend scanner
with the psql scanner, with the commit message stating "Someday it'd
be nice to make ecpg's pgc.l more easily diff'able too, but today is
not that day." Attached is a set of patches to make progress towards
that. The ECPG tests pass. There's probably more that could be done,
but this seems like a good base to start from.

Looks like good stuff, pushed with a small additional amount of work.

-base_yyerror() is used once, but every other call site uses
mmerror(), so use that instead.

Yeah, I think this is a bug, though minor. The other places that
deal with unexpected EOF use mmfatal(PARSE_ERROR, ...) though,
so I did it like that.

In the course of doing this, I noticed a few other possible ECPG
scanner cleanups to consider, although the payoff might not justify
the work involved:

-Copy process_integer_literal() from the core scanner (maybe only if
decimal_fail is also brought over).

I went ahead and did both parts of that, mainly because as things
stood the ecpg lexer would produce a different token sequence for
"1..10" than the core would. I don't think this really matters
right at the moment, but if for instance ecpg decided to insert
spaces where it thought the token boundaries were, it would matter.

-Match core's handling of unicode escapes, even though pgc.l is not backup-free.

Yeah, the amount of remaining diffs due to the omission of the anti-backup
rules is a bit annoying and confusing.

More generally, I wonder if it'd be practical to make pgc.l backup-free
altogether. I'm not sure that the resulting gain in lexing speed would
really be of interest to anyone, but just in terms of using uniform lexer
design rules, it might be worthwhile. I tried a run with -b just to see,
and indeed there are a bunch of backup cases in the ECPG-specific rules,
so it'd take some work.

-Does ECPG still need its own functions for mm_alloc() and
mm_strdup(), if we have equivalents in src/common?

The error handling isn't the same, so I don't think we can just replace
them with the src/common functions. It is, however, potentially worth
our trouble to fix things so that within pgc.l, palloc() and pstrdup()
are macros for mm_alloc/mm_strdup, thereby further reducing the diffs
to the core lexer.

I'd be pretty tempted to also change all the hard references to
base_yylval to go through a pointer variable, so that diffs like
this go away:

-                    yylval->str = pstrdup(yytext);
+                    base_yylval.str = mm_strdup(yytext);

I believe that that'd result in more compact code, too --- on most
machines, references to global variables are more expensive in
code bytes than indirecting through a local pointer variable.

-Use reentrant APIs?

Hm. There's no functional value in that for ecpg, but maybe worth doing
anyway to get rid of diffs like

-                    addlit(yytext, yyleng, yyscanner);
+                    addlit(yytext, yyleng);

Guess it depends on how nitpicky you're feeling. In principle,
reducing these remaining diffs would ease future maintenance of
the lexers, but they're not something we change often.

regards, tom lane

#6Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tom Lane (#5)
Re: Sync ECPG scanner with core

On 2018-Nov-13, Tom Lane wrote:

More generally, I wonder if it'd be practical to make pgc.l backup-free
altogether. I'm not sure that the resulting gain in lexing speed would
really be of interest to anyone,

Maybe time to compile the ecpg test cases during "make check-world"?

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#6)
Re: Sync ECPG scanner with core

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

On 2018-Nov-13, Tom Lane wrote:

More generally, I wonder if it'd be practical to make pgc.l backup-free
altogether. I'm not sure that the resulting gain in lexing speed would
really be of interest to anyone,

Maybe time to compile the ecpg test cases during "make check-world"?

I'm dubious that the lexer is a significant part of that, though I could
be wrong ...

regards, tom lane

#8John Naylor
john.naylor@enterprisedb.com
In reply to: Tom Lane (#5)
Re: Sync ECPG scanner with core

On 11/14/18, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Looks like good stuff, pushed with a small additional amount of work.

Thanks.

-Does ECPG still need its own functions for mm_alloc() and
mm_strdup(), if we have equivalents in src/common?

The error handling isn't the same, so I don't think we can just replace
them with the src/common functions. It is, however, potentially worth
our trouble to fix things so that within pgc.l, palloc() and pstrdup()
are macros for mm_alloc/mm_strdup, thereby further reducing the diffs
to the core lexer.

I'd be pretty tempted to also change all the hard references to
base_yylval to go through a pointer variable, so that diffs like
this go away:

-                    yylval->str = pstrdup(yytext);
+                    base_yylval.str = mm_strdup(yytext);

I believe that that'd result in more compact code, too --- on most
machines, references to global variables are more expensive in
code bytes than indirecting through a local pointer variable.

I'll look into that as time permits.

On 11/14/18, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

On 2018-Nov-13, Tom Lane wrote:

More generally, I wonder if it'd be practical to make pgc.l backup-free
altogether. I'm not sure that the resulting gain in lexing speed would
really be of interest to anyone,

Maybe time to compile the ecpg test cases during "make check-world"?

I'm dubious that the lexer is a significant part of that, though I could
be wrong ...

If it were, it'd be much easier to try a Flex flag other than the
default, most compact representation, something like -Cfe. That'd be a
prerequisite for no-backtracking to matter anyway. That's easy enough
to test, so I'll volunteer to do that sometime.

-John Naylor

#9John Naylor
john.naylor@enterprisedb.com
In reply to: John Naylor (#8)
Re: Sync ECPG scanner with core

I wrote:

On 11/14/18, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

Maybe time to compile the ecpg test cases during "make check-world"?

I'm dubious that the lexer is a significant part of that, though I could
be wrong ...

If it were, it'd be much easier to try a Flex flag other than the
default, most compact representation, something like -Cfe. That'd be a
prerequisite for no-backtracking to matter anyway. That's easy enough
to test, so I'll volunteer to do that sometime.

The preproc phase of make check in the ecpg dir only takes 150ms.
Compiling and linking the test executables takes another 2s, and
running the tests takes another 5s on my machine. Even without setting
up and tearing down the temp instance, preproc is trivial.

-John Naylor