pure parsers and reentrant scanners

Started by Peter Eisentrautover 1 year ago25 messages
Jump to latest
#1Peter Eisentraut
peter_e@gmx.net

This patch series changes several parsers in the backend and contrib
modules to use bison pure parsers and flex reentrant scanners. This is
ultimately toward thread-safety, but I think it's also just nicer in
general, and it might also fix a few possible small memory leaks.

I organized this patch series into very small incremental changes so
that it's easier to follow. The final commits should probably combined a
bit more (e.g., one per module).

In this patch series I have so far dealt with

* contrib/cube/
* contrib/seg/
* src/backend/replication/repl_*
* src/backend/replication/syncrep_*

These four needed the whole treatment: pure parser, reentrant scanner,
and updated memory handling.

Also:
* src/backend/utils/adt/jsonpath_scan.l

This one already had a pure parser and palloc-based memory handling, but
not a reentrant scanner, so I just did that.

The above are all pretty similar, so it was relatively easy to work
through them once I had the first one figured out.

A couple of things that are still missing in the above:

* For repl_scanner.l, I want to use yyextra to deal with the static
variables marked /*FIXME*/, but somehow I made that buggy, I'll need to
take another look later.
* For both the replication parser and the syncrep parser, get rid of the
global variables to pass back the results. Again, here I need another
look later. I confused either myself or the compiler on these.

cube, seg, and jsonpath are about as done as I would want them to be.

Not done yet:
* src/backend/utils/misc/guc-file.l
* src/pl/plpgsql/src/pl_gram.y

These have quite different structures and requirements, so I plan to
deal with them separately.

Not relevant for backend thread-safety:
* src/backend/bootstrap/

It might make sense to eventually covert that one as well, just so that
the APIs are kept similar. But that could be for later.

Note that the core scanner and parser are already reentrant+pure.

Also, there are various other scanners and parsers in frontends (psql,
pgbench, ecpg) that are not relevant for this. (Again, it might make
sense to convert some of them later, and some of them are already done.)

AFAICT, all the options and coding techniques used here are already in
use elsewhere in the tree, so there shouldn't be any concerns about
bison or flex version compatibility.

Attachments:

v0-0004-seg-pure-parser-and-reentrant-scanner.patchtext/plain; charset=UTF-8; name=v0-0004-seg-pure-parser-and-reentrant-scanner.patchDownload+41-38
v0-0001-cube-pure-parser-and-reentrant-scanner.patchtext/plain; charset=UTF-8; name=v0-0001-cube-pure-parser-and-reentrant-scanner.patchDownload+45-44
v0-0002-cube-Use-palloc-instead-of-malloc-for-flex.patchtext/plain; charset=UTF-8; name=v0-0002-cube-Use-palloc-instead-of-malloc-for-flex.patchDownload+30-1
v0-0003-cube-Simplify-flex-scan-buffer-management.patchtext/plain; charset=UTF-8; name=v0-0003-cube-Simplify-flex-scan-buffer-management.patchDownload+1-12
v0-0005-seg-Use-palloc-instead-of-malloc-for-flex.patchtext/plain; charset=UTF-8; name=v0-0005-seg-Use-palloc-instead-of-malloc-for-flex.patchDownload+30-1
v0-0006-seg-Simplify-flex-scan-buffer-management.patchtext/plain; charset=UTF-8; name=v0-0006-seg-Simplify-flex-scan-buffer-management.patchDownload+1-13
v0-0007-replication-parser-pure-parser-and-reentrant-scan.patchtext/plain; charset=UTF-8; name=v0-0007-replication-parser-pure-parser-and-reentrant-scan.patchDownload+47-46
v0-0008-replication-parser-Use-palloc-instead-of-malloc-f.patchtext/plain; charset=UTF-8; name=v0-0008-replication-parser-Use-palloc-instead-of-malloc-f.patchDownload+30-1
v0-0009-replication-parser-Simplify-flex-scan-buffer-mana.patchtext/plain; charset=UTF-8; name=v0-0009-replication-parser-Simplify-flex-scan-buffer-mana.patchDownload+1-10
v0-0010-syncrep-parser-pure-parser-and-reentrant-scanner.patchtext/plain; charset=UTF-8; name=v0-0010-syncrep-parser-pure-parser-and-reentrant-scanner.patchDownload+38-37
v0-0011-syncrep-parser-Use-palloc-instead-of-malloc-for-f.patchtext/plain; charset=UTF-8; name=v0-0011-syncrep-parser-Use-palloc-instead-of-malloc-for-f.patchDownload+30-1
v0-0012-syncrep-parser-Simplify-flex-scan-buffer-manageme.patchtext/plain; charset=UTF-8; name=v0-0012-syncrep-parser-Simplify-flex-scan-buffer-manageme.patchDownload+1-10
v0-0013-syncrep-parser-Use-flex-yyextra.patchtext/plain; charset=UTF-8; name=v0-0013-syncrep-parser-Use-flex-yyextra.patchDownload+13-7
v0-0014-jsonpath-scanner-reentrant-scanner.patchtext/plain; charset=UTF-8; name=v0-0014-jsonpath-scanner-reentrant-scanner.patchDownload+50-76
v0-0015-jsonpath-scanner-Use-flex-yyextra.patchtext/plain; charset=UTF-8; name=v0-0015-jsonpath-scanner-Use-flex-yyextra.patchDownload+92-80
#2Peter Eisentraut
peter_e@gmx.net
In reply to: Peter Eisentraut (#1)
Re: pure parsers and reentrant scanners

On 02.12.24 10:46, Peter Eisentraut wrote:

This patch series changes several parsers in the backend and contrib
modules to use bison pure parsers and flex reentrant scanners. This is
ultimately toward thread-safety, but I think it's also just nicer in
general, and it might also fix a few possible small memory leaks.

I did a bit more work on this, so here is an updated patch set.

Not done yet:
* src/backend/utils/misc/guc-file.l
* src/pl/plpgsql/src/pl_gram.y

I converted both of these scanners to reentrant, but I haven't done the
plpgsql parser yet.

Not relevant for backend thread-safety:
* src/backend/bootstrap/

It might make sense to eventually covert that one as well, just so that
the APIs are kept similar. But that could be for later.

I have done this one.

I'll leave it at this for now and wait for some reviews.

Attachments:

v1-0001-cube-pure-parser-and-reentrant-scanner.patchtext/plain; charset=UTF-8; name=v1-0001-cube-pure-parser-and-reentrant-scanner.patchDownload+45-44
v1-0002-cube-Use-palloc-instead-of-malloc-for-flex.patchtext/plain; charset=UTF-8; name=v1-0002-cube-Use-palloc-instead-of-malloc-for-flex.patchDownload+30-1
v1-0003-cube-Simplify-flex-scan-buffer-management.patchtext/plain; charset=UTF-8; name=v1-0003-cube-Simplify-flex-scan-buffer-management.patchDownload+1-12
v1-0004-seg-pure-parser-and-reentrant-scanner.patchtext/plain; charset=UTF-8; name=v1-0004-seg-pure-parser-and-reentrant-scanner.patchDownload+41-38
v1-0018-guc-reentrant-scanner.patchtext/plain; charset=UTF-8; name=v1-0018-guc-reentrant-scanner.patchDownload+19-12
v1-0019-plpgsql-reentrant-scanner.patchtext/plain; charset=UTF-8; name=v1-0019-plpgsql-reentrant-scanner.patchDownload+256-237
v1-0005-seg-Use-palloc-instead-of-malloc-for-flex.patchtext/plain; charset=UTF-8; name=v1-0005-seg-Use-palloc-instead-of-malloc-for-flex.patchDownload+30-1
v1-0006-seg-Simplify-flex-scan-buffer-management.patchtext/plain; charset=UTF-8; name=v1-0006-seg-Simplify-flex-scan-buffer-management.patchDownload+1-13
v1-0007-replication-parser-pure-parser-and-reentrant-scan.patchtext/plain; charset=UTF-8; name=v1-0007-replication-parser-pure-parser-and-reentrant-scan.patchDownload+47-46
v1-0008-replication-parser-Use-palloc-instead-of-malloc-f.patchtext/plain; charset=UTF-8; name=v1-0008-replication-parser-Use-palloc-instead-of-malloc-f.patchDownload+30-1
v1-0009-replication-parser-Simplify-flex-scan-buffer-mana.patchtext/plain; charset=UTF-8; name=v1-0009-replication-parser-Simplify-flex-scan-buffer-mana.patchDownload+1-10
v1-0010-syncrep-parser-pure-parser-and-reentrant-scanner.patchtext/plain; charset=UTF-8; name=v1-0010-syncrep-parser-pure-parser-and-reentrant-scanner.patchDownload+38-37
v1-0011-syncrep-parser-Use-palloc-instead-of-malloc-for-f.patchtext/plain; charset=UTF-8; name=v1-0011-syncrep-parser-Use-palloc-instead-of-malloc-for-f.patchDownload+30-1
v1-0012-syncrep-parser-Simplify-flex-scan-buffer-manageme.patchtext/plain; charset=UTF-8; name=v1-0012-syncrep-parser-Simplify-flex-scan-buffer-manageme.patchDownload+1-10
v1-0013-syncrep-parser-Use-flex-yyextra.patchtext/plain; charset=UTF-8; name=v1-0013-syncrep-parser-Use-flex-yyextra.patchDownload+13-7
v1-0014-jsonpath-scanner-reentrant-scanner.patchtext/plain; charset=UTF-8; name=v1-0014-jsonpath-scanner-reentrant-scanner.patchDownload+50-76
v1-0015-jsonpath-scanner-Use-flex-yyextra.patchtext/plain; charset=UTF-8; name=v1-0015-jsonpath-scanner-Use-flex-yyextra.patchDownload+92-80
v1-0016-bootstrap-pure-parser-and-reentrant-scanner.patchtext/plain; charset=UTF-8; name=v1-0016-bootstrap-pure-parser-and-reentrant-scanner.patchDownload+47-39
v1-0017-bootstrap-Use-palloc-instead-of-malloc-for-flex.patchtext/plain; charset=UTF-8; name=v1-0017-bootstrap-Use-palloc-instead-of-malloc-for-flex.patchDownload+30-1
#3Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Peter Eisentraut (#2)
Re: pure parsers and reentrant scanners

On 16/12/2024 09:39, Peter Eisentraut wrote:

On 02.12.24 10:46, Peter Eisentraut wrote:

This patch series changes several parsers in the backend and contrib
modules to use bison pure parsers and flex reentrant scanners. This is
ultimately toward thread-safety, but I think it's also just nicer in
general, and it might also fix a few possible small memory leaks.

I did a bit more work on this, so here is an updated patch set.

Looks good to me. There's more work to be done, but this is all good
steps in the right direction.

--
Heikki Linnakangas
Neon (https://neon.tech)

#4Andreas Karlsson
andreas.karlsson@percona.com
In reply to: Peter Eisentraut (#2)
Re: pure parsers and reentrant scanners

On 12/16/24 8:39 AM, Peter Eisentraut wrote:

I'll leave it at this for now and wait for some reviews.

I really like this work since it makes the code cleaner to read on top
of paving the way for threading.

Reviewed the patches and found a couple of issues.

- Shouldn't yyext in syncrep_scanner_init() be allocated on the heap? Or
at least on the stack but by the caller?

- I think you have flipped the parameters of replication_yyerror(), see
attached fixup patch.

- Some white space issues fixed in an attached fixup patch.

- Also fixed the static remaining variables in the replication parser in
an attached patch.

- There seems to be a lot left to do to make the plpgsql scanner
actually re-entrant so I do not think it would makes sense to commit the
patch which sets the re-entrant option before that is done.

Andreas

Attachments:

v2-0020-fixup-replication-parser-pure-parser-and-reentran.patchtext/x-patch; charset=UTF-8; name=v2-0020-fixup-replication-parser-pure-parser-and-reentran.patchDownload+2-3
v2-0021-fixup-replication-parser-pure-parser-and-reentran.patchtext/x-patch; charset=UTF-8; name=v2-0021-fixup-replication-parser-pure-parser-and-reentran.patchDownload+1-3
v2-0022-replication-parser-Use-flex-yyextra.patchtext/x-patch; charset=UTF-8; name=v2-0022-replication-parser-Use-flex-yyextra.patchDownload+45-31
#5Peter Eisentraut
peter_e@gmx.net
In reply to: Andreas Karlsson (#4)
Re: pure parsers and reentrant scanners

I started committing the cube and seg pieces. There were a couple of
complaints from the buildfarm, like

ccache clang -std=gnu99 -Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Werror=vla
-Werror=unguarded-availability-new -Wendif-labels
-Wmissing-format-attribute -Wcast-function-type -Wformat-security
-Wmissing-variable-declarations -fno-strict-aliasing -fwrapv
-Wno-unused-command-line-argument -Wno-compound-token-split-by-macro -g
-O2 -fno-common -fsanitize=alignment -fsanitize-trap=alignment
-Wno-deprecated-declarations -Werror -fvisibility=hidden -I. -I.
-I../../src/include -isysroot
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX13.3.sdk
-DWRITE_READ_PARSE_PLAN_TREES -DSTRESS_SORT_INT_MIN
-DENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS
-I/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include
-I/usr/local/include -I/usr/local/ssl/include -c -o segscan.o segscan.c
segscan.c:348:15: error: redefinition of typedef 'yyscan_t' is a C11
feature [-Werror,-Wtypedef-redefinition]
typedef void* yyscan_t;
^
./segdata.h:19:15: note: previous definition is here
typedef void *yyscan_t;
^

I can fix that with the attached patch.

The symbol YY_TYPEDEF_YY_SCANNER_T isn't documented, but we already use
it elsewhere in the code.

Note that in replication/syncrep.h and replication/walsender_private.h
we have to have an #ifndef wrapper because there are files that end up
including both headers. Maybe we should put that #ifndef wrapper
everywhere for consistency?

Any thoughts?

(Also, we should probably figure out a way to get these warnings before
things hit the buildfarm.)

Attachments:

typedef-yyscan_t-fix.patch.nocfbottext/plain; charset=UTF-8; name=typedef-yyscan_t-fix.patch.nocfbotDownload+10-0
#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Eisentraut (#5)
Re: pure parsers and reentrant scanners

Peter Eisentraut <peter@eisentraut.org> writes:

I started committing the cube and seg pieces. There were a couple of
complaints from the buildfarm, like
segscan.c:348:15: error: redefinition of typedef 'yyscan_t' is a C11
feature [-Werror,-Wtypedef-redefinition]
typedef void* yyscan_t;
...
(Also, we should probably figure out a way to get these warnings before
things hit the buildfarm.)

Interestingly, while sifaka shows that, its sibling indri doesn't.
Same compiler, same CFLAGS. I think the relevant difference must
be that sifaka is using a much older Bison version (the Apple-supplied
2.3, versus MacPorts' up-to-the-minute version). I think that sort of
thing is exactly why we have the buildfarm. It would not be
reasonable to expect CI to cover that many cases. Trying to do so
would just make CI slow enough that we'd start looking for a new test
phase to put in front of it.

regards, tom lane

#7Andreas Karlsson
andreas.karlsson@percona.com
In reply to: Peter Eisentraut (#5)
Re: pure parsers and reentrant scanners

On 12/18/24 10:42 AM, Peter Eisentraut wrote:

I can fix that with the attached patch.

The symbol YY_TYPEDEF_YY_SCANNER_T isn't documented, but we already use
it elsewhere in the code.

Note that in replication/syncrep.h and replication/walsender_private.h
we have to have an #ifndef wrapper because there are files that end up
including both headers.  Maybe we should put that #ifndef wrapper
everywhere for consistency?

Any thoughts?

Seems like a sane fix to me and as for the ifndef I have no strong
opinion either way but I would personally probably have added it
everywhere for consistency.

Andreas

#8Peter Eisentraut
peter_e@gmx.net
In reply to: Andreas Karlsson (#4)
Re: pure parsers and reentrant scanners

On 17.12.24 01:46, Andreas Karlsson wrote:

On 12/16/24 8:39 AM, Peter Eisentraut wrote:

I'll leave it at this for now and wait for some reviews.

I really like this work since it makes the code cleaner to read on top
of paving the way for threading.

Reviewed the patches and found a couple of issues.

- Shouldn't yyext in syncrep_scanner_init() be allocated on the heap? Or
at least on the stack but by the caller?

I think it's correct the way it is. It's only a temporary space for the
scanner, so we can allocate it in the innermost scope.

- I think you have flipped the parameters of replication_yyerror(), see
attached fixup patch.

Good catch. There was also a similar issue with syncrep_yyerror().

- Some white space issues fixed in an attached fixup patch.

committed

- Also fixed the static remaining variables in the replication parser in
an attached patch.

Thanks, I'll take a look at that.

- There seems to be a lot left to do to make the plpgsql scanner
actually re-entrant so I do not think it would makes sense to commit the
patch which sets the re-entrant option before that is done.

Ok, we can hold that one back until the full stack including the parser
is done.

#9Peter Eisentraut
peter_e@gmx.net
In reply to: Tom Lane (#6)
Re: pure parsers and reentrant scanners

On 18.12.24 18:43, Tom Lane wrote:

Peter Eisentraut <peter@eisentraut.org> writes:

I started committing the cube and seg pieces. There were a couple of
complaints from the buildfarm, like
segscan.c:348:15: error: redefinition of typedef 'yyscan_t' is a C11
feature [-Werror,-Wtypedef-redefinition]
typedef void* yyscan_t;
...
(Also, we should probably figure out a way to get these warnings before
things hit the buildfarm.)

Interestingly, while sifaka shows that, its sibling indri doesn't.
Same compiler, same CFLAGS. I think the relevant difference must
be that sifaka is using a much older Bison version (the Apple-supplied
2.3, versus MacPorts' up-to-the-minute version). I think that sort of
thing is exactly why we have the buildfarm. It would not be
reasonable to expect CI to cover that many cases. Trying to do so
would just make CI slow enough that we'd start looking for a new test
phase to put in front of it.

The situation is that most current compilers default to some newer C
standard version. And so they won't complain about use of C11 features.
But the affected buildfarm members for whatever reason run with
CC='clang -std=gnu99', and so they correctly reject C11 features. We
could do something similar in the Cirrus configuration. I'll start a
separate thread about that.

#10Peter Eisentraut
peter_e@gmx.net
In reply to: Peter Eisentraut (#8)
Re: pure parsers and reentrant scanners

On 19.12.24 13:48, Peter Eisentraut wrote:

On 17.12.24 01:46, Andreas Karlsson wrote:

On 12/16/24 8:39 AM, Peter Eisentraut wrote:

I'll leave it at this for now and wait for some reviews.

I really like this work since it makes the code cleaner to read on top
of paving the way for threading.

Reviewed the patches and found a couple of issues.

- Shouldn't yyext in syncrep_scanner_init() be allocated on the heap?
Or at least on the stack but by the caller?

I think it's correct the way it is.  It's only a temporary space for the
scanner, so we can allocate it in the innermost scope.

- Also fixed the static remaining variables in the replication parser
in an attached patch.

Thanks, I'll take a look at that.

I see what was going on here. I was allocating yyext as a local
variable in the init function and then it would go out of scope while
the scanner is still in use. That's why this didn't work for me. I had
written essentially the same patch as you for the replication scanner
yyextra but with a local variable, and it was "mysteriously" failing the
tests for me. Your solution is better. (For the jsonpath scanner, the
local variable works because the scanner init and shutdown are called
from the same function.)

Here is an updated patch set on top of what has been committed so far,
with all the issues you pointed out addressed.

Attachments:

v2-0001-replication-parser-pure-parser-and-reentrant-scan.patchtext/plain; charset=UTF-8; name=v2-0001-replication-parser-pure-parser-and-reentrant-scan.patchDownload+50-46
v2-0002-replication-parser-Use-palloc-instead-of-malloc-f.patchtext/plain; charset=UTF-8; name=v2-0002-replication-parser-Use-palloc-instead-of-malloc-f.patchDownload+30-1
v2-0003-replication-parser-Simplify-flex-scan-buffer-mana.patchtext/plain; charset=UTF-8; name=v2-0003-replication-parser-Simplify-flex-scan-buffer-mana.patchDownload+1-10
v2-0004-replication-parser-Use-flex-yyextra.patchtext/plain; charset=UTF-8; name=v2-0004-replication-parser-Use-flex-yyextra.patchDownload+40-31
v2-0005-syncrep-parser-pure-parser-and-reentrant-scanner.patchtext/plain; charset=UTF-8; name=v2-0005-syncrep-parser-pure-parser-and-reentrant-scanner.patchDownload+41-37
v2-0006-syncrep-parser-Use-palloc-instead-of-malloc-for-f.patchtext/plain; charset=UTF-8; name=v2-0006-syncrep-parser-Use-palloc-instead-of-malloc-for-f.patchDownload+30-1
v2-0007-syncrep-parser-Simplify-flex-scan-buffer-manageme.patchtext/plain; charset=UTF-8; name=v2-0007-syncrep-parser-Simplify-flex-scan-buffer-manageme.patchDownload+1-10
v2-0008-syncrep-parser-Use-flex-yyextra.patchtext/plain; charset=UTF-8; name=v2-0008-syncrep-parser-Use-flex-yyextra.patchDownload+18-7
v2-0009-jsonpath-scanner-reentrant-scanner.patchtext/plain; charset=UTF-8; name=v2-0009-jsonpath-scanner-reentrant-scanner.patchDownload+144-154
v2-0010-guc-reentrant-scanner.patchtext/plain; charset=UTF-8; name=v2-0010-guc-reentrant-scanner.patchDownload+19-12
v2-0011-plpgsql-reentrant-scanner.patchtext/plain; charset=UTF-8; name=v2-0011-plpgsql-reentrant-scanner.patchDownload+256-237
#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Eisentraut (#10)
Re: pure parsers and reentrant scanners

I noticed that lapwing is bleating about

ccache gcc -std=gnu99 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Werror=vla -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -g -O2 -fPIC -fvisibility=hidden -I. -I. -I../../src/include -DENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS -D_GNU_SOURCE -I/usr/include/libxml2 -I/usr/include/et -c -o cubescan.o cubescan.c
cubescan.c:1689:5: warning: no previous prototype for 'cube_yyget_column' [-Wmissing-prototypes]
cubescan.c:1765:6: warning: no previous prototype for 'cube_yyset_column' [-Wmissing-prototypes]

and likewise in segscan.c. lapwing is using flex 2.5.35, so probably
this is the same bug worked around in parser/scan.l:

/*
* Work around a bug in flex 2.5.35: it emits a couple of functions that
* it forgets to emit declarations for. Since we use -Wmissing-prototypes,
* this would cause warnings. Providing our own declarations should be
* harmless even when the bug gets fixed.
*/
extern int core_yyget_column(yyscan_t yyscanner);
extern void core_yyset_column(int column_no, yyscan_t yyscanner);

regards, tom lane

#12Peter Eisentraut
peter_e@gmx.net
In reply to: Tom Lane (#11)
Re: pure parsers and reentrant scanners

On 20.12.24 02:07, Tom Lane wrote:

I noticed that lapwing is bleating about

ccache gcc -std=gnu99 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Werror=vla -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -g -O2 -fPIC -fvisibility=hidden -I. -I. -I../../src/include -DENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS -D_GNU_SOURCE -I/usr/include/libxml2 -I/usr/include/et -c -o cubescan.o cubescan.c
cubescan.c:1689:5: warning: no previous prototype for 'cube_yyget_column' [-Wmissing-prototypes]
cubescan.c:1765:6: warning: no previous prototype for 'cube_yyset_column' [-Wmissing-prototypes]

and likewise in segscan.c. lapwing is using flex 2.5.35, so probably
this is the same bug worked around in parser/scan.l:

Ok, we can fix that, but maybe this is also a good moment to think about
whether that is useful. I could not reproduce the issue with flex
2.5.39. I could find no download of flex 2.5.35. The github site only
offers back to 2.5.39, the sourceforce site back to 2.5.36. lapwing
says it's Debian 7.0, which went out of support in 2016 and out of
super-duper-extended support in 2020. It also doesn't have a supported
OpenSSL version anymore, and IIRC, it has a weird old compiler that
occasionally gives bogus warnings. I think it's time to stop supporting
this.

#13Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Eisentraut (#12)
Re: pure parsers and reentrant scanners

Peter Eisentraut <peter@eisentraut.org> writes:

On 20.12.24 02:07, Tom Lane wrote:

I noticed that lapwing is bleating about
cubescan.c:1689:5: warning: no previous prototype for 'cube_yyget_column' [-Wmissing-prototypes]
cubescan.c:1765:6: warning: no previous prototype for 'cube_yyset_column' [-Wmissing-prototypes]
and likewise in segscan.c. lapwing is using flex 2.5.35, so probably
this is the same bug worked around in parser/scan.l:

Ok, we can fix that, but maybe this is also a good moment to think about
whether that is useful. I could not reproduce the issue with flex
2.5.39. I could find no download of flex 2.5.35. The github site only
offers back to 2.5.39, the sourceforce site back to 2.5.36. lapwing
says it's Debian 7.0, which went out of support in 2016 and out of
super-duper-extended support in 2020. It also doesn't have a supported
OpenSSL version anymore, and IIRC, it has a weird old compiler that
occasionally gives bogus warnings. I think it's time to stop supporting
this.

OK, that's fair. I do see lapwing called out a lot in the commit log,
though it's not clear how much of that is about 32-bitness and how
much about old tools. It's surely still valuable to have i386
machines in the buildfarm, but I agree that supporting unobtainable
tool versions is a bit much. Could we get that animal updated to
some newer OS version?

Presumably, we should also rip out the existing yyget_column and
yyset_column kluges in

src/backend/parser/scan.l: extern int core_yyget_column(yyscan_t yyscanner);
src/bin/psql/psqlscanslash.l: extern int slash_yyget_column(yyscan_t yyscanner);
src/bin/pgbench/exprscan.l: extern int expr_yyget_column(yyscan_t yyscanner);
src/fe_utils/psqlscan.l: extern int psql_yyget_column(yyscan_t yyscanner);

regards, tom lane

#14Julien Rouhaud
rjuju123@gmail.com
In reply to: Tom Lane (#13)
Re: pure parsers and reentrant scanners

On Fri, Dec 20, 2024 at 11:23 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Could we get that animal updated to
some newer OS version?

There is already adder animal that is running debian sid on i386. The
only remaining interest in lapwing is to have older versions of
everything, so if that's useless I can just trash that vm.

#15Tom Lane
tgl@sss.pgh.pa.us
In reply to: Julien Rouhaud (#14)
Re: pure parsers and reentrant scanners

Julien Rouhaud <rjuju123@gmail.com> writes:

On Fri, Dec 20, 2024 at 11:23 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Could we get that animal updated to
some newer OS version?

There is already adder animal that is running debian sid on i386. The
only remaining interest in lapwing is to have older versions of
everything, so if that's useless I can just trash that vm.

Hmm, sid is the opposite extreme no? Maybe switching lapwing to
whatever is currently the oldest supported Debian release would
be a good answer.

regards, tom lane

#16Peter Eisentraut
peter_e@gmx.net
In reply to: Tom Lane (#15)
Re: pure parsers and reentrant scanners

On 20.12.24 16:35, Tom Lane wrote:

Julien Rouhaud <rjuju123@gmail.com> writes:

On Fri, Dec 20, 2024 at 11:23 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Could we get that animal updated to
some newer OS version?

There is already adder animal that is running debian sid on i386. The
only remaining interest in lapwing is to have older versions of
everything, so if that's useless I can just trash that vm.

Hmm, sid is the opposite extreme no? Maybe switching lapwing to
whatever is currently the oldest supported Debian release would
be a good answer.

Yeah, Debian stable or oldstable on i386 could certainly be useful.

#17Andreas Karlsson
andreas.karlsson@percona.com
In reply to: Peter Eisentraut (#10)
Re: pure parsers and reentrant scanners

On 12/19/24 9:57 PM, Peter Eisentraut wrote:

Here is an updated patch set on top of what has been committed so far,
with all the issues you pointed out addressed.

Other than the discussion of how old versions of flex we should support
I think this set of patches is ready to be committed. I looked at it
again and everything looks good.

Andreas

#18Peter Eisentraut
peter_e@gmx.net
In reply to: Andreas Karlsson (#17)
Re: pure parsers and reentrant scanners

On 22.12.24 22:43, Andreas Karlsson wrote:

On 12/19/24 9:57 PM, Peter Eisentraut wrote:

Here is an updated patch set on top of what has been committed so far,
with all the issues you pointed out addressed.

Other than the discussion of how old versions of flex we should support
I think this set of patches is ready to be committed. I looked at it
again and everything looks good.

I have committed these except the plpgsql one, which was still work in
progress. But I have progressed on this now and also converted the
parser and put the local state into yyextra. This gets rid of all
internal global state now. The patches for this are attached. It's a
lot of churn, but otherwise pretty standard stuff.

Along the way I noticed that the flex documentation now recommends a
different way to set the yyextra type. So I have changed the ones we
already have to that newer style. I inspected the generated C code and
there wasn't any significant difference, so I'm not sure, but I figure
if we're making changes in this area we might as well use the modern style.

Attachments:

v3-0001-flex-code-modernization-Replace-YY_EXTRA_TYPE-def.patchtext/plain; charset=UTF-8; name=v3-0001-flex-code-modernization-Replace-YY_EXTRA_TYPE-def.patchDownload+16-22
v3-0002-plpgsql-reentrant-scanner.patchtext/plain; charset=UTF-8; name=v3-0002-plpgsql-reentrant-scanner.patchDownload+258-236
v3-0003-plpgsql-pure-parser.patchtext/plain; charset=UTF-8; name=v3-0003-plpgsql-pure-parser.patchDownload+318-315
v3-0004-plpgsql-yyextra.patchtext/plain; charset=UTF-8; name=v3-0004-plpgsql-yyextra.patchDownload+250-220
#19Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Peter Eisentraut (#18)
Re: pure parsers and reentrant scanners

On 26/12/2024 20:27, Peter Eisentraut wrote:

On 22.12.24 22:43, Andreas Karlsson wrote:

On 12/19/24 9:57 PM, Peter Eisentraut wrote:

Here is an updated patch set on top of what has been committed so
far, with all the issues you pointed out addressed.

Other than the discussion of how old versions of flex we should
support I think this set of patches is ready to be committed. I looked
at it again and everything looks good.

I have committed these except the plpgsql one, which was still work in
progress.  But I have progressed on this now and also converted the
parser and put the local state into yyextra.  This gets rid of all
internal global state now.  The patches for this are attached.  It's a
lot of churn, but otherwise pretty standard stuff.

Looks good to me.

Along the way I noticed that the flex documentation now recommends a
different way to set the yyextra type.  So I have changed the ones we
already have to that newer style.  I inspected the generated C code and
there wasn't any significant difference, so I'm not sure, but I figure
if we're making changes in this area we might as well use the modern style.

+1. According to the flex NEWS file, this syntax was added in flex
2.5.34, and we already require 2.5.35.

--
Heikki Linnakangas
Neon (https://neon.tech)

#20Peter Eisentraut
peter_e@gmx.net
In reply to: Heikki Linnakangas (#19)
Re: pure parsers and reentrant scanners

On 27.12.24 10:19, Heikki Linnakangas wrote:

On 26/12/2024 20:27, Peter Eisentraut wrote:

On 22.12.24 22:43, Andreas Karlsson wrote:

On 12/19/24 9:57 PM, Peter Eisentraut wrote:

Here is an updated patch set on top of what has been committed so
far, with all the issues you pointed out addressed.

Other than the discussion of how old versions of flex we should
support I think this set of patches is ready to be committed. I
looked at it again and everything looks good.

I have committed these except the plpgsql one, which was still work in
progress.  But I have progressed on this now and also converted the
parser and put the local state into yyextra.  This gets rid of all
internal global state now.  The patches for this are attached.  It's a
lot of churn, but otherwise pretty standard stuff.

Looks good to me.

Along the way I noticed that the flex documentation now recommends a
different way to set the yyextra type.  So I have changed the ones we
already have to that newer style.  I inspected the generated C code
and there wasn't any significant difference, so I'm not sure, but I
figure if we're making changes in this area we might as well use the
modern style.

+1. According to the flex NEWS file, this syntax was added in flex
2.5.34, and we already require 2.5.35.

These have been committed. This concludes the main body of this work.

I'll let it take a lap around the buildfarm and will follow up in a few
days on what if anything lapwing has to say about it in terms of
warnings and what we want to do about it.

#21Peter Eisentraut
peter_e@gmx.net
In reply to: Tom Lane (#13)
#22Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Eisentraut (#21)
#23Peter Eisentraut
peter_e@gmx.net
In reply to: Tom Lane (#22)
#24Peter Eisentraut
peter_e@gmx.net
In reply to: Peter Eisentraut (#18)
#25Peter Eisentraut
peter_e@gmx.net
In reply to: Peter Eisentraut (#24)