Hook for extensible parsing.

Started by Julien Rouhaudalmost 5 years ago33 messageshackers
Jump to latest
#1Julien Rouhaud
rjuju123@gmail.com

Hi,

Being able to extend core parser has been requested multiple times, and AFAICT
all previous attempts were rejected not because this isn't wanted but because
the proposed implementations required plugins to reimplement all of the core
grammar with their own changes, as bison generated parsers aren't extensible.

I'd like to propose an alternative approach, which is to allow multiple parsers
to coexist, and let third-party parsers optionally fallback on the core
parsers. I'm sending this now as a follow-up of [1]/messages/by-id/20210315164336.ak32whndsxna5mjf@nol and to avoid duplicated
efforts, as multiple people are interested in that topic.

Obviously, since this is only about parsing, all modules can only implement
some kind of syntactic sugar, as they have to produce valid parsetrees, but
this could be a first step to later allow custom nodes and let plugins
implement e.g. new UTILITY commands.

So, this approach should allow different custom parser implementations:

1 implement only a few new commands on top of core grammar. For instance, an
extension could add support for CREATE [PHYSICAL | LOGICAL] REPLICATION SLOT
and rewrite that to a SelectStmt on top of the extisting function, or add a
CREATE HYPOTHETICAL INDEX, which would internally add a new option in
IndexStmt->options, to be intercepted in processUtility and bypass its
execution with the extension approach instead.

2 implement a totally different grammar for a different language. In case of
error, just silently fallback to core parser (or another hook) so both
parsers can still be used. Any language could be parsed as long as you can
produce a valid postgres parsetree.

3 implement a superuser of core grammar and replace core parser entirely. This
could arguably be done like the 1st case, but the idea is to avoid to
possibly parse the same input string twice, or to forbid the core parser if
that's somehow wanted.

I'm attaching some POC patches that implement this approach to start a
discussion. I split the infrastructure part in 2 patches to make it easier to
review, and I'm also adding 2 other patches with a small parser implementation
to be able to test the infrastructure. Here are some more details on the
patches and implementation details:

0001 simply adds a parser hook, which is called instead of raw_parser. This is
enough to make multiple parser coexist with one exception: multi-statement
query string. If multiple statements are provided, then all of them will be
parsed using the same grammar, which obviously won't work if they are written
for different grammars.

0002 implements a lame "sqlol" parser, based on LOLCODE syntax, with only the
ability to produce "select [col, ] col FROM table" parsetree, for testing
purpose. I chose it to ensure that everything works properly even with a
totally different grammar that has different keywords, which doesn't even ends
statements with a semicolon but a plain keyword.

0003 is where the real modifications are done to allow multi-statement string
to be parsed using different grammar. It implements a new MODE_SINGLE_QUERY
mode, which is used when a parser_hook is present. In that case,
pg_parse_query() will only parse part of the query string and loop until
everything is parsed (or some error happens).

pg_parse_query() will instruct plugins to parse a query at a time. They're
free to ignore that mode if they want to implement the 3rd mode. If so, they
should either return multiple RawStmt, a single RawStmt with a 0 or
strlen(query_string) stmt_len, or error out. Otherwise, they will implement
either mode 1 or 2, and they should always return a List containing a single
RawStmt with properly set stmt_len, even if the underlying statement is NULL.
This is required to properly skip valid strings that don't contain a
statements, and pg_parse_query() will skip RawStmt that don't contain an
underlying statement.

It also teaches the core parser to do the same, by optionally start parsing
somewhere in the input string and stop parsing once a valid statement is found.

Note that the whole input string is provided to the parsers in order to report
correct cursor position, so all token can get a correct location. This means
that raw_parser() signature needs an additional offset to know where the
parsing should start.

Finally, 0004 modifies the sqlol parser to implement the MODE_SINGLE_QUERY
mode, adds grammar for creating views and adds some regression test to validate
proper parsing and error location reporting with multi-statements input string.

As far as I can tell it's all working as expected but I may have missed some
usecases. The regression tests still work with the additional parser
configured. The only difference is for pg_stat_statements, as in
MODE_SINGLE_QUERY the trailing semicolon has to be included in the statement,
since other grammars may understand semicolons differently.

The obvious drawback is that it can cause overhead as the same input can be
parsed multiple time. This could be avoided with plugins implementing a GUC to
enable/disable their parser, so it's only active by default for some
users/database, or requires to be enabled interactively by the client app.

Also, the error messages can also be unhelpful for cases 1 and 2. If the
custom parser doesn't error out, it means that the syntax errors will be raised
by the core parser based on the core grammar, which will likely point out an
unrelated problem. Some of that can be avoided by letting the custom parsers
raise errors when they know for sure it's parsing what it's supposed to parse
(there's an example of that in the sqlol parser for qualified_name parsing, as
it can only happen once some specific keywords already matched). For the rest
of the errors, the only option I can think of is another GUC to let custom
parsers always raise an error (or raise a warning) to help people debug their
queries.

I'll park this patch in the next commitfest so it can be discussed when pg15
development starts.

[1]: /messages/by-id/20210315164336.ak32whndsxna5mjf@nol

Attachments:

v1-0001-Add-a-parser_hook-hook.patchtext/x-diff; charset=us-asciiDownload+19-3
v1-0002-Add-a-sqlol-parser.patchtext/x-diff; charset=us-asciiDownload+1468-1
v1-0003-Add-a-new-MODE_SINGLE_QUERY-to-the-core-parser-an.patchtext/x-diff; charset=us-asciiDownload+179-26
v1-0004-Teach-sqlol-to-use-the-new-MODE_SINGLE_QUERY-pars.patchtext/x-diff; charset=us-asciiDownload+192-47
#2Julien Rouhaud
rjuju123@gmail.com
In reply to: Julien Rouhaud (#1)
Re: Hook for extensible parsing.

On Sat, May 01, 2021 at 03:24:58PM +0800, Julien Rouhaud wrote:

I'm attaching some POC patches that implement this approach to start a
discussion.

I just noticed that the cfbot fails with the v1 patch. Attached v2 that should
fix that.

Attachments:

v2-0001-Add-a-parser_hook-hook.patchtext/x-diff; charset=us-asciiDownload+19-3
v2-0002-Add-a-sqlol-parser.patchtext/x-diff; charset=us-asciiDownload+1468-1
v2-0003-Add-a-new-MODE_SINGLE_QUERY-to-the-core-parser-an.patchtext/x-diff; charset=us-asciiDownload+210-28
v2-0004-Teach-sqlol-to-use-the-new-MODE_SINGLE_QUERY-pars.patchtext/x-diff; charset=us-asciiDownload+192-47
#3Julien Rouhaud
rjuju123@gmail.com
In reply to: Julien Rouhaud (#2)
Re: Hook for extensible parsing.

On Sun, Jun 06, 2021 at 02:50:19PM +0800, Julien Rouhaud wrote:

On Sat, May 01, 2021 at 03:24:58PM +0800, Julien Rouhaud wrote:

I'm attaching some POC patches that implement this approach to start a
discussion.

I just noticed that the cfbot fails with the v1 patch. Attached v2 that should
fix that.

The cfbot then revealed a missing dependency in the makefile to generate the
contrib parser, which triggers in make check-world without a previous
make -C contrib.

Thanks a lot to Thomas Munro for getting me the logfile from the failed cfbot
run and the fix!

Attachments:

v3-0001-Add-a-parser_hook-hook.patchtext/x-diff; charset=us-asciiDownload+19-3
v3-0002-Add-a-sqlol-parser.patchtext/x-diff; charset=us-asciiDownload+1468-1
v3-0003-Add-a-new-MODE_SINGLE_QUERY-to-the-core-parser-an.patchtext/x-diff; charset=us-asciiDownload+210-28
v3-0004-Teach-sqlol-to-use-the-new-MODE_SINGLE_QUERY-pars.patchtext/x-diff; charset=us-asciiDownload+192-47
#4Julien Rouhaud
rjuju123@gmail.com
In reply to: Julien Rouhaud (#3)
Re: Hook for extensible parsing.

On Tue, Jun 08, 2021 at 12:16:48PM +0800, Julien Rouhaud wrote:

On Sun, Jun 06, 2021 at 02:50:19PM +0800, Julien Rouhaud wrote:

On Sat, May 01, 2021 at 03:24:58PM +0800, Julien Rouhaud wrote:

I'm attaching some POC patches that implement this approach to start a
discussion.

The regression tests weren't stable, v4 fixes that.

Attachments:

v4-0001-Add-a-parser_hook-hook.patchtext/x-diff; charset=us-asciiDownload+19-3
v4-0002-Add-a-sqlol-parser.patchtext/x-diff; charset=us-asciiDownload+1468-1
v4-0003-Add-a-new-MODE_SINGLE_QUERY-to-the-core-parser-an.patchtext/x-diff; charset=us-asciiDownload+210-28
v4-0004-Teach-sqlol-to-use-the-new-MODE_SINGLE_QUERY-pars.patchtext/x-diff; charset=us-asciiDownload+199-47
#5Jim Mlodgenski
jimmy76@gmail.com
In reply to: Julien Rouhaud (#4)
Re: Hook for extensible parsing.

On Sat, Jun 12, 2021 at 4:29 AM Julien Rouhaud <rjuju123@gmail.com> wrote:

I'd like to propose an alternative approach, which is to allow multiple parsers
to coexist, and let third-party parsers optionally fallback on the core
parsers. I'm sending this now as a follow-up of [1] and to avoid duplicated
efforts, as multiple people are interested in that topic.

The patches all build properly and pass all regressions tests.

pg_parse_query() will instruct plugins to parse a query at a time. They're
free to ignore that mode if they want to implement the 3rd mode. If so, they
should either return multiple RawStmt, a single RawStmt with a 0 or
strlen(query_string) stmt_len, or error out. Otherwise, they will implement
either mode 1 or 2, and they should always return a List containing a single
RawStmt with properly set stmt_len, even if the underlying statement is NULL.
This is required to properly skip valid strings that don't contain a
statements, and pg_parse_query() will skip RawStmt that don't contain an
underlying statement.

Wouldn't we want to only loop through the individual statements if parser_hook
exists? The current patch seems to go through the new code path regardless
of the hook being grabbed.

#6Julien Rouhaud
rjuju123@gmail.com
In reply to: Jim Mlodgenski (#5)
Re: Hook for extensible parsing.

Thanks for the review Jim!

On Wed, Jul 7, 2021 at 3:26 AM Jim Mlodgenski <jimmy76@gmail.com> wrote:

On Sat, Jun 12, 2021 at 4:29 AM Julien Rouhaud <rjuju123@gmail.com> wrote:

The patches all build properly and pass all regressions tests.

Note that the cfbot reports a compilation error on windows. That's on
the grammar extension part, so I'm really really interested in trying
to fix that for now, as it's mostly a quick POC to demonstrate how one
could implement a different grammar and validate that everything works
as expected.

Also, if this patch is eventually committed and having some code to
experience the hook is wanted it would probably be better to have a
very naive parser (based on a few strcmp() calls or something like
that) to validate the behavior rather than having a real parser.

pg_parse_query() will instruct plugins to parse a query at a time. They're
free to ignore that mode if they want to implement the 3rd mode. If so, they
should either return multiple RawStmt, a single RawStmt with a 0 or
strlen(query_string) stmt_len, or error out. Otherwise, they will implement
either mode 1 or 2, and they should always return a List containing a single
RawStmt with properly set stmt_len, even if the underlying statement is NULL.
This is required to properly skip valid strings that don't contain a
statements, and pg_parse_query() will skip RawStmt that don't contain an
underlying statement.

Wouldn't we want to only loop through the individual statements if parser_hook
exists? The current patch seems to go through the new code path regardless
of the hook being grabbed.

I did think about it, but I eventually chose to write it this way.
Having a different code path for the no-hook situation won't make the
with-hook code any easier (it should only remove some check for the
hook in some places that have 2 or 3 other checks already). On the
other hand, having a single code path avoid some (minimal) code
duplication, and also ensure that the main loop is actively tested
even without the hook being set. That's not 100% coverage, but it's
better than nothing. Performance wise, it shouldn't make any
noticeable difference for the no-hook case.

#7Jim Mlodgenski
jimmy76@gmail.com
In reply to: Julien Rouhaud (#6)
Re: Hook for extensible parsing.

On Wed, Jul 7, 2021 at 5:26 AM Julien Rouhaud <rjuju123@gmail.com> wrote:

Also, if this patch is eventually committed and having some code to
experience the hook is wanted it would probably be better to have a
very naive parser (based on a few strcmp() calls or something like
that) to validate the behavior rather than having a real parser.

The test module is very useful to show how to use the hook but it isn't
very useful to the general user like most other things in contrib. It probably
fits better in src/test/modules

#8Julien Rouhaud
rjuju123@gmail.com
In reply to: Jim Mlodgenski (#7)
Re: Hook for extensible parsing.

On Wed, Jul 7, 2021 at 8:45 PM Jim Mlodgenski <jimmy76@gmail.com> wrote:

The test module is very useful to show how to use the hook but it isn't
very useful to the general user like most other things in contrib. It probably
fits better in src/test/modules

I agree that it's not useful at all to eventually have it as a
contrib, but it's somewhat convenient at this stage to be able to
easily test the hook, possibly with different behavior.

But as I said, if there's an agreement on the approach and the
implementation, I don't think that it would make sense to keep it even
in the src/test/modules. A full bison parser, even with a limited
grammar, will have about 99% of noise when it comes to demonstrate how
the hook is supposed to work, which basically is having a "single
query" parser or a "full input string" parser. I'm not even convinced
that flex/bison will be the preferred choice for someone who wants to
implement a custom parser.

I tried to add really thorough comments in the various parts of the
patch to make it clear how to do that and how the system will react
depending on what a hook does. I also added some protection to catch
inconsistent hook implementation. I think that's the best way to help
external parser authors to implement what they want, and I'll be happy
to improve the comments if necessary. But if eventually people would
like to have a real parser in the tree, for testing or guidance, I
will of course take care of doing the required changes and moving the
demo parser in src/test/modules.

#9vignesh C
vignesh21@gmail.com
In reply to: Julien Rouhaud (#4)
Re: Hook for extensible parsing.

On Sat, Jun 12, 2021 at 1:59 PM Julien Rouhaud <rjuju123@gmail.com> wrote:

On Tue, Jun 08, 2021 at 12:16:48PM +0800, Julien Rouhaud wrote:

On Sun, Jun 06, 2021 at 02:50:19PM +0800, Julien Rouhaud wrote:

On Sat, May 01, 2021 at 03:24:58PM +0800, Julien Rouhaud wrote:

I'm attaching some POC patches that implement this approach to start a
discussion.

The regression tests weren't stable, v4 fixes that.

1) CFBOT showed the following compilation errors in windows:
"C:\projects\postgresql\pgsql.sln" (default target) (1) ->
"C:\projects\postgresql\sqlol.vcxproj" (default target) (69) ->
(ClCompile target) ->
c1 : fatal error C1083: Cannot open source file:
'contrib/sqlol/sqlol_gram.c': No such file or directory
[C:\projects\postgresql\sqlol.vcxproj]
c:\projects\postgresql\contrib\sqlol\sqlol_gramparse.h(25): fatal
error C1083: Cannot open include file: 'sqlol_gram.h': No such file or
directory (contrib/sqlol/sqlol.c)
[C:\projects\postgresql\sqlol.vcxproj]
c:\projects\postgresql\contrib\sqlol\sqlol_gramparse.h(25): fatal
error C1083: Cannot open include file: 'sqlol_gram.h': No such file or
directory (contrib/sqlol/sqlol_keywords.c)
[C:\projects\postgresql\sqlol.vcxproj]
c1 : fatal error C1083: Cannot open source file:
'contrib/sqlol/sqlol_scan.c': No such file or directory
[C:\projects\postgresql\sqlol.vcxproj]
0 Warning(s)
4 Error(s)
6123
6124Time Elapsed 00:05:40.23
6125

2) There was one small whitespace error with the patch:
git am v4-0002-Add-a-sqlol-parser.patch
Applying: Add a sqlol parser.
.git/rebase-apply/patch:818: new blank line at EOF.
+
warning: 1 line adds whitespace errors.

Regards,
Vignesh

#10Julien Rouhaud
rjuju123@gmail.com
In reply to: vignesh C (#9)
Re: Hook for extensible parsing.

On Thu, Jul 22, 2021 at 12:01:34PM +0530, vignesh C wrote:

1) CFBOT showed the following compilation errors in windows:

Thanks for looking at it. I'm aware of this issue on windows, but as mentioned
in the thread the new contrib is there to demonstrate how the new
infrastructure works. If there were some interest in pushing the patch, I
don't think that we would add a full bison parser, whether it's in contrib or
test modules.

So unless there's a clear indication from a committer that we would want to
integrate such a parser, or if someone is interested in reviewing the patch and
only has a windows machine, I don't plan to spend time trying to fix a windows
only problem for something that will disappear anyway.

2) There was one small whitespace error with the patch:
git am v4-0002-Add-a-sqlol-parser.patch
Applying: Add a sqlol parser.
.git/rebase-apply/patch:818: new blank line at EOF.
+
warning: 1 line adds whitespace errors.

Indeed, there's a trailing empty line in contrib/sqlol/sqlol_keywords.c. I
fixed it locally, but as I said this module is most certainly going to
disappear so I'm not sending an updating patch right now.

#11Julien Rouhaud
rjuju123@gmail.com
In reply to: Julien Rouhaud (#10)
Re: Hook for extensible parsing.

On Thu, Jul 22, 2021 at 03:04:19PM +0800, Julien Rouhaud wrote:

On Thu, Jul 22, 2021 at 12:01:34PM +0530, vignesh C wrote:

1) CFBOT showed the following compilation errors in windows:

Thanks for looking at it. I'm aware of this issue on windows, but as mentioned
in the thread the new contrib is there to demonstrate how the new
infrastructure works. If there were some interest in pushing the patch, I
don't think that we would add a full bison parser, whether it's in contrib or
test modules.

So unless there's a clear indication from a committer that we would want to
integrate such a parser, or if someone is interested in reviewing the patch and
only has a windows machine, I don't plan to spend time trying to fix a windows
only problem for something that will disappear anyway.

I'm not sure what changed in the Windows part of the cfbot, but somehow it's
not hitting any compilation error anymore and all the tests are now green.

#12Julien Rouhaud
rjuju123@gmail.com
In reply to: Julien Rouhaud (#11)
Re: Hook for extensible parsing.

v5 attached, fixing conflict with 639a86e36a (Remove Value node struct)

Attachments:

v5-0001-Add-a-parser_hook-hook.patchtext/x-diff; charset=us-asciiDownload+19-3
v5-0002-Add-a-sqlol-parser.patchtext/x-diff; charset=us-asciiDownload+1466-1
v5-0003-Add-a-new-MODE_SINGLE_QUERY-to-the-core-parser-an.patchtext/x-diff; charset=us-asciiDownload+210-28
v5-0004-Teach-sqlol-to-use-the-new-MODE_SINGLE_QUERY-pars.patchtext/x-diff; charset=us-asciiDownload+199-47
#13Simon Riggs
simon@2ndQuadrant.com
In reply to: Julien Rouhaud (#1)
Re: Hook for extensible parsing.

On Sat, 1 May 2021 at 08:24, Julien Rouhaud <rjuju123@gmail.com> wrote:

Being able to extend core parser has been requested multiple times, and AFAICT
all previous attempts were rejected not because this isn't wanted but because
the proposed implementations required plugins to reimplement all of the core
grammar with their own changes, as bison generated parsers aren't extensible.

I'd like to propose an alternative approach, which is to allow multiple parsers
to coexist, and let third-party parsers optionally fallback on the core
parsers.

Yes, that approach has been discussed by many people, most recently
around the idea to create a fast-path grammar to make the most
frequently used SQL parse faster.

0002 implements a lame "sqlol" parser, based on LOLCODE syntax, with only the
ability to produce "select [col, ] col FROM table" parsetree, for testing
purpose. I chose it to ensure that everything works properly even with a
totally different grammar that has different keywords, which doesn't even ends
statements with a semicolon but a plain keyword.

The general rule has always been that we don't just put hooks in, we
always require an in-core use for those hooks. I was reminded of that
myself recently.

What we need is something in core that actually makes use of this. The
reason for that is not politics, but a simple test of whether the
feature makes sense AND includes all required bells and whistles to be
useful in the real world.

Core doesn't need a LOL parser and I don't think we should commit that.

If we do this, I think it should have CREATE LANGUAGE support, so that
each plugin can be seen as an in-core object and allow security around
which users can execute which language types, allow us to switch
between languages and have default languages for specific users or
databases.

--
Simon Riggs http://www.EnterpriseDB.com/

#14Jim Mlodgenski
jimmy76@gmail.com
In reply to: Simon Riggs (#13)
Re: Hook for extensible parsing.

On Wed, Sep 15, 2021 at 9:25 AM Simon Riggs
<simon.riggs@enterprisedb.com> wrote:

The general rule has always been that we don't just put hooks in, we
always require an in-core use for those hooks. I was reminded of that
myself recently.

That's not historically what has happened. There are several hooks with
no in core use such as emit_log_hook and ExplainOneQuery_hook. The recent
openssl_tls_init_hook only has a usage in src/test/modules

What we need is something in core that actually makes use of this. The
reason for that is not politics, but a simple test of whether the
feature makes sense AND includes all required bells and whistles to be
useful in the real world.

Agreed. There should be something in src/test/modules to exercise this
but probably more to flush things out. Maybe extending adminpack to use
this so if enabled, it can use syntax like:
FILE READ 'foo.txt'

Core doesn't need a LOL parser and I don't think we should commit that.

+1

If we do this, I think it should have CREATE LANGUAGE support, so that
each plugin can be seen as an in-core object and allow security around
which users can execute which language types, allow us to switch
between languages and have default languages for specific users or
databases.

This hook allows extension developers to supplement syntax in addition
to adding a whole new language allowing the extension to appear more
native to the end user. For example, pglogical could use this to add
syntax to do a CREATE NODE instead of calling the function create_node.
Adding CREATE LANGUAGE support around this would just be for a narrow
set of use cases where a new language is added.

#15Julien Rouhaud
rjuju123@gmail.com
In reply to: Jim Mlodgenski (#14)
Re: Hook for extensible parsing.

On Wed, Sep 15, 2021 at 10:14 PM Jim Mlodgenski <jimmy76@gmail.com> wrote:

On Wed, Sep 15, 2021 at 9:25 AM Simon Riggs
<simon.riggs@enterprisedb.com> wrote:

The general rule has always been that we don't just put hooks in, we
always require an in-core use for those hooks. I was reminded of that
myself recently.

That's not historically what has happened. There are several hooks with
no in core use such as emit_log_hook and ExplainOneQuery_hook. The recent
openssl_tls_init_hook only has a usage in src/test/modules

Yes, I also think that it's not a strict requirement that all hooks
have a caller in the core, even if it's obviously better if that's the
case.

What we need is something in core that actually makes use of this. The
reason for that is not politics, but a simple test of whether the
feature makes sense AND includes all required bells and whistles to be
useful in the real world.

Agreed. There should be something in src/test/modules to exercise this
but probably more to flush things out. Maybe extending adminpack to use
this so if enabled, it can use syntax like:
FILE READ 'foo.txt'

For this hook, maintaining a real alternative parser seems like way
too much trouble to justify an in-core user. The fact that many
people have asked for such a feature over the year should be enough to
justify the use case. We could try to invent some artificial need
like the one you suggest for adminpack, but it also feels like a waste
of resources.

As far as I'm concerned a naive strcmp-based parser in
src/test/modules should be enough to validate that the hook is
working, there's no need for more. In any case if the only
requirement for it to be committed is to write a real parser, whether
in contrib or src/test/modules, I'll be happy to do it.

Core doesn't need a LOL parser and I don't think we should commit that.

+1

I entirely agree, and I repeatedly mentioned in that thread that I did
*not* want to add this parser in core. The only purpose of patches
0002 and 0004 is to make the third-party bison based parser
requirements less abstract, and demonstrate that this approach can
successfully make two *radically different* parsers cohabit.

If we do this, I think it should have CREATE LANGUAGE support, so that
each plugin can be seen as an in-core object and allow security around
which users can execute which language types, allow us to switch
between languages and have default languages for specific users or
databases.

This hook allows extension developers to supplement syntax in addition
to adding a whole new language allowing the extension to appear more
native to the end user. For example, pglogical could use this to add
syntax to do a CREATE NODE instead of calling the function create_node.
Adding CREATE LANGUAGE support around this would just be for a narrow
set of use cases where a new language is added.

Yes, this hook can be used to implement multiple things as I mentioned
my initial email. Additionally, if this is eventually committed I'd
like to add support for CREATE HYPOTHETICAL INDEX grammar in hypopg.
Such a parser would only support one command (that extends an existing
one), so it can't really be called a language. Of course if would be
better to have the core parser accept a CREATE [ HYPOTHETICAL ] INDEX
and setup a flag so that third-parrty module can intercept this
utility command, but until that happens I could provide that syntactic
sugar for my users as long as I'm motivated enough to write this
parser.

Also, a hook based approach is still compatible with per database /
role configuration. It can be done either via specific
session_preload_libraries, or via a custom GUC if for some reason the
module requires to be in shared_preload_libraries.

#16David Fetter
david@fetter.org
In reply to: Simon Riggs (#13)
Re: Hook for extensible parsing.

On Wed, Sep 15, 2021 at 02:25:17PM +0100, Simon Riggs wrote:

On Sat, 1 May 2021 at 08:24, Julien Rouhaud <rjuju123@gmail.com> wrote:

Being able to extend core parser has been requested multiple times, and AFAICT
all previous attempts were rejected not because this isn't wanted but because
the proposed implementations required plugins to reimplement all of the core
grammar with their own changes, as bison generated parsers aren't extensible.

I'd like to propose an alternative approach, which is to allow multiple parsers
to coexist, and let third-party parsers optionally fallback on the core
parsers.

Yes, that approach has been discussed by many people, most recently
around the idea to create a fast-path grammar to make the most
frequently used SQL parse faster.

0002 implements a lame "sqlol" parser, based on LOLCODE syntax, with only the
ability to produce "select [col, ] col FROM table" parsetree, for testing
purpose. I chose it to ensure that everything works properly even with a
totally different grammar that has different keywords, which doesn't even ends
statements with a semicolon but a plain keyword.

The general rule has always been that we don't just put hooks in, we
always require an in-core use for those hooks. I was reminded of that
myself recently.

What we need is something in core that actually makes use of this. The
reason for that is not politics, but a simple test of whether the
feature makes sense AND includes all required bells and whistles to be
useful in the real world.

Core doesn't need a LOL parser and I don't think we should commit that.

It doesn't, but it very likely needs something people can use when
they create a new table AM, and that we should use the hook in core to
implement the heap* table AM to make sure the thing is working at DDL
time.

If we do this, I think it should have CREATE LANGUAGE support, so
that each plugin can be seen as an in-core object and allow security
around which users can execute which language types, allow us to
switch between languages and have default languages for specific
users or databases.

That's a great idea, but I must be missing something important as it
relates to parser hooks. Could you connect those a little more
explicitly?

Best,
David.

* It's not actually a heap in the sense that the term is normally used
in computing. I'd love to find out how it got to have this name and
document same so others aren't also left wondering.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

#17Pavel Stehule
pavel.stehule@gmail.com
In reply to: Julien Rouhaud (#15)
Re: Hook for extensible parsing.

st 15. 9. 2021 v 16:55 odesílatel Julien Rouhaud <rjuju123@gmail.com>
napsal:

On Wed, Sep 15, 2021 at 10:14 PM Jim Mlodgenski <jimmy76@gmail.com> wrote:

On Wed, Sep 15, 2021 at 9:25 AM Simon Riggs
<simon.riggs@enterprisedb.com> wrote:

The general rule has always been that we don't just put hooks in, we
always require an in-core use for those hooks. I was reminded of that
myself recently.

That's not historically what has happened. There are several hooks with
no in core use such as emit_log_hook and ExplainOneQuery_hook. The recent
openssl_tls_init_hook only has a usage in src/test/modules

Yes, I also think that it's not a strict requirement that all hooks
have a caller in the core, even if it's obviously better if that's the
case.

What we need is something in core that actually makes use of this. The
reason for that is not politics, but a simple test of whether the
feature makes sense AND includes all required bells and whistles to be
useful in the real world.

Agreed. There should be something in src/test/modules to exercise this
but probably more to flush things out. Maybe extending adminpack to use
this so if enabled, it can use syntax like:
FILE READ 'foo.txt'

For this hook, maintaining a real alternative parser seems like way
too much trouble to justify an in-core user. The fact that many
people have asked for such a feature over the year should be enough to
justify the use case. We could try to invent some artificial need
like the one you suggest for adminpack, but it also feels like a waste
of resources.

As far as I'm concerned a naive strcmp-based parser in
src/test/modules should be enough to validate that the hook is
working, there's no need for more. In any case if the only
requirement for it to be committed is to write a real parser, whether
in contrib or src/test/modules, I'll be happy to do it.

Core doesn't need a LOL parser and I don't think we should commit that.

+1

I entirely agree, and I repeatedly mentioned in that thread that I did
*not* want to add this parser in core. The only purpose of patches
0002 and 0004 is to make the third-party bison based parser
requirements less abstract, and demonstrate that this approach can
successfully make two *radically different* parsers cohabit.

If we do this, I think it should have CREATE LANGUAGE support, so that
each plugin can be seen as an in-core object and allow security around
which users can execute which language types, allow us to switch
between languages and have default languages for specific users or
databases.

This hook allows extension developers to supplement syntax in addition
to adding a whole new language allowing the extension to appear more
native to the end user. For example, pglogical could use this to add
syntax to do a CREATE NODE instead of calling the function create_node.
Adding CREATE LANGUAGE support around this would just be for a narrow
set of use cases where a new language is added.

Yes, this hook can be used to implement multiple things as I mentioned
my initial email. Additionally, if this is eventually committed I'd
like to add support for CREATE HYPOTHETICAL INDEX grammar in hypopg.
Such a parser would only support one command (that extends an existing
one), so it can't really be called a language. Of course if would be
better to have the core parser accept a CREATE [ HYPOTHETICAL ] INDEX
and setup a flag so that third-parrty module can intercept this
utility command, but until that happens I could provide that syntactic
sugar for my users as long as I'm motivated enough to write this
parser.

There were nice stream databases, but that ended because maintaining a fork
is too expensive. And without direct SQL (without possibility of parser
enhancing), the commands based on function call API were not readable and
workable flexible like SQL. Sometimes we really don't want to replace
PostgreSQL, but just enhance the main interface for extensions.

Also, a hook based approach is still compatible with per database /

Show quoted text

role configuration. It can be done either via specific
session_preload_libraries, or via a custom GUC if for some reason the
module requires to be in shared_preload_libraries.

#18Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jim Mlodgenski (#14)
Re: Hook for extensible parsing.

Jim Mlodgenski <jimmy76@gmail.com> writes:

On Wed, Sep 15, 2021 at 9:25 AM Simon Riggs
<simon.riggs@enterprisedb.com> wrote:

The general rule has always been that we don't just put hooks in, we
always require an in-core use for those hooks. I was reminded of that
myself recently.

That's not historically what has happened. There are several hooks with
no in core use such as emit_log_hook and ExplainOneQuery_hook.

Yeah. I think the proper expectation is that there be a sufficiently
worked-out example to convince us that the proposed hooks have real-world
usefulness, and are not missing any basic requirements to make them do
something useful. Whether the example ends up in our tree is a
case-by-case decision.

In the case at hand, what's troubling me is that I don't see any
particular use in merely substituting a new bison grammar, if it
still has to produce parse trees that the rest of the system will
understand. Yeah, you could make some very simple surface-syntax
changes that way, but it doesn't seem like you could do anything
interesting (like, say, support Oracle-style outer join syntax).
AFAICS, to get to a useful feature, you'd then need to invent an
extensible Node system (which'd be hugely invasive if it's feasible
at all), and then probably more things on top of that. So I'm not
convinced that you've demonstrated any real-world usefulness.

regards, tom lane

#19Julien Rouhaud
rjuju123@gmail.com
In reply to: Tom Lane (#18)
Re: Hook for extensible parsing.

On Wed, Sep 15, 2021 at 11:26 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

In the case at hand, what's troubling me is that I don't see any
particular use in merely substituting a new bison grammar, if it
still has to produce parse trees that the rest of the system will
understand. Yeah, you could make some very simple surface-syntax
changes that way, but it doesn't seem like you could do anything
interesting (like, say, support Oracle-style outer join syntax).
AFAICS, to get to a useful feature, you'd then need to invent an
extensible Node system (which'd be hugely invasive if it's feasible
at all), and then probably more things on top of that. So I'm not
convinced that you've demonstrated any real-world usefulness.

I agree that this patchset can only implement syntactic sugars,
nothing more (although for utility command you can do a bit more than
that). But that's already something people can use, mostly for
migration to postgres use cases probably.

I'm not sure why you couldn't implement an Oracle-style outer join
with such a hook? The requirement is that the parser can't leak any
node that the rest of the system doesn't know about, but you can do
what you want inside the parser. And as far as I can see we already
have an extensible node since bcac23de73b, so it seems to me that
there's enough infrastructure to handle this kind of use case.

The main downside is that you'll have to make a first pass to
transform your "custom raw statement" into a valid RawStmt in your
parser, and the system will do another one to transform it in a Query.
But apart from that it should work. Am I missing something?

#20Tom Lane
tgl@sss.pgh.pa.us
In reply to: Julien Rouhaud (#19)
Re: Hook for extensible parsing.

Julien Rouhaud <rjuju123@gmail.com> writes:

I'm not sure why you couldn't implement an Oracle-style outer join
with such a hook?

Try it.

The requirement is that the parser can't leak any
node that the rest of the system doesn't know about, but you can do
what you want inside the parser.

That's not what the patch actually does, though. It only replaces
the grammar, not semantic analysis. So you couldn't associate the
(+)-decorated WHERE clause with the appropriate join. (And no,
I will not accept that it's okay to perform catalog lookups in
the grammar to get around that. See comment at the head of gram.y.)

In general, I'm having a hard time believing that anything very
interesting can be done at only the grammar level without changing
the parse analysis phase. That's not unrelated to the restriction
that the grammar can't do catalog accesses. Maybe with some fundamental
restructuring, we could get around that issue ... but this patch isn't
doing any fundamental restructuring, it's just putting a hook where it's
easy to do so. We've often found that such hooks aren't as useful as
they initially seem.

regards, tom lane

#21Julien Rouhaud
rjuju123@gmail.com
In reply to: Tom Lane (#20)
#22Julien Rouhaud
rjuju123@gmail.com
In reply to: Julien Rouhaud (#21)
#23Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#20)
#24Jonah H. Harris
jonah.harris@gmail.com
In reply to: Andres Freund (#23)
#25Andres Freund
andres@anarazel.de
In reply to: Jonah H. Harris (#24)
#26Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#23)
#27Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#26)
#28Julien Rouhaud
rjuju123@gmail.com
In reply to: Andres Freund (#27)
#29Simon Riggs
simon@2ndQuadrant.com
In reply to: Julien Rouhaud (#28)
#30Julien Rouhaud
rjuju123@gmail.com
In reply to: Simon Riggs (#29)
#31Tom Lane
tgl@sss.pgh.pa.us
In reply to: Julien Rouhaud (#30)
#32Julien Rouhaud
rjuju123@gmail.com
In reply to: Tom Lane (#31)
#33Julien Rouhaud
rjuju123@gmail.com
In reply to: Julien Rouhaud (#32)