[PATCH] Add --syntax to postgres for SQL syntax checking

Started by Josef Šimánekabout 2 years ago35 messages
Jump to latest
#1Josef Šimánek
josef.simanek@gmail.com

Hello!

Inspired by Simon Riggs' keynote talk at PGCounf.eu 2023 sharing list
of ideas for PostgreSQL
(https://riggs.business/blog/f/postgresql-todo-2023), I have crafted a
quick patch to do SQL syntax validation.

It is also heavily inspired by the "ruby -c" command, useful to check
syntax of Ruby programs without executing them.

For now, to keep it simple and to open discussion, I have added new
"--syntax" option into "postgres" command, since it is currently the
only one using needed parser dependency (at least per my
understanding). I tried to add this into psql or separate pg_syntax
commands, but parser is not exposed in "postgres_fe.h" and including
backend into those tools would not make most likely sense. Also syntax
could vary per backend, it makes sense to keep it in there.

It expects input on STDIN, prints out error if any and prints out
summary message (Valid SQL/Invalid SQL). On valid input it exits with
0 (success), otherwise it exits with 1 (error).

Example usage:

$ echo "SELECT 1" | src/backend/postgres --syntax
Valid SQL

$ echo "SELECT 1abc" | src/backend/postgres --syntax
ERROR: trailing junk after numeric literal at or near "1a" at character 8
Invalid SQL

$ cat ../src/test/regress/sql/alter_operator.sql | src/backend/postgres --syntax
Valid SQL

$ cat ../src/test/regress/sql/advisory_lock.sql | src/backend/postgres --syntax
ERROR: syntax error at or near "\" at character 99
Invalid SQL

This could be useful for manual script checks, automated script checks
and code editor integrations.

Notice it just parses the SQL, it doesn't detect any "runtime"
problems like unexisting table names, etc.

I have various ideas around this (like exposing similar functionality
directly in SQL using custom function like pg_check_syntax), but I
would like to get some feedback first.

What do you think?
enhnace
PS: I wasn't able to find any automated tests for "postgres" command
to enhance with, are there any?

PS2: Patch could be found at https://github.com/simi/postgres/pull/8 as well.

Attachments:

01-postgres-check.patchtext/x-patch; charset=US-ASCII; name=01-postgres-check.patchDownload+45-1
#2Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Josef Šimánek (#1)
Re: [PATCH] Add --syntax to postgres for SQL syntax checking

On Fri, 2023-12-15 at 13:21 +0100, Josef Šimánek wrote:

Inspired by Simon Riggs' keynote talk at PGCounf.eu 2023 sharing list
of ideas for PostgreSQL
(https://riggs.business/blog/f/postgresql-todo-2023), I have crafted a
quick patch to do SQL syntax validation.

What do you think?

I like the idea. But what will happen if the SQL statement references
tables or other objects, since we have no database?

Yours,
Laurenz Albe

#3Josef Šimánek
josef.simanek@gmail.com
In reply to: Laurenz Albe (#2)
Re: [PATCH] Add --syntax to postgres for SQL syntax checking

Dne pá 15. 12. 2023 14:09 uživatel Laurenz Albe <laurenz.albe@cybertec.at>
napsal:

On Fri, 2023-12-15 at 13:21 +0100, Josef Šimánek wrote:

Inspired by Simon Riggs' keynote talk at PGCounf.eu 2023 sharing list
of ideas for PostgreSQL
(https://riggs.business/blog/f/postgresql-todo-2023), I have crafted a
quick patch to do SQL syntax validation.

What do you think?

I like the idea. But what will happen if the SQL statement references
tables or other objects, since we have no database?

It checks just the identifier is valid from parser perspective, like it is
valid table name.

Show quoted text

Yours,
Laurenz Albe

#4Pavel Stehule
pavel.stehule@gmail.com
In reply to: Josef Šimánek (#3)
Re: [PATCH] Add --syntax to postgres for SQL syntax checking

Dne pá 15. 12. 2023 14:14 uživatel Josef Šimánek <josef.simanek@gmail.com>
napsal:

Dne pá 15. 12. 2023 14:09 uživatel Laurenz Albe <laurenz.albe@cybertec.at>
napsal:

On Fri, 2023-12-15 at 13:21 +0100, Josef Šimánek wrote:

Inspired by Simon Riggs' keynote talk at PGCounf.eu 2023 sharing list
of ideas for PostgreSQL
(https://riggs.business/blog/f/postgresql-todo-2023), I have crafted a
quick patch to do SQL syntax validation.

What do you think?

I like the idea. But what will happen if the SQL statement references
tables or other objects, since we have no database?

It checks just the identifier is valid from parser perspective, like it is
valid table name.

There can by two levels, like plpgsql or like pllgsql_check

Regards

Pavel

Show quoted text

Yours,
Laurenz Albe

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Laurenz Albe (#2)
Re: [PATCH] Add --syntax to postgres for SQL syntax checking

Laurenz Albe <laurenz.albe@cybertec.at> writes:

On Fri, 2023-12-15 at 13:21 +0100, Josef Šimánek wrote:

Inspired by Simon Riggs' keynote talk at PGCounf.eu 2023 sharing list
of ideas for PostgreSQL
(https://riggs.business/blog/f/postgresql-todo-2023), I have crafted a
quick patch to do SQL syntax validation.

What do you think?

I like the idea. But what will happen if the SQL statement references
tables or other objects, since we have no database?

This seems like a fairly useless wart to me. What does it do that
you can't do better with existing facilities (psql etc)?

In the big picture a command line switch in the postgres executable
doesn't feel like the right place for this. There's no good reason
to assume that the server executable will be installed where you want
this capability; not to mention the possibility of version skew
between that executable and whatever installation you're actually
running on.

Another thing I don't like is that this exposes to the user what ought
to be purely an implementation detail, namely the division of labor
between gram.y (raw_parser()) and the rest of the parser. There are
checks that a user would probably see as "syntax checks" that don't
happen in gram.y, and conversely there are some things we reject there
that seem more like semantic than syntax issues.

regards, tom lane

#6Josef Šimánek
josef.simanek@gmail.com
In reply to: Tom Lane (#5)
Re: [PATCH] Add --syntax to postgres for SQL syntax checking

pá 15. 12. 2023 v 15:50 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:

Laurenz Albe <laurenz.albe@cybertec.at> writes:

On Fri, 2023-12-15 at 13:21 +0100, Josef Šimánek wrote:

Inspired by Simon Riggs' keynote talk at PGCounf.eu 2023 sharing list
of ideas for PostgreSQL
(https://riggs.business/blog/f/postgresql-todo-2023), I have crafted a
quick patch to do SQL syntax validation.

What do you think?

I like the idea. But what will happen if the SQL statement references
tables or other objects, since we have no database?

This seems like a fairly useless wart to me. What does it do that
you can't do better with existing facilities (psql etc)?

Per my experience, this is mostly useful during development to catch
syntax errors super early. For SELECT/INSERT/UPDATE/DELETE queries, it
is usually enough to prepend with EXPLAIN and run. But EXPLAIN doesn't
support all SQL like DDL statements. Let's say I have a long SQL
script I'm working on and there is typo in the middle like ALTERR
instead of ALTER. Is there any simple way to detect this without
actually running the statement in psql or other existing facilities?
This check could be simply combined with editor capabilities to be run
on each SQL file save to get quick feedback on this kind of mistakes
for great developer experience.

In the big picture a command line switch in the postgres executable
doesn't feel like the right place for this. There's no good reason
to assume that the server executable will be installed where you want
this capability; not to mention the possibility of version skew
between that executable and whatever installation you're actually
running on.

This is mostly intended for SQL developers and CI systems where
PostgreSQL backend is usually installed and in the actual version
needed. I agree postgres is not the best place (even it makes
partially sense to me), but as I mentioned, I wasn't able to craft a
quick patch with a better place to put this in. What would you
recommend? Separate executable like pg_syntaxcheck?

Another thing I don't like is that this exposes to the user what ought
to be purely an implementation detail, namely the division of labor
between gram.y (raw_parser()) and the rest of the parser. There are
checks that a user would probably see as "syntax checks" that don't
happen in gram.y, and conversely there are some things we reject there
that seem more like semantic than syntax issues.

I have no big insight into SQL parsing. Can you please share examples
of given concerns? Is there anything better than raw_parser() for this
purpose?

Show quoted text

regards, tom lane

#7David G. Johnston
david.g.johnston@gmail.com
In reply to: Josef Šimánek (#6)
Re: [PATCH] Add --syntax to postgres for SQL syntax checking

On Fri, Dec 15, 2023 at 8:05 AM Josef Šimánek <josef.simanek@gmail.com>
wrote:

pá 15. 12. 2023 v 15:50 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:

Laurenz Albe <laurenz.albe@cybertec.at> writes:

On Fri, 2023-12-15 at 13:21 +0100, Josef Šimánek wrote:

Inspired by Simon Riggs' keynote talk at PGCounf.eu 2023 sharing list
of ideas for PostgreSQL
(https://riggs.business/blog/f/postgresql-todo-2023), I have crafted

a

quick patch to do SQL syntax validation.

What do you think?

I like the idea. But what will happen if the SQL statement references
tables or other objects, since we have no database?

This seems like a fairly useless wart to me. What does it do that
you can't do better with existing facilities (psql etc)?

Per my experience, this is mostly useful during development to catch
syntax errors super early.

I'd suggest helping this project get production capable since it already is
trying to integrate with the development ecosystem you describe here.

https://github.com/supabase/postgres_lsp

I agree that developing this as a new executable for the purpose is needed
in order to best conform to existing conventions.

David J.

#8Josef Šimánek
josef.simanek@gmail.com
In reply to: David G. Johnston (#7)
Re: [PATCH] Add --syntax to postgres for SQL syntax checking

pá 15. 12. 2023 v 16:16 odesílatel David G. Johnston
<david.g.johnston@gmail.com> napsal:

On Fri, Dec 15, 2023 at 8:05 AM Josef Šimánek <josef.simanek@gmail.com> wrote:

pá 15. 12. 2023 v 15:50 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:

Laurenz Albe <laurenz.albe@cybertec.at> writes:

On Fri, 2023-12-15 at 13:21 +0100, Josef Šimánek wrote:

Inspired by Simon Riggs' keynote talk at PGCounf.eu 2023 sharing list
of ideas for PostgreSQL
(https://riggs.business/blog/f/postgresql-todo-2023), I have crafted a
quick patch to do SQL syntax validation.

What do you think?

I like the idea. But what will happen if the SQL statement references
tables or other objects, since we have no database?

This seems like a fairly useless wart to me. What does it do that
you can't do better with existing facilities (psql etc)?

Per my experience, this is mostly useful during development to catch
syntax errors super early.

I'd suggest helping this project get production capable since it already is trying to integrate with the development ecosystem you describe here.

https://github.com/supabase/postgres_lsp

Indeed LSP is the one of the targets of this feature. Currently it is
using https://github.com/pganalyze/libpg_query under the hood probably
because of the same reasons I have described (parser is not available
in public APIs of postgres_fe.h or libpq). Exposing basic parser
capabilities in postgres binary itself and also on SQL level by
pg_check_syntax function can prevent the need of "hacking" pg parser
to be accessible outside of server binary.

Show quoted text

I agree that developing this as a new executable for the purpose is needed in order to best conform to existing conventions.

David J.

#9David G. Johnston
david.g.johnston@gmail.com
In reply to: Josef Šimánek (#8)
Re: [PATCH] Add --syntax to postgres for SQL syntax checking

On Fri, Dec 15, 2023 at 8:20 AM Josef Šimánek <josef.simanek@gmail.com>
wrote:

(parser is not available
in public APIs of postgres_fe.h or libpq).

What about building "libpg" that does expose and exports some public APIs
for the parser? We can include a reference CLI implementation for basic
usage of the functionality while leaving the actual language server project
outside of core.

David J.

#10Josef Šimánek
josef.simanek@gmail.com
In reply to: David G. Johnston (#9)
Re: [PATCH] Add --syntax to postgres for SQL syntax checking

pá 15. 12. 2023 v 16:32 odesílatel David G. Johnston
<david.g.johnston@gmail.com> napsal:

On Fri, Dec 15, 2023 at 8:20 AM Josef Šimánek <josef.simanek@gmail.com> wrote:

(parser is not available
in public APIs of postgres_fe.h or libpq).

What about building "libpg" that does expose and exports some public APIs for the parser? We can include a reference CLI implementation for basic usage of the functionality while leaving the actual language server project outside of core.

Language server (LSP) can just benefit from this feature, but it
doesn't cover all possibilities since LSP is meant for one purpose ->
run in developer's code editor. Actual syntax check is more generic,
able to cover CI checks and more. I would not couple this feature and
LSP, LSP can just benefit from it (and it is usually built in a way
that uses other tools and packs them into LSP). Exposing this kind of
SQL check doesn't mean something LSP related being implemented in
core. LSP can just benefit from this.

Exposing parser to libpg seems good idea, but I'm not sure how simple
that could be done since during my journey I have found out there are
a lot of dependencies which are not present in usual frontend code per
my understanding like memory contexts and removal of those
(decoupling) would be huge project IMHO.

Show quoted text

David J.

#11Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Josef Šimánek (#10)
Re: [PATCH] Add --syntax to postgres for SQL syntax checking

On 12/15/23 16:38, Josef Šimánek wrote:

pá 15. 12. 2023 v 16:32 odesílatel David G. Johnston
<david.g.johnston@gmail.com> napsal:

On Fri, Dec 15, 2023 at 8:20 AM Josef Šimánek <josef.simanek@gmail.com> wrote:

(parser is not available
in public APIs of postgres_fe.h or libpq).

What about building "libpg" that does expose and exports some public APIs for the parser? We can include a reference CLI implementation for basic usage of the functionality while leaving the actual language server project outside of core.

Language server (LSP) can just benefit from this feature, but it
doesn't cover all possibilities since LSP is meant for one purpose ->
run in developer's code editor. Actual syntax check is more generic,
able to cover CI checks and more. I would not couple this feature and
LSP, LSP can just benefit from it (and it is usually built in a way
that uses other tools and packs them into LSP). Exposing this kind of
SQL check doesn't mean something LSP related being implemented in
core. LSP can just benefit from this.

I don't know enough about LSP to have a good idea how to implement this
for PG, but my assumption would be we'd have some sort of library
exposing "parser" to frontend tools, and also an in-core binary using
that library (say, src/bin/pg_syntax_check). And LSP would use that
parser library too ...

I think there's about 0% chance we'd add this to "postgres" binary.

Exposing parser to libpg seems good idea, but I'm not sure how simple
that could be done since during my journey I have found out there are
a lot of dependencies which are not present in usual frontend code per
my understanding like memory contexts and removal of those
(decoupling) would be huge project IMHO.

You're right the grammar/parser expects a lot of backend infrastructure,
so making it available to frontend is going to be challenging. But I
doubt there's a better way than starting with gram.y and either removing
or adding the missing pieces (maybe only a mock version of it).

I'm not a bison expert, but considering your goal seems to be a basic
syntax check, maybe you could simply rip out most of the bits depending
on backend stuff, or maybe replace them with some trivial no-op code?

But as Tom mentioned, the question is how far gram.y gets you. There's
plenty of ereport(ERROR) calls in src/backend/parser/*.c our users might
easily consider as parse errors ...

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#12Josef Šimánek
josef.simanek@gmail.com
In reply to: Tomas Vondra (#11)
Re: [PATCH] Add --syntax to postgres for SQL syntax checking

ne 25. 2. 2024 v 23:24 odesílatel Tomas Vondra
<tomas.vondra@enterprisedb.com> napsal:

On 12/15/23 16:38, Josef Šimánek wrote:

pá 15. 12. 2023 v 16:32 odesílatel David G. Johnston
<david.g.johnston@gmail.com> napsal:

On Fri, Dec 15, 2023 at 8:20 AM Josef Šimánek <josef.simanek@gmail.com> wrote:

(parser is not available
in public APIs of postgres_fe.h or libpq).

What about building "libpg" that does expose and exports some public APIs for the parser? We can include a reference CLI implementation for basic usage of the functionality while leaving the actual language server project outside of core.

Language server (LSP) can just benefit from this feature, but it
doesn't cover all possibilities since LSP is meant for one purpose ->
run in developer's code editor. Actual syntax check is more generic,
able to cover CI checks and more. I would not couple this feature and
LSP, LSP can just benefit from it (and it is usually built in a way
that uses other tools and packs them into LSP). Exposing this kind of
SQL check doesn't mean something LSP related being implemented in
core. LSP can just benefit from this.

I don't know enough about LSP to have a good idea how to implement this
for PG, but my assumption would be we'd have some sort of library
exposing "parser" to frontend tools, and also an in-core binary using
that library (say, src/bin/pg_syntax_check). And LSP would use that
parser library too ...

I think there's about 0% chance we'd add this to "postgres" binary.

Exposing parser to frontend tools makes no sense to me and even if it
would, it is a huge project not worth to be done just because of
syntax check. I can update the patch to prepare a new binary, but
still on the backend side. This syntax check should be equivalent to
running a server locally, running a query and caring only about
parsing part finished successfully. In my thinking, this belongs to
backend tools.

Show quoted text

Exposing parser to libpg seems good idea, but I'm not sure how simple
that could be done since during my journey I have found out there are
a lot of dependencies which are not present in usual frontend code per
my understanding like memory contexts and removal of those
(decoupling) would be huge project IMHO.

You're right the grammar/parser expects a lot of backend infrastructure,
so making it available to frontend is going to be challenging. But I
doubt there's a better way than starting with gram.y and either removing
or adding the missing pieces (maybe only a mock version of it).

I'm not a bison expert, but considering your goal seems to be a basic
syntax check, maybe you could simply rip out most of the bits depending
on backend stuff, or maybe replace them with some trivial no-op code?

But as Tom mentioned, the question is how far gram.y gets you. There's
plenty of ereport(ERROR) calls in src/backend/parser/*.c our users might
easily consider as parse errors ...

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#13Jelte Fennema-Nio
postgres@jeltef.nl
In reply to: Josef Šimánek (#12)
Re: [PATCH] Add --syntax to postgres for SQL syntax checking

On Sun, 25 Feb 2024 at 23:34, Josef Šimánek <josef.simanek@gmail.com> wrote:

Exposing parser to frontend tools makes no sense to me

Not everyone seems to agree with that, it's actually already done by
Lukas from pganalyze: https://github.com/pganalyze/libpg_query

#14Josef Šimánek
josef.simanek@gmail.com
In reply to: Jelte Fennema-Nio (#13)
Re: [PATCH] Add --syntax to postgres for SQL syntax checking

po 26. 2. 2024 v 8:20 odesílatel Jelte Fennema-Nio <postgres@jeltef.nl> napsal:

On Sun, 25 Feb 2024 at 23:34, Josef Šimánek <josef.simanek@gmail.com> wrote:

Exposing parser to frontend tools makes no sense to me

Not everyone seems to agree with that, it's actually already done by
Lukas from pganalyze: https://github.com/pganalyze/libpg_query

I did a quick look. That's clearly amazing work, but it is not parser
being exposed to frontend (refactored). It is (per my understanding)
backend code isolated to minimum to be able to parse query. It is
still bound to individual backend version and to backend source code.
And it is close to my effort (I was about to start with a simpler
version not providing tokens, just the result), but instead of copying
files from backend into separate project and shave off to minimum,
provide the same tool with backend utils directly.

#15Robert Haas
robertmhaas@gmail.com
In reply to: Tomas Vondra (#11)
Re: [PATCH] Add --syntax to postgres for SQL syntax checking

On Sun, Feb 25, 2024 at 5:24 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

I think there's about 0% chance we'd add this to "postgres" binary.

Several people have taken this position, so I'm going to mark
https://commitfest.postgresql.org/48/4704/ as Rejected.

My own opinion is that syntax checking is a useful thing to expose,
but I don't believe that this is a useful way to expose it. I think
this comment that Tom made upthread is right on target:

# Another thing I don't like is that this exposes to the user what ought
# to be purely an implementation detail, namely the division of labor
# between gram.y (raw_parser()) and the rest of the parser. There are
# checks that a user would probably see as "syntax checks" that don't
# happen in gram.y, and conversely there are some things we reject there
# that seem more like semantic than syntax issues.

I think that what that means in practice is that, while this patch may
seem to give reasonable results in simple tests, as soon as you try to
do slightly more complicated things with it, it's going to give weird
results, either failing to flag things that you'd expect it to flag,
or flagging things you'd expect it not to flag. Fixing that would be
either impossible or a huge amount of work depending on your point of
view. If you take the point of view that we need to adjust things so
that the raw parser reports all the things that ought to be reported
by a tool like this and none of the things that it shouldn't, then
it's probably just a huge amount of work. If you take the point of
view that what goes into the raw parser and what goes into parse
analysis ought to be an implementation decision untethered to what a
tool like this ought to report, then fixing the problems would be
impossible, or at least, it would amount to throwing away this patch
and starting over. I think the latter point of view, which Tom has
already taken, would be the more prevalent view among hackers by far,
but even if the former view prevailed, who is going to do all that
work?

I strongly suspect that doing something useful in this area requires
about two orders of magnitude more code than are included in this
patch, and a completely different design. If it actually worked well
to do something this simple, somebody probably would have done it
already. In fact, there are already tools out there for validation,
like https://github.com/okbob/plpgsql_check for example. That tool
doesn't do exactly the same thing as this patch is trying to do, but
it does do other kinds of validation, and it's 19k lines of code, vs.
the 45 lines of code in this patch, which I think reinforces the point
that you need to do something much more complicated than this to
create real value.

Also, the patch as proposed, besides being 45 lines, also has zero
lines of comments, no test cases, no documentation, and doesn't follow
the PostgreSQL coding standards. I'm not saying that to be mean, nor
am I suggesting that Josef should go fix that stuff. It's perfectly
reasonable to propose a small patch without a lot of research to see
what people think -- but now we have the answer to that question:
people think it won't work. So Josef should now decide to either give
up, or try a new approach, or if he's really sure that all the
feedback that has been given so far is completely wrong, he can try to
demonstrate that the patch does all kinds of wonderful things with
very few disadvantages. But I think if he does that last, he's going
to find that it's not really possible, because I'm pretty sure that
Tom is right.

--
Robert Haas
EDB: http://www.enterprisedb.com

#16Josef Šimánek
josef.simanek@gmail.com
In reply to: Robert Haas (#15)
Re: [PATCH] Add --syntax to postgres for SQL syntax checking

st 15. 5. 2024 v 19:43 odesílatel Robert Haas <robertmhaas@gmail.com> napsal:

On Sun, Feb 25, 2024 at 5:24 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

I think there's about 0% chance we'd add this to "postgres" binary.

Several people have taken this position, so I'm going to mark
https://commitfest.postgresql.org/48/4704/ as Rejected.

My own opinion is that syntax checking is a useful thing to expose,
but I don't believe that this is a useful way to expose it. I think
this comment that Tom made upthread is right on target:

# Another thing I don't like is that this exposes to the user what ought
# to be purely an implementation detail, namely the division of labor
# between gram.y (raw_parser()) and the rest of the parser. There are
# checks that a user would probably see as "syntax checks" that don't
# happen in gram.y, and conversely there are some things we reject there
# that seem more like semantic than syntax issues.

I think that what that means in practice is that, while this patch may
seem to give reasonable results in simple tests, as soon as you try to
do slightly more complicated things with it, it's going to give weird
results, either failing to flag things that you'd expect it to flag,
or flagging things you'd expect it not to flag. Fixing that would be
either impossible or a huge amount of work depending on your point of
view. If you take the point of view that we need to adjust things so
that the raw parser reports all the things that ought to be reported
by a tool like this and none of the things that it shouldn't, then
it's probably just a huge amount of work. If you take the point of
view that what goes into the raw parser and what goes into parse
analysis ought to be an implementation decision untethered to what a
tool like this ought to report, then fixing the problems would be
impossible, or at least, it would amount to throwing away this patch
and starting over. I think the latter point of view, which Tom has
already taken, would be the more prevalent view among hackers by far,
but even if the former view prevailed, who is going to do all that
work?

I strongly suspect that doing something useful in this area requires
about two orders of magnitude more code than are included in this
patch, and a completely different design. If it actually worked well
to do something this simple, somebody probably would have done it
already. In fact, there are already tools out there for validation,
like https://github.com/okbob/plpgsql_check for example. That tool
doesn't do exactly the same thing as this patch is trying to do, but
it does do other kinds of validation, and it's 19k lines of code, vs.
the 45 lines of code in this patch, which I think reinforces the point
that you need to do something much more complicated than this to
create real value.

Also, the patch as proposed, besides being 45 lines, also has zero
lines of comments, no test cases, no documentation, and doesn't follow
the PostgreSQL coding standards. I'm not saying that to be mean, nor
am I suggesting that Josef should go fix that stuff. It's perfectly
reasonable to propose a small patch without a lot of research to see
what people think -- but now we have the answer to that question:
people think it won't work. So Josef should now decide to either give
up, or try a new approach, or if he's really sure that all the
feedback that has been given so far is completely wrong, he can try to
demonstrate that the patch does all kinds of wonderful things with
very few disadvantages. But I think if he does that last, he's going
to find that it's not really possible, because I'm pretty sure that
Tom is right.

I'm totally OK to mark this as rejected and indeed 45 lines were just
minimal patch to create PoC (I have coded this during last PGConf.eu
lunch break) and mainly to start discussion.

I'm not sure everyone in this thread understands the reason for this
patch, which is clearly my fault, since I have failed to explain. Main
idea is to make a tool to validate query can be parsed, that's all.
Similar to running "EXPLAIN query", but not caring about the result
and not caring about the DB structure (ignoring missing tables, ...),
just checking it was successfully executed. This definitely belongs to
the server side and not to the client side, it is just a tool to
validate that for this running PostgreSQL backend it is a "parseable"
query.

I'm not giving up on this, but I hear there are various problems to
explore. If I understand it well, just running the parser to query
doesn't guarantee the query is valid, since it can fail later for
various reasons (I mean other than missing table, ...). I wasn't aware
of that. Also exposing this inside postgres binary seems
controversial. I had an idea to expose parser result at SQL level with
a new command (similar to EXPLAIN), but you'll need running PostgreSQL
backend to be able to use this capability, which is against one of the
original ideas. On the otherside PostgreSQL exposes a lot of "meta"
functionality already and this could be a nice addition.

I'll revisit the discussion again and try to submit another try once
I'll get more context and experience.

Thanks everyone for constructive comments!

Show quoted text

--
Robert Haas
EDB: http://www.enterprisedb.com

#17Josef Šimánek
josef.simanek@gmail.com
In reply to: Tom Lane (#5)
Re: [PATCH] Add --syntax to postgres for SQL syntax checking

pá 15. 12. 2023 v 15:50 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:

Laurenz Albe <laurenz.albe@cybertec.at> writes:

On Fri, 2023-12-15 at 13:21 +0100, Josef Šimánek wrote:

Inspired by Simon Riggs' keynote talk at PGCounf.eu 2023 sharing list
of ideas for PostgreSQL
(https://riggs.business/blog/f/postgresql-todo-2023), I have crafted a
quick patch to do SQL syntax validation.

What do you think?

I like the idea. But what will happen if the SQL statement references
tables or other objects, since we have no database?

This seems like a fairly useless wart to me.

this hurts :'(

Show quoted text

In the big picture a command line switch in the postgres executable
doesn't feel like the right place for this. There's no good reason
to assume that the server executable will be installed where you want
this capability; not to mention the possibility of version skew
between that executable and whatever installation you're actually
running on.

Another thing I don't like is that this exposes to the user what ought
to be purely an implementation detail, namely the division of labor
between gram.y (raw_parser()) and the rest of the parser. There are
checks that a user would probably see as "syntax checks" that don't
happen in gram.y, and conversely there are some things we reject there
that seem more like semantic than syntax issues.

regards, tom lane

#18Tom Lane
tgl@sss.pgh.pa.us
In reply to: Josef Šimánek (#16)
Re: [PATCH] Add --syntax to postgres for SQL syntax checking

=?UTF-8?B?Sm9zZWYgxaBpbcOhbmVr?= <josef.simanek@gmail.com> writes:

I'm not sure everyone in this thread understands the reason for this
patch, which is clearly my fault, since I have failed to explain. Main
idea is to make a tool to validate query can be parsed, that's all.
Similar to running "EXPLAIN query", but not caring about the result
and not caring about the DB structure (ignoring missing tables, ...),
just checking it was successfully executed. This definitely belongs to
the server side and not to the client side, it is just a tool to
validate that for this running PostgreSQL backend it is a "parseable"
query.

The thing that was bothering me most about this is that I don't
understand why that's a useful check. If I meant to type

UPDATE mytab SET mycol = 42;

and instead I type

UPDATEE mytab SET mycol = 42;

your proposed feature would catch that; great. But if I type

UPDATE mytabb SET mycol = 42;

it won't. How does that make sense? I'm not entirely sure where
to draw the line about what a "syntax check" should catch, but this
seems a bit south of what I'd want in a syntax-checking editor.

BTW, if you do feel that a pure grammar check is worthwhile, you
should look at the ecpg preprocessor, which does more or less that
with the SQL portions of its input. ecpg could be a better model
to follow because it doesn't bring all the dependencies of the server
and so is much more likely to appear in a client-side installation.
It's kind of an ugly, unmaintained mess at the moment, sadly.

The big knock on doing this client-side is that there might be
version skew compared to the server you're using --- but if you
are not doing anything beyond a grammar-level check then your
results are pretty approximate anyway, ISTM. We've not heard
anything suggesting that version skew is a huge problem for
ecpg users.

regards, tom lane

#19Wolfgang Walther
walther@technowledgy.de
In reply to: Tom Lane (#18)
Re: [PATCH] Add --syntax to postgres for SQL syntax checking

Tom Lane:

The thing that was bothering me most about this is that I don't
understand why that's a useful check. If I meant to type

UPDATE mytab SET mycol = 42;

and instead I type

UPDATEE mytab SET mycol = 42;

your proposed feature would catch that; great. But if I type

UPDATE mytabb SET mycol = 42;

it won't. How does that make sense? I'm not entirely sure where
to draw the line about what a "syntax check" should catch, but this
seems a bit south of what I'd want in a syntax-checking editor.

BTW, if you do feel that a pure grammar check is worthwhile, you
should look at the ecpg preprocessor, which does more or less that
with the SQL portions of its input. ecpg could be a better model
to follow because it doesn't bring all the dependencies of the server
and so is much more likely to appear in a client-side installation.
It's kind of an ugly, unmaintained mess at the moment, sadly.

Would working with ecpg allow to get back a parse tree of the query to
do stuff with that?

This is really what is missing for the ecosystem. A libpqparser for
tools to use: Formatters, linters, query rewriters, simple syntax
checkers... they are all missing access to postgres' own parser.

Best,

Wolfgang

#20Josef Šimánek
josef.simanek@gmail.com
In reply to: Tom Lane (#18)
Re: [PATCH] Add --syntax to postgres for SQL syntax checking

st 15. 5. 2024 v 20:39 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:

=?UTF-8?B?Sm9zZWYgxaBpbcOhbmVr?= <josef.simanek@gmail.com> writes:

I'm not sure everyone in this thread understands the reason for this
patch, which is clearly my fault, since I have failed to explain. Main
idea is to make a tool to validate query can be parsed, that's all.
Similar to running "EXPLAIN query", but not caring about the result
and not caring about the DB structure (ignoring missing tables, ...),
just checking it was successfully executed. This definitely belongs to
the server side and not to the client side, it is just a tool to
validate that for this running PostgreSQL backend it is a "parseable"
query.

The thing that was bothering me most about this is that I don't
understand why that's a useful check. If I meant to type

UPDATE mytab SET mycol = 42;

and instead I type

UPDATEE mytab SET mycol = 42;

your proposed feature would catch that; great. But if I type

UPDATE mytabb SET mycol = 42;

it won't. How does that make sense? I'm not entirely sure where
to draw the line about what a "syntax check" should catch, but this
seems a bit south of what I'd want in a syntax-checking editor.

This is exactly where the line is drawn. My motivation is not to use
this feature for syntax check in editor (even could be used to find
those banalities). I'm looking to improve automation to be able to
detect those banalities as early as possible. Let's say there is
complex CI automation configuring and starting PostgreSQL backend,
loading some data, ... and in the end all this is useless, since there
is this kind of simple mistake like UPDATEE. I would like to detect
this problem as early as possible and stop the complex CI pipeline to
save time and also to save resources (= money) by failing super-early
and reporting back. This kind of mistake could be simply introduced by
like wrongly resolved git conflict, human typing error ...

This kind of mistake (typo, ...) can be usually spotted super early.
In compiled languages during compilation, in interpreted languages
(like Ruby) at program start (since code is parsed as one of the first
steps). There is no such early detection possible for PostgreSQL
currently IMHO.

BTW, if you do feel that a pure grammar check is worthwhile, you
should look at the ecpg preprocessor, which does more or less that
with the SQL portions of its input. ecpg could be a better model
to follow because it doesn't bring all the dependencies of the server
and so is much more likely to appear in a client-side installation.
It's kind of an ugly, unmaintained mess at the moment, sadly.

The big knock on doing this client-side is that there might be
version skew compared to the server you're using --- but if you
are not doing anything beyond a grammar-level check then your
results are pretty approximate anyway, ISTM. We've not heard
anything suggesting that version skew is a huge problem for
ecpg users.

Thanks for the info, I'll check.

Show quoted text

regards, tom lane

#21Tom Lane
tgl@sss.pgh.pa.us
In reply to: Wolfgang Walther (#19)
#22Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#18)
#23Robert Haas
robertmhaas@gmail.com
In reply to: Josef Šimánek (#16)
#24Wolfgang Walther
walther@technowledgy.de
In reply to: Tom Lane (#21)
#25Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#22)
#26David G. Johnston
david.g.johnston@gmail.com
In reply to: Wolfgang Walther (#24)
#27Josef Šimánek
josef.simanek@gmail.com
In reply to: Tom Lane (#25)
#28Josef Šimánek
josef.simanek@gmail.com
In reply to: David G. Johnston (#26)
#29David G. Johnston
david.g.johnston@gmail.com
In reply to: Josef Šimánek (#28)
#30Tom Lane
tgl@sss.pgh.pa.us
In reply to: David G. Johnston (#26)
#31Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#25)
#32David G. Johnston
david.g.johnston@gmail.com
In reply to: Robert Haas (#31)
#33David G. Johnston
david.g.johnston@gmail.com
In reply to: David G. Johnston (#32)
#34Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Tom Lane (#18)
#35Peter Eisentraut
peter_e@gmx.net
In reply to: Robert Haas (#23)